More Google Crawling Oddities

May 22nd, 2006 by Michael Gray in SEO


I recently mentioned I was having some problems with light crawling and old cache dates. Well, I got to work cleaning up my pages and changing from example.com/foo.html to example.com/foo/. I’ve done the right thing, redirecting with htaccess, and since the site is legit I entered it into Google Sitemaps. Shortly after entering it I noticed a pretty odd result:

[Image: error screenshot]

After clicking through, it listed some of the pages with the old cache dates, and it also listed them as 404 errors. Now I suppose it is possible that Googlebot came by during the small interval when the URLs had been changed and the htaccess hadn’t been posted, but that window was less than 30 minutes, so I’ll say possible but unlikely. Looking back at my other crawling problem, there are still a couple of pages mismatched, but on the whole things are looking better. However, Adam Lasnik (let’s see if he ego searches) posts over in WMW about queries with bad results and links to a post over on Google Sitemaps.
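
For anyone who wants to sanity check a move like this against their own logs, here is a rough Python sketch of the foo.html to foo/ mapping and of what a "good" response for an old URL looks like. The function names are made up for illustration, and it assumes every old page ends in .html and should answer with a 301 pointing at the directory-style URL; it is not the actual htaccess rule from this site.

```python
from urllib.parse import urlsplit, urlunsplit


def new_style_url(old_url: str) -> str:
    """Map an old-style .html URL to its directory-style replacement.

    Assumption: every old page path ends in ".html" and the new URL is the
    same path with that suffix swapped for a trailing slash.
    """
    parts = urlsplit(old_url)
    path = parts.path
    if path.endswith(".html"):
        path = path[: -len(".html")] + "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def is_expected_redirect(old_url: str, status: int, location) -> bool:
    """Return True if a logged response is a permanent redirect to the new URL.

    `status` and `location` are whatever your own log or HTTP client reports
    for a request to `old_url`; a 404 here is the symptom described above.
    """
    return status == 301 and location == new_style_url(old_url)


# Example: the mapping used in this post.
print(new_style_url("http://example.com/foo.html"))  # http://example.com/foo/
```

Any old URL where is_expected_redirect comes back False (for example a 404 with no Location header) is one a crawler could have recorded as an error during the gap.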

First off, way to go letting everyone know there’s a problem and you’re working on it. To all my fellow web publishers out there: unless you’re sitting on top of a big fat authority domain, I think you’re going to have a really hard time getting new or updated pages into Google with the new indexing timeline. I still think we’re going to see “dropped pages”, “missing pages” and “gone supplemental” problems for a while. Don’t give up hope though; I’d suspect that in a few months, or about 30 days before the next green pixie dust update, we should start hearing little hints about Google crawling like crazy. What will be interesting is whether people running AdSense get a bit of a boost with “fresher crawling” from the new crawl caching proxy.

So how about it, anybody else seeing any funny, weird or unusual crawling from Google?


7 Responses to “More Google Crawling Oddities”

  1. Lea Says:

    For the ‘caching’ it shouldn’t work that way (i.e., AdSense publishers shouldn’t get a boost from Mediabot). In case it isn’t obvious, a cache just means that when the main Googlebot decides to crawl a given page, it doesn’t have to go all the way to the site if the page is already cached; it can pull the page from the cache instead. It doesn’t affect how many crawls take place or how often.
    Google using a cache will actually make crawling look lighter, as there won’t be as many entries in the site’s logs, although it will still be happening.
    The main downside, from a publisher’s point of view, of Google using a cache is that it will take that little bit longer to get a fresh page into the index, because a previous version may be in the cache when an update to the page occurs, and thus the bot won’t see the change for a little longer. Depends how they’ve configured it, of course.

  2. Carcasherdotcom Seocontest » Blog Archive » Who plays “Jack” in the Jack in the Box commercials? Says:

    [...] More Google Crawling OdditiesI recently mentioned I was having some problems with light crawling and old cache dates, well I got to work cleaning up my pages and changing from example.com/foo.html to example.com/foo/ . I ve done the right thing redirecting with … [...]

  3. Administrator Says:

    My thinking is as follows: if a page would normally get flagged for “light crawling” by Google, but gets a lot of traffic via Yahoo, MSN, Adbrite or any other means and has AdSense on the page, will it “call” for a spider to come crawling? For blogs, forums or any other user-generated content that could change quickly, that could play a role.

  4. Adam Lasnik Says:

    You can’t call it ego surfing if it’s on behalf of one’s employer, right? ;-)

    I’d post a note in the Sitemaps Google Group. The Sitemaps folks diligently read (and often respond to) messages there. You’re also welcome to drop a note to bostonpubcon2006 [at] gmail.com, which my team and I are monitoring, but I’ll warn you that there’s a bit of a backlog there.

  5. Aaron Pratt Says:

    Hi, my name is AaronBot v2.0, I follow AdamBot v2.0 around on the web learning from him while MattBot v3.0 is out of town.

    Michael - Adam is correct: post in Google Groups or even contact people directly. Vanessa Fox is always eager to find out what might not be working correctly to pass on to her team, but I understand your post is partly link bait; wish Google paid this much attention to the rest of us. Did you get the PayPal for those aquarium books, Michael? I’m excited about growing a reef. :)

    Google is smart to listen to webmasters; some of us are obsessed with perfection and we expect this from search engines as well.

    Cheers!

  6. » Dear Michael Gray SEO Buzz Box Says:

    [...] I noticed that you are on Matt Cutts and Adam v2.0’s RSS reader, I know this because when you question authority in your blog you often get a quick response. Could you please do a post on PR distribution in Google Sitemaps? It does not make any sense and there has not been an individual inside or outside of Google who can explain how to read it. [...]

  7. Aaron Pratt Says:

    http://wordpress-plugins.biggnuts.com/objection-redirection-wordpress-plugin/
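
A footnote to Lea's comment above about the cache: the behaviour she describes can be sketched in a few lines of Python. This is only a toy model under stated assumptions. The class name, the max_age setting and the counters are invented for illustration, and nothing here reflects how Google's crawl caching proxy is actually configured.

```python
class CrawlCache:
    """Toy cache sitting between several bots and a site (hypothetical model)."""

    def __init__(self, fetch_from_site, max_age):
        self.fetch_from_site = fetch_from_site  # callable: url -> page body
        self.max_age = max_age                  # seconds a cached copy is reused
        self.entries = {}                       # url -> (fetched_at, body)
        self.site_hits = 0                      # requests that reach the site's logs

    def get(self, url, now):
        """Return the page for a crawl request made at time `now` (seconds)."""
        entry = self.entries.get(url)
        if entry is not None and now - entry[0] < self.max_age:
            # Served from the cache: the crawl still happens, but the site
            # never sees it, so crawling looks lighter in the logs.
            return entry[1]
        body = self.fetch_from_site(url)
        self.site_hits += 1
        self.entries[url] = (now, body)
        return body


# Example: three crawl requests, one page update in between.
site = {"/foo/": "version 1"}
cache = CrawlCache(lambda url: site[url], max_age=60)
first = cache.get("/foo/", now=0)    # fetched from the site
site["/foo/"] = "version 2"          # publisher updates the page
second = cache.get("/foo/", now=30)  # still the cached "version 1"
third = cache.get("/foo/", now=90)   # cache expired, refetched as "version 2"
```

Three crawl requests produce only two log entries on the site, and the update is not seen until the cached copy ages out, which matches both effects Lea mentions.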