Crawling Difficulties and Error Pages

October 21st, 2006 by Michael Gray in Google, SEO


I like to put my error pages in a subdirectory, usually named “error”. So any 401, 403, 404 or other error ends up at a URL like http://www.example.com/error/404/ . I also use this for other things like database failures, so I end up with http://www.example.com/error/database-error/ with a hopefully human-friendly explanation of what happened. In my robots.txt file I block the entire error directory for all crawlers. Google seems to be having a real issue with this structure, as it will report the original page as blocked by robots.txt. So it looks like I may be shooting myself in the foot by trying to be more organized.
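To make the setup concrete, here is a minimal sketch in Python of what such a robots.txt rule does. The robots.txt contents, the is_allowed helper, and the /some-page URL are assumptions for illustration; my actual file isn't shown here. It uses the standard library's urllib.robotparser to check which URLs the rule blocks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block the whole error directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /error/
"""


def is_allowed(url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the hypothetical robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "http://www.example.com/some-page",      # hypothetical original page
        "http://www.example.com/error/404/",     # error page from the post
        "http://www.example.com/error/database-error/",
    ):
        print(url, "allowed" if is_allowed(url) else "blocked")
```

Going by the rule alone, only URLs under /error/ are blocked and the original page is not. So a "blocked by robots.txt" report against the original page presumably comes from the crawler being sent on to the blocked error URL, not from the original URL matching the rule.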


4 Responses to “Crawling Difficulties and Error Pages”

  1. Dsw Says:

    If Google has a problem with this, then why not just delete the original file instead of moving it to the error directory? Then you will have no need for the robots.txt block.

  2. Michael Gray Says:

    I’m redirecting on the error condition. It’s content that’s pulled from a database, so there’s no file to delete. The directory is actually a parameter that’s being handled via htaccess.

  3. Lea de Groot Says:

    I’ve found that using noindex metas works better for Google than robots.txt.
    They ‘don’t index’ without generating the annoying sitemaps errors.
    Not sure about the other engines.

  4. Michael Gray Says:

    I use both.