Superficial Crawling SEO Strategies

June 5th, 2006 by Michael Gray in Google, Grayhat SEO, SEO


On WebmasterWorld people are discussing Big Daddy strategies, and on SEORoundtable they are highlighting how this is becoming a problem for directories. No discussion about Google’s new crawling method would be complete without also looking at Matt Cutts on the indexing timeline. While to some extent things are still in flux, I think we’ve hit a turning point for SEO that needs to be reckoned with.

For those of you who’ve been playing in the space for any length of time, what we’re seeing and hearing may look familiar. In fact it’s following a very similar path to the alleged ‘thing that looks and acts like a sandbox but really isn’t a sandbox’. A brief history: some people were reporting difficulties getting new content to rank in the last quarter of 2003. Lots of folks dismissed it, saying things were just in flux and under adjustment, but by early 2004 the reports were so numerous they became hard to ignore, and the ‘sandbox’ term was born. Now I know there are some who steadfastly believe there is no sandbox. What they mean to say is the sandbox doesn’t exist for them, either because they have long term marketing goals that don’t depend on free search engine traffic (Mike Grehan says) or because they are creating unique and truly link-worthy content (When Unique Content Is Not “Unique” - Sugarrae). However, if you ask anyone who plays in the highly competitive trenches, dabbles in the black arts, has websites that are slightly gray hat, or is an outright button pushing spammer, they can confirm exactly what Danny Sullivan says: yes, there is a Google Sandbox (2005 Year End Revisit: Is There A Google Sandbox?).

However, I’m not looking to stir up yet another pointless is there/is there not sandbox debate. What I am here to say is I think we are hitting the leading edge of a new force to be reckoned with, which for lack of a better term I’m calling ‘sandbox crawling’. Here’s the way I see it: if your website is missing the right ‘quality indicators’, what you’ll start to see is superficial crawling and indexing of your website. Your site, which may have had hundreds, thousands or even hundreds of thousands of pages, will just not be as well represented in Google’s index as you would like it to be. You will still be indexed in some way, your home page for certain and probably most of your first-level pages, but second-level, third-level and fourth-level pages will remain largely untapped. Now what if you, like Andy, firmly believe The Money’s In The Archives Stupid? Isn’t this light crawling going to have an effect on the way you monetize your website? You bet it will!

Let’s look at some areas it will start to hit first, like directories. Well established quality directories like botw.org and the Yahoo directory are and will remain well indexed ([site:botw.org] and [site:dir.yahoo.com]). However, some of the mid level and lower level directories are going to start dropping pages. When these pages start falling out of the index, so will all the backlinks on them. Who’s going to be another candidate that gets hit hard? Article directories. If you look at EzineArticles and IdeaMarketers you’ll still see they are fairly well represented ([site:ezinearticles.com] and [site:ideamarketers.com]), but lots of other article directories won’t be. What does that mean? Fewer pages indexed = fewer backlinks indexed.

So what are some strategies going forward? Well, 4 out of 5 members of the overly obvious department say write quality, compelling, unique, link-worthy content, and actively promote your website with methods that aren’t dependent on free search engine traffic for a viable business model. Here are a few other tips to consider:

Google Sitemaps: If you have a good, reasonably clean website there’s no compelling reason not to use Google Sitemaps. The information you can get from it is helpful, and it may even help you gain an ever so minuscule amount of trust. The most luck I’ve had with Google Sitemaps is on websites of under 100 pages.
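
For anyone who hasn’t built one yet, here is a minimal sketch of generating a sitemap file in Python. It is only an illustration, not anything Google prescribes: the build_sitemap helper and the example.com URLs are made up, and the namespace is the sitemaps.org 0.9 schema, which I’m assuming is the one you want to target.

```python
import xml.etree.ElementTree as ET

# Assumption: the sitemaps.org 0.9 schema namespace.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per URL.

    build_sitemap is a hypothetical helper written for this example.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Placeholder URLs; list your deep pages, not just the homepage.
    pages = [
        "https://example.com/",
        "https://example.com/widgets/blue-widget",
    ]
    print(build_sitemap(pages))
```

The point of listing the deeper pages explicitly is that those are the ones a superficial crawl is most likely to miss.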

Improve Your Architecture: The need for well thought out and planned architecture has never been greater. Organize things into logical categories that make sense for users; go wide and not deep. For heaven’s sake put up a site map already, and interlink different areas of your site as much as possible. Don’t use nofollow to manipulate your PageRank so that you have a PR7 homepage and everything else is a PR1; it’s not the end result you want.
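
One rough way to check whether a site is ‘wide’ or ‘deep’ is to count how many clicks each page sits from the homepage. The sketch below does that with a breadth-first search over an internal link map. The click_depths function and the tiny link maps are invented for illustration; you would have to build the real map from your own crawl.

```python
from collections import deque


def click_depths(links, home):
    """Return {page: clicks from home} using breadth-first search.

    links maps each page to the pages it links to. Pages that cannot
    be reached from home are left out of the result.
    """
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


if __name__ == "__main__":
    # Hypothetical deep structure: every page is one more click down.
    deep_site = {"/": ["/a"], "/a": ["/b"], "/b": ["/c"]}
    # Hypothetical wide structure: everything linked off the homepage.
    wide_site = {"/": ["/a", "/b", "/c"]}

    print(max(click_depths(deep_site, "/").values()))
    print(max(click_depths(wide_site, "/").values()))
```

If the crawler really is stopping after the first level or two, the pages with the largest depth numbers are the ones to pull closer to the homepage or to interlink from shallower pages.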

Get the Deep Links: Stop trying to control how people come to your site. Let them link to wherever they want; in fact, encourage it. My most well-indexed site has about 70% of its links pointing to pages other than the homepage.
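
If you want to see where your own site stands, the deep link share is simple arithmetic: backlinks pointing anywhere other than the homepage, divided by all backlinks. Here is a small sketch, assuming you already have a list of the URLs your backlinks point to from whatever tool you use. The deep_link_share function and the sample URLs are hypothetical.

```python
from urllib.parse import urlparse


def deep_link_share(backlink_targets):
    """Return the fraction of backlinks that point past the homepage.

    A target counts as the homepage when its path is empty or "/".
    """
    if not backlink_targets:
        return 0.0
    deep = sum(
        1 for url in backlink_targets
        if urlparse(url).path not in ("", "/")
    )
    return deep / len(backlink_targets)


if __name__ == "__main__":
    # Placeholder data: one entry per backlink, listing the page it hits.
    targets = [
        "https://example.com/",
        "https://example.com/articles/crawling",
        "https://example.com/tools/checker",
        "https://example.com",
    ]
    print(f"{deep_link_share(targets):.0%} of links are deep links")
```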

Blackhat Strategies: OK, this section’s for you churn and burn folks; the rest of you move along. Making money from those 5,000 or 50,000 pages of autogen content is harder now and may require a shift. Again, I think the key here is thinking wide: combine all those short 100 word pages together and build a massively long page connected right off the homepage. Let’s be honest, you’re not concerned with usability. In fact bad usability is good, it makes visitors more likely to click on an advertisement and leave. How about we start thinking about uber-microsites and one page websites? You don’t have to worry about a one page website being deep crawled and indexed, it’s all or nothing. What about keeping domain costs down … well friend, subdomains can be your friend …

While it is possible all of this is a temporary glitch and everything may return to normal, I’m not betting on it. I learned my lesson from the sandbox and am starting to take a much more proactive role to counteract the new Big Daddy crawling method. I’m not spending time debating how ‘right’, ‘wrong’, or ‘unfair’ it is. Plain and simple, it’s not going to line my pockets. I’m doing what I can to get ahead of the curve.


14 Responses to “Superficial Crawling SEO Strategies”

  1. Andy H Says:

    Awesome post man.. and right on.

  2. alex Says:

    great article! I agree with you in “a great architecture is now the most important thing”.

  3. Cary Says:

    One of my sites got hit badly by Big Daddy, while my others continue to do well… it’s driving me batty, I’m still trying to figure out what it is about this one site that has caused its demise.

    I already use SiteMaps, but perhaps I need to work on my site architecture and deep-linking.

    Of course, getting deep-links isn’t all that easy to do these days…

  4. Phantombookman Says:

    An excellent post, I suspect in time to come your name will be mentioned whenever the ‘crawling sandbox’ is mentioned.

    I hope it’s temporary but I suspect you have coined a term for a new problem that will plague a lot of people

  5. BooTCaT Says:

    Hey ,u r right on the Uber Microsite and The one page site , ( ALL or nothing ) .
    I have even seen Seth Godin follow it .

    Nice Article

  6. Matt McGee Says:

    Great post, Michael. I’m curious about your suggestion that a “miniscule amount of trust” may come from using Sitemaps. I’ve seen other posts (also by people I respect) suggesting there may be some small benefit to using Google Analytics.

    Is there any consensus growing that Google would reward, even slightly, sites that use Google’s own tools? I suppose the reward could be better crawling or better ranking…..

  7. shandyking Says:

    Based on your assessments, it sounds like long sales letter sites are the way to go.

  8. Trisha Says:

    Luckily my sites don’t seem to have been hurt by Big Daddy, but mine still have a long ways to go anyway in terms of getting traffic, rankings, etc. On one I haven’t added any new pages for quite a while but need to soon, I wonder how long it will now take for the new pages to be spidered? It should be interesting anyway to see what happens.

  9. John Andrews Says:

    Fifty billion half-empty directories and 40 million affiliate shopping carts of the same “marital aids” are currently suppressed by Google. They will soon rise again, under the new footprints laid out by Graywolf in June of 2006.

    Is the “thought leadership” opportunity really worth all the hassle?

  10. ron angel Says:

    Very Interesting I dont agree with everything you say (can I say that…) but 90% ok but then again I am not expert in this field only working from eXperence and luck!

  11. Aaron Pratt Says:

    Me watches Michael’s Alexa rank currently drop to 18k. :)

    To get an indicator of all of the above you do not even need to follow any of the links, or read any posts, or visit any blogs, or forums, just spend a week in Google Sitemaps Group for the BIG picture.

    What Michael says is pretty much accurate give or take a few.

  12. Scott Corbett Says:

    As an ecommerce store owner, I’m chagrined by the possibility of shallower crawling. If Michael’s observations are accurate, it seems as if my homepage and category pages may still be crawled regularly but my product pages may not be. This will definately hurt me because of course the product pages are in less competitive spaces and I depend on them achieving higher rankings (albeit for the long-tail terms) sooner than upper-level pages. I’ll continue to use sitemaps, but my sites are well in excess of 100 pages, so the benefit may be limited. Hmmm.

  13. flyboy Says:

    I still can’t figure out why none of you guys understand the logic behind the sandbox.

    You make a website, it doesn’t get indexed immediately (like it does at MSN search), what is your only option….buy adwords….right.

    Well wait a minute, if that’s true then that’s illegal….let’s go ask Google if the sandbox really exists.

    Google response: “The sandbox does not exist, it never existed, it never will.”

    Is it really that hard for you guys to see ???????? Jeeze, loss of internal pages just means more forced adwords buys. Get it?

  14. Taryn Rose Says:

    Sandbox sure do exist. One of my website have all pages labeled supplemental. Its a general tutorials website. Don’t understand why would Google sandbox it.