Google Analytics and Search Engine Spiders

June 9th, 2007 by Michael Gray in Google



Holy crap, how does Google Analytics not track or report search engine spider visits? I mean, duh, isn’t that Web Analytics 102? Yes, I want to know what the people are doing first, since Googlebot doesn’t have a credit card (yet), but I want to know what the spiders are finding second, because if the spiders never find it, the people never will.

Yes, I get that GA works via JavaScript, but whatever happened to programming that fails gracefully instead of falling flat on its face when conditions don’t fall within some narrow, predetermined scope? If you’re going to tell me that “I’m not your average user and regular people don’t look for ” I’m going to smack you really hard, because I’m really tired of that response. I don’t aspire to be mediocre or average, and neither should you.



22 Responses to “Google Analytics and Search Engine Spiders”

  1. Tim Says:

    umm, that’s what log files are for! No one should rely on just a script-based service like Google-Analytics for such basic info.

  2. Lea Says:

    JS based analytics tools have never tracked bots.
    I suppose there’s also an argument that Google has no interest in telling you about bots, too.

  3. Ovidiu Says:

    The answer to this question lies in whether spiders interpret noscript tags or not. ;)

  4. Michael Gray Says:

    I’ve always been a log file guy because IMHO you get more data out of the application. However I recently started looking at GA with more interest, and wow is this a glaring deficiency. There’s nothing wrong, bad, deceitful, or “gaming the system” about wanting to know if the bots are crawling your site properly or not. This is really a glaringly obvious flaw in the program implementation.

  5. Tim Says:

    It’s not just with GA. No Javascript-based analytics tools report on bots.

    I dunno - I suppose it wouldn’t be that hard for Google to figure out when their own bot visits a page, but they’d have to develop a way to gather data from other bots too.

    Since there isn’t a noscript tag in place with the current GA implementation, they’d have to roll out something new to everyone. A single-pixel image would do the trick, I suppose, but that’d mean everyone using GA would have to implement new code.

  6. Michael Gray Says:

    They could always make it so the existing code works, but for people who do want to upgrade and get the data give them the option. I really am dumbfounded that the entire industry is growing up with JS as the preferred implementation and it’s so completely lacking.

  7. BlogStorm Says:

    Maybe in future they will offer enough data in Webmaster Central so it isn’t an issue?

  8. PocketSEO Says:

    I think the only way Google Analytics could track bots is to add an invisible image in a noscript element.

    I just grep the raw logfiles. With a simple one-line shell script you can create a spreadsheet of all search engine bot activity and sort the results by response code, e.g., “show me all 302s that Googlebot encounters”. I’ll post an example later.
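    PocketSEO’s idea of turning raw logs into a spreadsheet of bot activity, sorted by response code, could look roughly like the Python sketch below. It is not PocketSEO’s own script. It assumes Apache’s “combined” log format, and the regex, the crawler user-agent substrings in `BOTS`, and the function names `bot_hits` and `to_csv` are all illustrative assumptions you would adapt to your own server.

```python
import csv
import io
import re

# Regex for the Apache/NCSA "combined" log format (assumption: adjust it
# if your server writes a different layout).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical list of crawler user-agent substrings to look for.
BOTS = ("Googlebot", "Slurp", "msnbot")


def bot_hits(lines, bots=BOTS):
    """Yield (bot, status, path, time) for every log line made by a known bot."""
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        agent = m.group("agent")
        for bot in bots:
            if bot in agent:
                yield (bot, m.group("status"), m.group("path"), m.group("time"))
                break


def to_csv(rows):
    """Return the rows as CSV text sorted by response code, ready for a spreadsheet."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["bot", "status", "path", "time"])
    writer.writerows(sorted(rows, key=lambda r: (r[1], r[0], r[2])))
    return out.getvalue()


# Usage: two sample log lines, one from Googlebot and one from a browser.
sample = [
    '66.249.66.1 - - [09/Jun/2007:10:00:00 -0400] "GET /old-page HTTP/1.1" 302 0 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '10.0.0.5 - - [09/Jun/2007:10:01:00 -0400] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/4.0 (compatible; MSIE 6.0)"',
]
hits = list(bot_hits(sample))
# "Show me all 302s that Googlebot encounters":
googlebot_302s = [h for h in hits if h[0] == "Googlebot" and h[1] == "302"]
print(to_csv(hits))
```

    Writing CSV rather than printing columns keeps the output spreadsheet-friendly; filtering on other status codes (404s, 500s) only changes the list comprehension.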

  9. PocketSEO Says:

    Actually, I don’t think they could even track bots with an image because the crawlers wouldn’t request the image for each page. The only way Google Analytics could reliably capture bot data is to run server-side.

  10. Lea Says:

    A while ago there was a bit of a kerfuffle about the accuracy of GA (I’m pretty sure it is the Michael Martinez article on SEOmoz that I am thinking of: http://www.seomoz.org/blog/how-reliable-is-google-analytics). Anyway, Google actually responded (wow!) that their JS approach was superior to a log-based approach because they didn’t collect duplicate views (? IIRC).
    So, that’s interesting: Google thinks a log-based approach is less accurate than theirs.
    (Personally, I use both, combining data to get a true picture)

    I don’t think there is any way to reliably catch bots as a third party. Generally, bots only download the HTML: no images, no scripting. I don’t think even a noscript would get them. You can only catch them on the server. (And no way am I letting Google or anyone in there!)

  11. Joost de Valk Says:

    You could tweak Google Analytics to gather spider data too, if you put a function in your header call that requests the 1×1px image it would otherwise request through JavaScript. I’ve tried this and it works. For more info, check out Peter van der Graaf’s blog post about it:
    http://www.vdgraaf.info/google-analytics-without-javascript.html

    As long as you’re doing it through JavaScript, it will never ever report bots that don’t run JavaScript…

  12. Lea Says:

    The majority of bots don’t grab images, either - just the raw html - so that’s not going to be very effective :(

  13. Joost de Valk Says:

    That’s not what I meant. I meant you should put all the data into the image request yourself, and request the image from your server before you send out the page, so the bot doesn’t fetch anything other than the HTML.

  14. Lea Says:

    That’s an interesting approach, Joost (now that I figured out what you mean! :))
    Have you tried this?
    Is it still valuable, despite the lack of special-bot handling reports on the Google back end?

  15. Daniel R Says:

    Before talking smack about Google Analytics, what of HBX or Omniture? I’m not aware that they track spiders with any sort of accuracy, since they are also JavaScript-based tools.

    Any input from anyone?

    We can’t single out GA if a $10k-a-year tool has similar difficulties, which I think it does. Is this correct?

  16. Tim Says:

    Daniel R - that point has already been made.

    No matter what the cost or value otherwise, a Javascript-based tool can’t report on bot traffic.

    While there were a couple of other ideas above, including a short-lived (and invalid) suggestion of my own, the fact remains that the only ways to get bot traffic reports are log files or, as I do when possible, direct database logging of bot visits. Logging each bot hit into a DB lets you query those records later and compare them against real rankings and traffic levels.
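    Tim doesn’t show his setup, but direct database logging of bot visits might look something like this minimal sketch using Python’s built-in `sqlite3`. The table layout, the `BOT_SIGNATURES` list, and the helper names (`open_db`, `log_if_bot`, `crawl_counts`) are hypothetical; in practice `log_if_bot` would be called from whatever code serves each page.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical user-agent substrings that mark a crawler; extend as needed.
BOT_SIGNATURES = ("Googlebot", "Slurp", "msnbot")


def open_db(path=":memory:"):
    """Create (if needed) a table that stores one row per bot request."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS bot_hits (
               ts TEXT NOT NULL,
               bot TEXT NOT NULL,
               path TEXT NOT NULL,
               status INTEGER NOT NULL
           )"""
    )
    return conn


def log_if_bot(conn, user_agent, path, status, now=None):
    """Insert a row if the user agent matches a known crawler; return the bot name or None."""
    for bot in BOT_SIGNATURES:
        if bot in user_agent:
            ts = (now or datetime.now(timezone.utc)).isoformat()
            with conn:  # the context manager commits the insert
                conn.execute(
                    "INSERT INTO bot_hits (ts, bot, path, status) VALUES (?, ?, ?, ?)",
                    (ts, bot, path, status),
                )
            return bot
    return None


def crawl_counts(conn):
    """Count hits per bot and path, e.g. to compare against rankings and traffic later."""
    return conn.execute(
        "SELECT bot, path, COUNT(*) FROM bot_hits GROUP BY bot, path ORDER BY bot, path"
    ).fetchall()


# Usage: two Googlebot hits, one browser hit (ignored), one msnbot hit.
conn = open_db()
log_if_bot(conn, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/", 200)
log_if_bot(conn, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/", 200)
log_if_bot(conn, "Mozilla/4.0 (compatible; MSIE 6.0)", "/", 200)
log_if_bot(conn, "msnbot/1.0", "/about", 404)
print(crawl_counts(conn))
```

    Storing the status code alongside the path means the same table answers both “what did the bots crawl?” and “what errors did they hit?”.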

  17. Kevin Burton Says:

    Yeah…

    There would be no way to do this without executing javascript.

    I’ve thought about an opt-in mechanism for spiders to report their indexing to 3rd party tracking systems but I’d like to see one of the vendors pick it up.

    This would be a feature we could ship with Spinn3r and since we crawl on behalf of our customers we might be able to get more leverage to push adoption.

  18. Peter van der Graaf Says:

    Joost pointed me to this post, and let me say that you can use Google Analytics to report on server-side instead of client-side visitors.
    The only problem is that server side visits like a search bot will distort your client-side (javascript) collected reports. So create a separate profile before you collect anything.

    As long as your web server supports a language like PHP, Perl, ASP, or similar, you can request the Google Analytics tracking image from a script without having to show it on the web page. The web server makes the request, so your script needs to collect all the bot data it wants to send. You can pass the bot data in any of the variables that are linked to the image request.

    I have no immediate use for a bot tracker, but everyone is more than welcome to customize my script to make it do just that. I promise you it can’t be hard.

    http://www.vdgraaf.info/google-analytics-without-javascript.html
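    The server-side idea Joost and Peter describe (the server, not the bot, requests the tracking image) might be sketched in Python as below. This is not Peter’s script and does not reproduce Google Analytics’ real image URL or parameters: `BEACON_URL` and the query parameter names `host`, `page`, and `bot` are placeholders you would replace with whatever your analytics package actually expects.

```python
import urllib.parse
import urllib.request

# Hypothetical tracking-image endpoint; substitute the image URL your
# analytics package really uses. The parameter names below are placeholders too.
BEACON_URL = "https://analytics.example.com/pixel.gif"
BOT_SIGNATURES = ("Googlebot", "Slurp", "msnbot")


def detect_bot(user_agent):
    """Return the matching crawler name, or None for (probable) humans."""
    for bot in BOT_SIGNATURES:
        if bot in user_agent:
            return bot
    return None


def beacon_for_request(host, path, user_agent):
    """Build the tracking-image URL the server would fetch for a bot hit, or None."""
    bot = detect_bot(user_agent)
    if bot is None:
        return None  # humans are still counted by the normal JavaScript tag
    query = urllib.parse.urlencode({"host": host, "page": path, "bot": bot})
    return f"{BEACON_URL}?{query}"


def send_beacon(url, timeout=2.0):
    """Fetch the image server-side, before sending the page, so the bot never has to."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


# Usage: only bot requests produce a beacon URL.
url = beacon_for_request("www.example.com", "/blog/", "Googlebot/2.1")
print(url)
```

    As Peter warns, hits recorded this way would distort the normal JavaScript-collected reports, so they belong in a separate profile set up before any data is collected.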

  19. Pascal Says:

    Scary how this post is useless!

  20. Jason Green Says:

    Google Analytics is great BECAUSE it doesn’t track bots and spiders. Granted, for SEO, you need to know what the bots are doing. GA doesn’t try to replace that function. Instead, GA is for tracking real visitors and their interactions with your site. Log files or services other than Google Analytics are plenty for tracking robots and spiders.

    Also, this is not a flaw. Spiders don’t run JavaScript, so no JavaScript-based analytics will report them. (Yes, already said above, and the work arounds for tracking them are noted.)

    I had a client that, for argument’s sake, was getting 100 visits per day according to his log file-based “analytics”. He then asked why Google’s analytics were “so wrong”, only reporting 20 visits per day. I had to burst his bubble and tell him he was only getting 20 visitors per day and that the rest were bots. The point is, he had 20 real people visiting his site each day. Those people might actually convert and buy something. The 80 visits by bots never will.

  21. Michael Gray Says:

    @jason real people are important but 9 times out of 10 if the bots aren’t finding you neither are the people.

  22. Jason Green Says:

    @Michael, I absolutely agree with you that both people and bots are important. However, this is not what Google Analytics is for, and criticising it for something it was never trying to do is just wrong. Especially when your post starts with “Holy crap…”. I think that’s the only reason I felt the need to post here. You didn’t give GA a fair trial.

    You need to use a log parser to analyze your bot traffic. Then, if you want an excellent picture of what your live visitors are doing, you should use a JavaScript-based analytics tool like Google Analytics. Together, you will get all the information you need to start making great decisions.

    If I must make an analogy, saying that Google Analytics is a failure because it can’t track bots is like saying coffee mugs are failures because they can’t cut steak. You might be able to get it done using a hack like we’ve seen above, but that’s just not what it was designed for.