Google Analytics and Search Engine Spiders
June 9th, 2007 by Michael Gray in Google
Holy crap, how does Google Analytics not track/report search engine spider visits? I mean, like, duh, isn’t that web analytics 102? Yes, I want to know what the people are doing first, since Googlebot doesn’t have a credit card (yet), but I want to know what the spiders are finding second, because if the spiders never find it the people never will.
Yes, I get that GA works via JavaScript, but whatever happened to programming that fails gracefully instead of falling flat on its face when conditions don’t fall within some narrow predetermined scope? If you’re going to tell me that “I’m not your average user and regular people don’t look for
June 9th, 2007 at 5:58 am
umm, that’s what log files are for! No one should rely on just a script-based service like Google-Analytics for such basic info.
June 9th, 2007 at 7:23 am
JS based analytics tools have never tracked bots.
I suppose there’s also an argument that Google has no interest in telling you about bots, too.
June 9th, 2007 at 7:28 am
The answer to this question lies in whether spiders interpret noscript tags or not.
June 9th, 2007 at 9:17 am
I’ve always been a log file guy because IMHO you get more data out of the application. However I recently started looking at GA with more interest, and wow is this a glaring deficiency. There’s nothing wrong, bad, deceitful, or “gaming the system” about wanting to know if the bots are crawling your site properly or not. This is really a glaringly obvious flaw in the program implementation.
June 9th, 2007 at 10:10 am
It’s not just with GA. No Javascript-based analytics tools report on bots.
I dunno - I suppose it wouldn’t be that hard for Google to figure out when their own bot visits a page, but they’d have to develop a way to gather data from other bots too.
Since there isn’t a noscript tag in place with the current GA implementation, they’d have to roll out something new to everyone. A single-pixel image would do the trick, I suppose, but that’d mean everyone using GA would have to implement new code.
June 9th, 2007 at 10:32 am
They could always make it so the existing code works, but for people who do want to upgrade and get the data give them the option. I really am dumbfounded that the entire industry is growing up with JS as the preferred implementation and it’s so completely lacking.
June 9th, 2007 at 11:15 am
Maybe in future they will offer enough data in Webmaster Central so it isn’t an issue?
June 9th, 2007 at 2:27 pm
I think the only way Google Analytics could track bots is to add an invisible image in a noscript element.
I just grep the raw logfiles. With a simple one-line shell script you can create a spreadsheet of all search engine bot activity and sort the results by response code, e.g., “show me all 302s that Googlebot encounters”. I’ll post an example later.
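A minimal sketch of that kind of log filtering, written in Python rather than as a shell one-liner. This is not the commenter's script: it assumes Apache-style Combined Log Format, and the `BOT_MARKERS` list, the `bot_hits` helper, and the sample log lines are all made up for illustration.

```python
import re
from collections import Counter

# Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Substrings used to spot crawlers (assumption: illustrative, not exhaustive).
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def bot_hits(lines, status=None):
    """Yield (bot, status, path) per crawler request, optionally for one status code."""
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        bot = next((b for b in BOT_MARKERS if b in agent), None)
        if bot is None:
            continue  # not a known crawler
        code = int(match.group("status"))
        if status is None or code == status:
            yield bot, code, match.group("path")


# Made-up sample lines; in practice you would iterate over open("access.log").
sample = [
    '192.0.2.10 - - [09/Jun/2007:05:58:00 +0000] "GET /old-page HTTP/1.1" 302 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [09/Jun/2007:05:59:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [09/Jun/2007:06:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1) Firefox/2.0"',
]

# "Show me all 302s that Googlebot encounters"
redirects = [hit for hit in bot_hits(sample, status=302) if hit[0] == "googlebot"]
print(redirects)

# Crawler requests grouped by response code
print(Counter(code for _, code, _ in bot_hits(sample)))
```

Matching on the user-agent string is the simplest approach but trusts whatever the client claims to be; anything stricter (such as checking the requesting host) would need more than this sketch.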
June 9th, 2007 at 4:37 pm
Actually, I don’t think they could even track bots with an image because the crawlers wouldn’t request the image for each page. The only way Google Analytics could reliably capture bot data is to run server-side.
June 9th, 2007 at 6:10 pm
A while ago there was a bit of a kerfuffle about the accuracy of GA. (I’m pretty sure it is the Michael Martinez article on SEOmoz that I am thinking of - http://www.seomoz.org/blog/how-reliable-is-google-analytics ) Anyway, Google actually responded (wow!) that their JS approach was superior to a log-based approach because they didn’t collect duplicate views (? IIRC)
So, that’s interesting - Google thinks a log-based approach is less accurate than theirs.
(Personally, I use both, combining data to get a true picture)
I don’t think there is any way to reliably catch bots as a 3rd party - generally, bots only download the HTML. No images, no scripting. I don’t think even a noscript would get them. You can only catch them on the server. (And no way am I letting Google or anyone in there!)
June 10th, 2007 at 3:57 am
You could tweak Google Analytics to gather spider data too, if you put a function in your header call to open the 1×1px image it would otherwise open through javascript. I’ve tried this and it works. For more info check out Peter van der Graaf’s blogpost about it:
http://www.vdgraaf.info/google-analytics-without-javascript.html
As long as you’re doing it through javascript it will never ever report bots that don’t do javascript…
June 11th, 2007 at 1:57 am
The majority of bots don’t grab images, either - just the raw html - so that’s not going to be very effective
June 11th, 2007 at 2:14 pm
That’s not what I meant, I meant you should put all the data in the grab of the image yourself, and grab the image from your server before you send out the page. So the bot doesn’t grab anything else than the HTML.
June 12th, 2007 at 9:01 am
That’s an interesting approach, Joost (now that I figured out what you mean! :))
Have you tried this?
Is it still valuable, despite the lack of special-bot handling reports on the Google back end?
June 13th, 2007 at 1:08 am
Before talking smack about Google Analytics, what of HBX or Omniture? I’m not aware that they track spiders with any sort of accuracy, since they are also JavaScript-based tools.
Any input from anyone?
We can’t single out GA if a $10k a year tool has similar difficulties, which I think it does. Is this correct?
June 13th, 2007 at 1:35 am
Daniel R - that point has already been made.
No matter what the cost or value otherwise, a Javascript-based tool can’t report on bot traffic.
While there were a couple of other ideas above, including an invalid, short-lived suggestion of my own, the fact remains that the only ways to get bot traffic reports are via logs or, as I do when possible, by logging bot visits directly into a database. Logging a bot hit into a DB allows for querying against those DB records for later comparison against real rankings and traffic levels.
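A rough sketch of what direct database logging of bot hits could look like, using Python's built-in `sqlite3`. The table layout, the `BOT_MARKERS` list, and the helper names (`open_db`, `log_if_bot`, `hits_per_path`) are assumptions for illustration, not the commenter's actual setup.

```python
import sqlite3
from datetime import datetime, timezone

# Substrings used to spot crawlers (assumption: illustrative, not exhaustive).
BOT_MARKERS = ("googlebot", "slurp", "msnbot")


def open_db(path=":memory:"):
    """Open the database and create the (hypothetical) bot_hits table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bot_hits ("
        "ts TEXT NOT NULL, bot TEXT NOT NULL, "
        "path TEXT NOT NULL, status INTEGER NOT NULL)"
    )
    return conn


def log_if_bot(conn, user_agent, path, status, now=None):
    """Record the request if the user agent looks like a crawler; return True if logged."""
    agent = user_agent.lower()
    bot = next((b for b in BOT_MARKERS if b in agent), None)
    if bot is None:
        return False
    ts = (now or datetime.now(timezone.utc)).isoformat()
    conn.execute(
        "INSERT INTO bot_hits (ts, bot, path, status) VALUES (?, ?, ?, ?)",
        (ts, bot, path, status),
    )
    conn.commit()
    return True


def hits_per_path(conn, bot):
    """Return (path, hit_count) rows for one bot, most-crawled paths first."""
    rows = conn.execute(
        "SELECT path, COUNT(*) FROM bot_hits WHERE bot = ? "
        "GROUP BY path ORDER BY COUNT(*) DESC, path",
        (bot,),
    )
    return rows.fetchall()


# Usage: call log_if_bot() from the code that serves each page.
conn = open_db()
log_if_bot(conn, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/a", 200)
log_if_bot(conn, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/a", 200)
log_if_bot(conn, "Mozilla/5.0 (compatible; Googlebot/2.1)", "/b", 302)
log_if_bot(conn, "Mozilla/5.0 (Windows; U) Firefox/2.0", "/a", 200)  # ignored
print(hits_per_path(conn, "googlebot"))
```

Once the hits are in a table, the per-path counts can be compared against rankings and traffic data with ordinary SQL.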
June 15th, 2007 at 9:26 pm
Yeah…
There would be no way to do this without executing javascript.
I’ve thought about an opt-in mechanism for spiders to report their indexing to 3rd party tracking systems but I’d like to see one of the vendors pick it up.
This would be a feature we could ship with Spinn3r and since we crawl on behalf of our customers we might be able to get more leverage to push adoption.
June 18th, 2007 at 4:48 pm
Joost pointed me to this post, and let me say you can use Google Analytics to report on server-side instead of client-side visitors.
The only problem is that server side visits like a search bot will distort your client-side (javascript) collected reports. So create a separate profile before you collect anything.
As long as your web server supports a language like PHP, Perl, ASP or similar, you can request the Google Analytics tracking image from a script without having to show it on the webpage. The web server does the request, so your script needs to collect all the bot data it wants to send. You can pass the bot data in any of the variables that are linked to the image request.
I have no immediate use for a bot tracker, but everyone is more than welcome to customize my script to make it do just that. I promise you it can’t be hard.
http://www.vdgraaf.info/google-analytics-without-javascript.html
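To make the idea concrete, here is a small Python sketch of the general pattern: the server builds the tracking-image URL itself, packing the visit data into the query string, and would then fetch it before sending the page. The endpoint `tracker.example` and the parameter names `page`, `ua`, and `ref` are placeholders invented for this example; they are not Google Analytics' real request format, which is what the linked script deals with.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, for illustration only (not a real tracking service).
PIXEL_ENDPOINT = "https://tracker.example/pixel.gif"


def build_pixel_url(page, user_agent, referrer="", endpoint=PIXEL_ENDPOINT):
    """Build a tracking-image URL carrying the visit data in its query string."""
    params = {"page": page, "ua": user_agent, "ref": referrer}
    # Drop empty values so the query string only carries what we know.
    params = {key: value for key, value in params.items() if value}
    return endpoint + "?" + urlencode(params)


# The server would fetch this URL itself (for example with
# urllib.request.urlopen) instead of letting the browser load the image,
# so a bot that only downloads the HTML still gets recorded.
url = build_pixel_url("/pricing", "Googlebot/2.1")
query = parse_qs(urlsplit(url).query)
print(url)
print(query)
```

Because the request comes from the server, every field the tracker would normally read from the browser has to be supplied explicitly by the script.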
July 12th, 2007 at 10:18 am
Scary how this post is useless!
October 8th, 2007 at 2:38 pm
Google Analytics is great BECAUSE it doesn’t track bots and spiders. Granted, for SEO, you need to know what the bots are doing. GA doesn’t try to replace that function. Instead, GA is for tracking real visitors and their interactions with your site. Log files or services other than Google Analytics are plenty for tracking robots and spiders.
Also, this is not a flaw. Spiders don’t run JavaScript, so no JavaScript-based analytics will report them. (Yes, already said above, and the workarounds for tracking them are noted.)
I had a client that, for argument’s sake, was getting 100 visits per day according to his log file-based “analytics”. He then asked why Google’s analytics were “so wrong”, only reporting 20 visits per day. I had to burst his bubble and tell him he was only getting 20 visitors per day and that the rest were bots. The point is, he had 20 real people visiting his site each day. Those people might actually convert and buy something. The 80 visits by bots never will.
October 8th, 2007 at 6:25 pm
@jason real people are important, but 9 times out of 10, if the bots aren’t finding you, neither are the people.
October 15th, 2007 at 3:20 pm
@Michael, I absolutely agree with you that both people and bots are important. However, this is not what Google Analytics is for, and criticising it for something it was never trying to do is just wrong. Especially when your post starts with “Holy crap…”. I think that’s the only reason I felt the need to post here. You didn’t give GA a fair trial.
You need to use a log parser to analyze your bot traffic. Then, if you want an excellent picture of what your live visitors are doing, you should use a JavaScript-based analytics tool like Google Analytics. Together, you will get all the information you need to start making great decisions.
If I must make an analogy, saying that Google Analytics is a failure because it can’t track bots is like saying coffee mugs are failures because they can’t cut steak. You might be able to get it done using a hack like we’ve seen above, but that’s just not what it was designed for.