Looking at Google Sitemaps
Posted on June 13th, 2006 by Michael Gray in Google, SEO
A few weeks ago Aaron asked me to look at and explain some things in Google Sitemaps, and although it’s taken a while, I did eventually get around to it (see SeoBuzzBox.com). I’ll get into the details a little later; first I’d like to look at ways to improve Google Sitemaps.
I’ve seen the presentations that some Sitemaps engineers have given at conferences, and heard you on podcasts (via Good Karma, mp3 format), and you seem to “get it”, so these are meant in the spirit of being constructive, and not as a dig.
Establish a Dialogue: As webmasters or web publishers, the reason we sign up for Sitemaps is to establish a dialogue with Google. We’re willing to say, OK, we own these website(s) and would like to take our relationship and communication to the next level.
Preferential Crawling: That’s right, I said preferential crawling. The days of tweaking your meta tags, submitting them, and checking to see how things changed 24 hours later are long gone, but there are still things we can do to make things better. There’s one thing the blogosphere has right that search engines don’t, and that’s the blog-and-ping model. When you update your blog or put up a new post you can “ping the blog services” to tell them you have new content, and they come and get it. The search model is the equivalent of waiting in line at the DMV. Yes, we can ping you with a new sitemap, but that doesn’t trigger a crawl. Also, how about giving us the ability to call for a full or deep crawl? You could limit it to 4 or 6 calls a year per site to keep it manageable for really large sites.
Faster Reporting: How about improving how quickly the stats are updated? I have errors that were fixed almost two weeks ago that you still show as errors. This is the 21st century and we’re an impatient bunch, so when I fix something, let me tell you it’s fixed so you can check it and let me know right away, instead of waiting two weeks.
Technical Diagnosis Tools: There are lots of reasons a site might not rank, and some of them are technical rather than algorithmic. Giving us the ability to diagnose the technical errors and fix them would be a huge leap over the competition and go a long way toward getting wider adoption of the program (see Google Mis-Matching Domain Content on Shared IP’s).
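To make the blog-and-ping idea above a little more concrete, here is a minimal sketch in Python. It only builds the two kinds of notification and sends nothing over the network. The function names, the example blog, and the ping.example.com endpoint are all hypothetical; the only real-world conventions assumed are the XML-RPC method name weblogUpdates.ping used by blog ping services and a "sitemap" query parameter for a sitemap ping.

```python
import xmlrpc.client
from urllib.parse import urlencode


def build_blog_ping(blog_name: str, blog_url: str) -> str:
    """Return the XML-RPC request body for a weblogUpdates.ping call.

    A blog ping service receiving this body learns that the blog at
    blog_url has new content and can come and fetch it.
    """
    return xmlrpc.client.dumps((blog_name, blog_url),
                               methodname="weblogUpdates.ping")


def build_sitemap_ping_url(ping_endpoint: str, sitemap_url: str) -> str:
    """Return a GET URL that tells a search engine a sitemap has changed.

    ping_endpoint is supplied by the caller (hypothetical here); the
    sitemap location is passed URL-encoded in a "sitemap" parameter.
    """
    return ping_endpoint + "?" + urlencode({"sitemap": sitemap_url})


if __name__ == "__main__":
    # Hypothetical blog and endpoint, for illustration only.
    print(build_blog_ping("Example Blog", "http://www.example.com/"))
    print(build_sitemap_ping_url("http://ping.example.com/ping",
                                 "http://www.example.com/sitemap.xml"))
```

The difference I’m complaining about is on the receiving end: the blog services treat the first message as a request to come and fetch, while a sitemap ping only says the file changed.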
For this discussion I’m going to be showing some screenshots. Most of them will be for this domain, but I will also show a few others where it’s interesting. Let’s look at the main screen.

Looking at the top line we see the message “Some of your pages are partially indexed”, which links to a page that gives this bit of information:
The Google index contains two types of pages: fully indexed and partially indexed pages. A site that’s listed by its URL and appears without a cached copy and a detailed title is partially indexed. When a site is partially indexed, it’s because our robots were unable to completely review its content during a recent crawl.
It’s not especially clear what the problem is; it’s most likely because I have this in my meta tags:
<meta NAME="ROBOTS" CONTENT="NOARCHIVE">
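If you want to at least confirm which robots directives a page is really serving, a small check along these lines can help. This is a rough sketch using Python’s standard html.parser module; the class and function names are my own, and it only reads the tag. It says nothing about how Google interprets it.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so NAME= and
        # CONTENT= in the page source are matched as "name" and "content".
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the lowercased robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
    print(robots_directives(page))
```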
However, it would be nice to know that for certain. Next, let’s look at the error pages.

At the time of writing, some of these error pages have been around for two weeks. I’ve used Dax’s handy-dandy Objection Redirection WordPress Plugin to fix them, but you still think they exist (see biggnutts.com).
Looking at my crawl stats, I don’t see anything that looks too weird.

Here are another two that look slightly different.


For those of you obsessed with green pixels, you get a rough breakdown of how your PageRank is distributed. The line “pagerank not yet assigned” doesn’t have an explanation, but I’ll assume it’s pages that you “discovered” since the last published PageRank update. Next, looking at page analysis, we get a table of inbound anchor text. How about making it phrase based instead of word based?
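To show why the distinction matters, here is a small Python sketch that counts the same set of inbound anchor texts both ways. The anchor strings are made up for illustration, and this is not a claim about how Sitemaps computes its table.

```python
from collections import Counter


def word_counts(anchors):
    """Count each individual word across all anchor texts."""
    counts = Counter()
    for text in anchors:
        counts.update(text.lower().split())
    return counts


def phrase_counts(anchors):
    """Count each whole anchor text as one phrase (case and spacing normalized)."""
    return Counter(" ".join(text.lower().split()) for text in anchors)


if __name__ == "__main__":
    # Hypothetical inbound anchor texts.
    anchors = ["Michael Gray", "michael gray", "Graywolf SEO blog", "SEO blog"]
    print(word_counts(anchors).most_common())
    print(phrase_counts(anchors).most_common())
```

The word-based view tells you “seo” and “blog” each show up twice, but only the phrase-based view tells you what people are actually linking with.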

Is Google Sitemaps for everyone? Clearly not. However, many site owners could benefit from getting data back from Google. With a few updates and a little more data feedback from the Sitemaps program, there could be some major advantages to integrating Sitemaps.
Lastly, if you’re running a WordPress blog, it’s really easy to start using Google Sitemaps with the Google Sitemaps WordPress Plugin from Arne Brachhold.
Related Information
- Dear Michael Gray SEO Buzz Box
- Good Karma Podcast with Vanessa Fox - via Webmasterradio.fm mp3 format
- Google Mis-Matching Domain Content on Shared IP’s
- Biggnuts » Objection Redirection!
- Inside Google Sitemaps Blog
- Google Sitemap Generator for WordPress v2
- Google Sitemaps - Welcome!
- Google Sitemaps Catch 22

June 13th, 2006 at 10:43 pm
Sitemaps is definitely slow to update its reports.
I’m getting file not found http errors that must be coming from a cached page. I eliminated a segment of my site, along with the links to it, BEFORE it was crawled. Google must be crawling from a cached page containing the old links.
Also, did you notice the sitemaps disclaimer that you can get a slot on the “http errors” page if Gbot follows a bad link that is not even from your own site?
June 13th, 2006 at 11:07 pm
>Also, did you notice the sitemaps disclaimer that you can get a slot on the “http errors” page if Gbot follows a bad link that is not even from your own site?
Yes, actually that’s a good thing: if you can use mod_rewrite or something else, you can capture the bad incoming link and redirect it someplace useful.
June 13th, 2006 at 11:38 pm
Thanks Michael, I agree with the great need to “establish a dialogue” thing 100%. I have been trying to do this with Google since 2003, when my first site was getting hosed from canonicals or what I perceived as such.
I also believe if something in sitemaps can not be fully explained by G it should be removed. This type of stuff can drive an admin. nuts!
But Vanessa Fox from sitemaps did reboot something in sitemaps crawl stats that I was complaining to her about. If you also spend time in Google Sitemaps Groups you have direct access to the G-team, so things are getting better.
In other news, seobuzzbox.com is showing the weakest and strangest stats I have ever seen. I am wondering if it’s got some sort of “SEO” deduction…or that deleting lots of content deflated its site flavor.
Oh well.
June 14th, 2006 at 4:42 pm
Michael,
Kudos on that thoughtful writeup.
I especially like your note about how the Sitemaps services are designed to “establish a dialogue” between Google and Webmasters. This is spot on.
We have an amazing Sitemaps team (with Amanda and Vanessa being the most public-facing) that’s dedicated to taking in feedback, participating on the Sitemaps Google Group and improving the services offered over time.
Google’s very strongly committed to extending these Webmaster tools and communications over time. Lots of good stuff to come!
Thanks again for the writeup, and take care.
June 14th, 2006 at 6:52 pm
Some of us probably fear “exposure” of our network.
But I do thank you for this. I submitted four sitemaps and was surprised to discover that the sites hadn’t - overall - been visited in a week despite seemingly decent Google traffic.
June 15th, 2006 at 5:39 am
Is anyone else not having the ‘anchor text’ or ‘highest pagerank’ bits show up in their sitemap accounts?!
June 20th, 2006 at 3:35 am
A definitive post, many thanks indeed
June 24th, 2006 at 9:35 pm
Just did an interview with Sitemaps and seems like they came out with some great new stuff, but they should get you to write a webmaster friendly instruction manual.
July 20th, 2006 at 5:47 pm
We see the same problem in Denmark - often way too long before reports are updated. But… it seems to be a little faster now than a few months ago.
Maybe soon…