I’m a big fan of the Google Webmaster Central program and of using sitemaps. I agree that you should build your website so that it is crawlable and not rely on sitemaps to compensate for poor site architecture, but hands down there is no better tool than Webmaster Central when you are migrating a site or cleaning up after a migration. However, there’s a dark side to Webmaster Central that I haven’t seen anyone else bring up.
First we need to dive a little deeper into Webmaster Central and sitemaps. One of the subtle features of sitemaps is the priority tag in the XML file. You are allowed to specify a value from 0.0 (lowest) to 1.0 (highest) for each of your pages; a page with no value defaults to 0.5. Now some people try to “trick” the search engines by giving all of their pages a 1.0, thinking this will result in an SEO benefit. This is completely incorrect, because the values are relative: they only rank your own pages against each other. If all of your pages are 1.0, you have told Google they are all equally important and that there is no priority. So you are much better off playing by the rules on this one.
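To make the priority tag concrete, here is a minimal Python sketch that writes a sitemap with differentiated priorities. The function name build_sitemap and the example.com URLs are made up for illustration; only the urlset, url, loc and priority elements and the namespace come from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML text for a dict of {url: priority}.

    build_sitemap is a hypothetical helper, not part of any plugin.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, priority in pages.items():
        # The protocol only allows values between 0.0 and 1.0.
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority out of range for {url}: {priority}")
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # One decimal place is a choice made for this sketch.
        ET.SubElement(entry, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Differentiated values carry information; all 1.0 would carry none.
    print(build_sitemap({
        "https://example.com/": 1.0,
        "https://example.com/products": 0.8,
        "https://example.com/about": 0.3,
    }))
```

Because the values only mean something relative to each other, the useful part of this sketch is the spread between the home page, the product page and the about page, not the absolute numbers.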
Some sites generate XML files manually, and others use an automated tool that interfaces with the CMS to generate them. If you are using WordPress, you can get the Google Sitemaps plugin to do this automagically for you. It even takes care of pinging Google and the other sitemap programs from Yahoo and Ask as well. Out of the box, though, the plugin uses number of comments as the priority calculator, which is, well, just bizarre, and almost always inaccurate. For better results you can use the Alex King plugin called Popularity Contest to give you a more accurate priority calculation. Popularity Contest uses number of page views to determine priority; not perfect, but IMHO much better.
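The idea of deriving priority from page views can be sketched in a few lines of Python. This is not Popularity Contest’s actual formula, which I have not verified; the linear scaling against the busiest page, the function name priorities_from_views and the floor parameter are all assumptions made for illustration.

```python
def priorities_from_views(page_views, floor=0.1):
    """Scale page-view counts into sitemap priorities between floor and 1.0.

    Hypothetical helper: linear scaling against the busiest page is an
    assumption, not the formula used by any particular plugin.
    """
    if not page_views:
        return {}
    top = max(page_views.values())
    if top <= 0:
        # No traffic data at all: every page gets the same floor value.
        return {url: floor for url in page_views}
    return {
        # The busiest page gets 1.0; quiet pages never drop below the floor.
        url: round(max(floor, views / top), 1)
        for url, views in page_views.items()
    }


if __name__ == "__main__":
    print(priorities_from_views({"/": 1000, "/pricing": 500, "/old-post": 10}))
```

A comment-count calculator would have the same shape with a different input, which is why swapping the input for page views is such a cheap improvement.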
So here’s where the competitive intelligence comes in: people who are using automated solutions like Popularity Contest are telling you their most highly trafficked pages. Other people, who are generating sitemap XML files in a more manual fashion, are telling you the pages they want to rank. Chances are good the pages they want to rank are the “money pages”. Now of course you could cloak … err … IP deliver your sitemap file, but IMHO that’s about as smart as trying to pick pockets at a police convention. So there you have it: a little something extra to look at the next time you do some competitive analysis.
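If you want to try this on a sitemap you have already downloaded, a rough Python sketch looks like the following. The function name rank_by_priority is made up, downloading the file is left out, and treating a missing priority as 0.5 follows the default in the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def rank_by_priority(sitemap_xml):
    """Return (priority, url) pairs from sitemap XML, highest priority first.

    rank_by_priority is a hypothetical helper for illustration only.
    """
    root = ET.fromstring(sitemap_xml)
    ranked = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        # A missing or empty priority falls back to the protocol default.
        raw = entry.findtext("sm:priority", default="", namespaces=NS).strip()
        ranked.append((float(raw or "0.5"), loc))
    # Stable sort: pages with equal priority keep their sitemap order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/blog</loc><priority>0.4</priority></url>
      <url><loc>https://example.com/buy-now</loc><priority>1.0</priority></url>
      <url><loc>https://example.com/contact</loc></url>
    </urlset>"""
    for priority, url in rank_by_priority(sample):
        print(priority, url)
```

The pages at the top of the output are the ones the site owner, or the owner’s traffic, considers most important. If every page comes back with the same value, the sitemap tells you nothing.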
{ 8 comments }
Instead of cloaking the sitemap file, why not just use an obscure name? If you manually submit the sitemap file, you can use any name you want and nobody can see it (you can also set up a phoney sitemap.xml if you feel like being misleading). In Google’s Webmaster Tools, manually submitting it has the advantage of getting additional feedback: next to errors, you’ll see the number of URLs in your sitemap that are indexed.
A good thing to always note when talking about sitemaps is that Google ONLY uses them for discovery, in other words: to find NEW pages. Having your old files in there and changing the priority is thus of no use whatsoever.
I would never say “ONLY”, Joost. A sitemap file can contain lots of valuable information for a search engine. One of the problems that a search engine will run into with regards to sitemaps is determining how useful the data is in real life. Have all of your pages really been modified today? Are all of your pages really “the most important”? It’s hard to see what a search engine does with the data provided and you never know what may happen in the future, provided we can trust your data.
I’ve always thought of a sitemap as a red carpet for Search Engines… leading them to where you’d like them to be and doing it easily.
I’ve always found that the XML sitemap will not help pages get indexed if the site architecture behind those pages is not sound. That being said, I’m always up for gathering additional competitive intelligence.
I use a sitemap generator. I have customized it to the point that on one of my sites the 3,000+ pages are accurately ranked in the XML file. The one thing I have learned is that priority has little to do with much. It seems that some of my low priority pages do better than my 0.9 or 0.8 pages. A question for you: I have seen more and more sitemaps move to a .xx format (two decimal places), and I have even seen one go to .xxx. What is the point of breaking it down that much?
Interesting idea… You can often find people’s sitemaps with Google’s ‘site’, ‘inurl’, and ‘filetype’ operators. You can probably hide your sitemaps from those kinds of queries with an X-Robots-Tag noindex HTTP header. I have a feeling that Google just uses XML sitemaps for their own benefit to debug Googlebot…
Interesting idea, yeah, nice one. I have only used sitemaps in the past in the normal fashion and, like your commenter Douglas, have also thought of them as a ‘red carpet’ for Google.
If the sitemap is updated as new pages are created or changed, surely you are working with Google to keep them updated with your site content. I personally believe that the more a website works with Google, through Webmaster Tools, Analytics, AdSense, etc., the more Google likes it. I of course have no hard evidence, but you must get some brownie points.