I love WordPress, I really do. It makes publishing really easy. However, the WordPress developers really need some help sometimes. It seems that when there is a choice to make things search-engine friendly, more often than not they make the worst choice possible.
Case in point: let’s use one of my favorite test subjects, Matt Cutts. Take a look at this SERP, for example.

The one that concerns me is the bottom one: www.mattcutts.com/blog/harry-potter-font-looks-like-yahoo-logo/feed/
For every post, WordPress generates a comment feed URL. Why does this matter? Google will index these feed URLs, and up until a day or two ago they would be labeled “supplemental”. From Google’s perspective that makes sense, because comment feeds are distinct URLs and 99% of the time don’t get links. Why is that a problem? Without a “critical mass” of authority/trust, a site with a large number of supplemental pages is considered a negative indicator and gets a subsequent ranking drop/filter.
To make things worse, there’s no easy way to get into WordPress and fix the problem. You can’t specify a different permalink folder for comment feeds without hacking the core WordPress files. You can’t add a “noindex” meta tag either. So why not block the “feed” folder using robots.txt? The majority of people publish their main feed under http://example.com/feed/, so a robots.txt rule that keeps your comment feeds from being indexed also blocks your main site feed, which is not the desired scenario.
So how do we fix it? First, Google: give us back the supplemental indicator. It’s a diagnostic tool. And don’t suggest the two-step workaround from “Google Dumps The Supplemental Results Label”:
First, get a list of all of your pages. Next, go to the webmaster console [Google Webmaster Central] and export a list of all of your links. Make sure that you get both external and internal links, and concatenate the files. Now, compare your list of all your pages with your list of internal+external backlinks. If you know a page exists, but you don’t see that page in the list of site with backlinks, that deserves investigation. Pages with very few backlinks (either from other sites or internally) are also worth checking out.
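For what it's worth, the comparison step in that workaround boils down to a set difference. Here is a minimal Python sketch of it, assuming you already have both lists as plain URLs; the function name and the sample URLs are hypothetical, and the trailing-slash normalization is my own simplification, not something the webmaster console does for you.

```python
def pages_missing_backlinks(all_pages, linked_pages):
    """Return the known pages that never show up in the backlink export."""
    def normalize(url):
        # Assumption: treat /post and /post/ as the same page.
        return url.strip().rstrip("/")

    linked = {normalize(u) for u in linked_pages if u.strip()}
    return sorted(
        u.strip() for u in all_pages
        if u.strip() and normalize(u) not in linked
    )


# Hypothetical data: your own page list, and the concatenated
# internal + external link exports.
all_pages = [
    "http://example.com/",
    "http://example.com/post-a/",
    "http://example.com/post-a/feed/",
]
linked_pages = [
    "http://example.com/",
    "http://example.com/post-a",
]

orphans = pages_missing_backlinks(all_pages, linked_pages)
print(orphans)
```

With this sample data the only page left over is the comment feed, which is exactly the kind of URL this post is about.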
You don’t improve things by making them harder. Second, WordPress: get an SEO consultant on board to keep you from shooting yourself in the foot on a regular basis. Try looking at your blog to see if the comment feeds are getting indexed. If they are, I’d suggest biting the bullet and blocking all feeds. It’s not ideal, but it’s clearly the lesser of two evils.

{ 17 comments }
Michael – is it possible to use a wildcard in the robots.txt file to block everything that ends with /feed/? Like this:
Disallow: */feed
Disallow: */trackback
Does that work?
I agree, Matt – simple to fix:
User-agent: Googlebot
Disallow: /*/feed/
You could also try a daring move and add
Disallow: /*/
to get rid of URLs ending with “/” (make sure to check the template / settings that the URLs are also being linked that way).
It would be really neat to have the community do an SEO review of Matt’s site. I bet some nice gems would come out, and even if Matt doesn’t implement them all, perhaps it will give him (for his site) and Google (from the tips) some ideas on what could be changed further down the line. I’m sure a lot of myths could be debunked at the same time. Win-win.
Both Google and Yahoo support wildcards in robots.txt.
My WordPress robots.txt looks like this:
Disallow: /*/feed
Disallow: /*/trackback
Disallow: /wp-login.php
It works well, although it does block the main /comments/feed.
Duh (sorry)! Instead of
Disallow: /*/
use
Disallow: /*/$
John
I went through the thousands of pages of dupe content on John Chow’s site the other day, and it has all of these issues and many more. The WordPress team needs to sit down and read some of the articles about “SEO for WordPress”.
http://www.blogstorm.co.uk/blog/duplicate-content-john-chow/
Yea I just blocked both feeds. All that dup content on there is not worth the risk.
As you state feeds are not usually linked to, except on the site itself. I’ve not only blocked the feeds in robots.txt but also nofollowed the links to them.
The reason I did it is, as you pointed out with Matt’s blog, that I hate it when I run into a feed URL and click it. I want a web page, not a feed page. So for the sake of my potential searchers I blocked the feeds and nofollowed the links to them. If that has some sort of side effect of helping the site keep more real pages in the main index, that’s great too. A blog with hundreds of posts could potentially have hundreds of comment feeds listed as well.
Blocking the feed from being indexed is important but also just as important is not sending the crawlers there in the first place. IMHO.
Michael, I had the problem of Google indexing my feed about a year ago. I went the robots.txt route, and within a week everything was back out of supplemental.
My experience
My addition to robots.txt was
User-agent: Googlebot
Disallow: /*/feed/$
Disallow: /*/feed/rss/$
Disallow: /*/trackback/$
I’m wondering why you wouldn’t want to block the feed. Google still indexed the posts. With the feeds they were indexing the straight XML.
What’s the downside to using robots.txt?
I don’t think I’ve ever seen the trailing dollar sign — what does it do?
Matt, the trailing dollar sign anchors the pattern to the end of the URL, like end-of-line in a regular expression.
Good find, Michael.
Kris from Florida
P.S. Could you post more often?
I usually block Googlebot from all feeds except the main feed (as #2 mentioned). Some search engines want your feed though. This would block all WordPress feeds except for the main feed (example.com/feed/):
Disallow: /*/feed/
If you use this you can get into trouble:
Disallow: /*/$
For example, if you later decide to create non-WordPress content and put it in a subdirectory, the main page of that other content would be blocked. E.g., http://example.com/forum/
A slash in *nix is the symbol for a directory so even if you request http://example.com/forum, the server should correct you and add the trailing slash.
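To see how the patterns in this thread behave, here is a small Python sketch that translates a robots.txt path rule into a regular expression, assuming the wildcard semantics described above: `*` matches any run of characters and a trailing `$` anchors the rule to the end of the URL path. The function names are made up for illustration, and this is not a full robots.txt parser (no User-agent groups, no Allow rules).

```python
import re


def rule_to_regex(rule):
    """Translate a path rule with * and a trailing $ into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces, then let each * stand for any characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    # Without a trailing $, a rule is a prefix match on the path.
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(rule, path):
    """True if the path would be matched by the Disallow rule."""
    return rule_to_regex(rule).match(path) is not None


checks = [
    ("/*/feed/", "/some-post/feed/"),       # per-post comment feed
    ("/*/feed/", "/feed/"),                 # main feed
    ("/*/feed/", "/comments/feed/"),        # site-wide comments feed
    ("/*/feed/$", "/some-post/feed/rss/"),  # $ stops the prefix match
    ("/*/$", "/forum/"),                    # unrelated subdirectory
]
for rule, path in checks:
    print(rule, path, is_blocked(rule, path))
```

Under these assumptions, `/*/feed/` catches the per-post feeds and `/comments/feed/` but leaves the main `/feed/` alone, while `/*/$` also catches `/forum/`, which is the trap described above.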
If you’re talking about Google only, then you can use the new X-Robots-Tag header mechanism:
http://www.webmasterworld.com/google/3407137.htm
The following is untested, but should work:
Open the file /wp-includes/feed-rss2.php
At the top you will find the line:
header('Content-type: text/xml; charset=' . get_option('blog_charset'), true);
Before, add:
header('X-Robots-Tag: noindex');
Same goes for feed-atom.php etc. Use the Firefox Live HTTP headers extension to check.
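If you would rather script the check than use a browser extension, a rough Python sketch follows. It assumes the simple form of the header (`X-Robots-Tag: noindex`, optionally with other comma-separated directives) and does not handle values prefixed with a crawler name. The function names and the User-Agent string are hypothetical.

```python
from urllib.request import Request, urlopen


def has_noindex(headers):
    """True if any X-Robots-Tag header in (name, value) pairs says noindex."""
    for name, value in headers:
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives:
                return True
    return False


def feed_is_noindexed(url):
    """Fetch the URL (a real HTTP request) and inspect its response headers."""
    request = Request(url, headers={"User-Agent": "feed-header-check"})
    with urlopen(request) as response:
        return has_noindex(response.headers.items())


# Offline example using header pairs like the ones a patched feed might send.
sample = [("Content-Type", "text/xml; charset=UTF-8"), ("X-Robots-Tag", "noindex")]
print(has_noindex(sample))
```

To test a live feed, call `feed_is_noindexed()` with your own feed URL, for example a comment feed under http://example.com/.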
This is a really interesting insight. I recently chose Drupal over Wordpress and now I’m glad, as Drupal publishes just one feed per blog.
Now here’s a related issue: when you have multiple blogs per domain, and hence multiple feeds, does the same penalization happen, assuming you don’t have a major authority?
In other words, are the 6 other blogs belonging to individuals relegated to supplemental content?
To NOINDEX the feed just use this plugin: http://www.joostdevalk.nl/code/wordpress/noindex-feed/
For SEO purposes just use the super-duper All in One SEO Pack: http://wordpress.org/extend/plugins/all-in-one-seo-pack/
Yep yep!
X-Ride: right now that plugin will noindex all your feeds. The next release of WordPress will hopefully contain a hook which allows me to make a plugin that just noindexes the comment feeds.
I like those last options X-ride. Thanks – I am on my way there now.