Michael Gray

Yes Virginia You Can Hurt Yourself With Duplicate Content

Posted on September 16th, 2008
by Michael Gray in SEO



Last week Google published a blog post entitled Demystifying the “duplicate content penalty”, and if you haven’t read it I’d recommend doing so. However, I’m going to disagree with some parts of the post, because IMHO you really can hurt yourself with duplicate content.

I’m going to start off with a story to make my point … let’s imagine it’s a bright sunny afternoon and I’m outside working on my car. I’m watching my 8-year-old nephew. He’s a precocious young lad, but unless you give him clear instructions he likes to do things his own way. As a result he sometimes makes mistakes, but he means well.

I’m under the car and ask him to go into the house and get me a flashlight. Instead of going through the front door, which is 20 feet away, he runs around the back of the house, goes in the back door to the kitchen, digs through the “junk drawer” and brings out the small “quality challenged” blue plastic flashlight with “#1 Uncle” written on the side that he gave me last Christmas. OK, technically he got me a flashlight, but it really wasn’t what I was looking for. However, I didn’t give proper directions, so it’s partly my fault, and we’ll try again. This time I tell him to go in the front door, go down the hall, open the closet and get the flashlight inside. He goes in the front door and down the hall to the closet and gets the first flashlight he sees. He comes out with my 4 D-cell Maglite. OK, it’s better than the #1 Uncle flashlight, but what I really wanted was my Mini Maglite, because the 4-cell Maglite is just too big to work with under the car …

What’s the point of this story? Let’s pretend that instead of my house we’re talking about my website. Then instead of my well-intentioned nephew let’s pretend we’re talking about Googlebot. And instead of looking for a flashlight, what we’re really looking for is content on my website. Do you want it to be in one spot under one URL, or do you want it in 6 categories, 4 tag archives, and monthly, daily and yearly archives? Lastly, do you want Google to guess at which URLs you want the content indexed under, or would you much rather tell it where to look and what to ignore?

Hopefully you’ve just had a lightbulb moment and realize this whole “let Google sort it out” mentality is littered with opportunities to go off the track. As a site owner, the most important parts of your site are the individual pages (or single post pages if you run a blog). When you leave it up to Google you are hoping they guess that’s what you wanted; while they do get it right in many cases, there are lots of times where they don’t. Here are some quick and easy ways you can fix that and help Google understand what it is you really want.

Don’t display full posts on your homepage or in your archives; only display the full content on the single post page. Personally I like to limit the excerpt to a few sentences, the first paragraph, or wherever there is a natural break. You are ultimately writing for people, but that doesn’t mean you shouldn’t take steps to make the machines interpret it how you want.

Categories, dates, or tags: choose one and block Google from crawling the others. I’m a big fan of using categories because it helps set up an SEO siloing structure, but if you prefer tags or date-based archives, go for it. Now notice I didn’t say eliminate the other navigation paths, I just said block them from being spidered, preferably using robots.txt (see mine for an example). Your readers may remember you made a kick ass post last Halloween, and a date-based archive is the quickest way from here to there.
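
As a rough sketch of the “choose one and block the others” idea, the Python snippet below feeds a made-up robots.txt to the standard library’s `urllib.robotparser` and checks which archive paths a crawler would be allowed to fetch. The rules, the example.com domain, the `/tag/` and `/2008/` paths, and the `crawlable` helper are all assumptions for illustration, not the robots.txt referenced above, and the sketch assumes post permalinks that don’t begin with the year.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: category pages stay crawlable,
# tag archives and the 2008 date archives are blocked.
ROBOTS_TXT = """\
User-agent: *
Disallow: /tag/
Disallow: /2008/
"""


def crawlable(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # example.com is a placeholder host; only the path matters here.
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/category/seo/", "/seo/duplicate-content/",
                 "/tag/google/", "/2008/09/"):
        print(path, "crawlable" if crawlable(path) else "blocked")
```

Python’s parser only does simple prefix matching on the path, so the sketch sticks to plain prefixes. The archive pages are still there for readers to click through; they are only off limits to crawlers that honor robots.txt.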

Limit the number of paths Google can use to find your content. If you choose categories as your primary structure, keep your posts in as few categories as possible. The more categories Google can find the post under, the more opportunity there is for them to guess at the wrong one. However, if you put your posts under only one category, you’ve given Google only one choice, which makes it pretty hard for them to get wrong :-)

Finally, lest someone think this was just a clever way I could sneak in a few links to a new flashlight client, I like your thinking, but you’re barking up the wrong tree. I actually have about 8 different Maglites in the house in varying sizes. I bought the oldest one for $30 in 1986, while working at my second job at the hardware store. In 1986 $30 was a lot more money than it is today, but I can say it was money well spent, as I still have the flashlight and it still works more than 20 years later.


22 Responses to “Yes Virginia You Can Hurt Yourself With Duplicate Content”

  1. Reese Says:

    LOL at the flashlight client bit.

    To help keep things on TOPIC, here was a flashLIGHTBULB statement for me: “Do you want it to be in one spot under one URL, or do you want it 6 categories, 4 tag archives, and monthly, daily and yearly archives?”…

    While I’ve known for a while duplicate content is an issue, your illustration here explained it better than anything else I’ve read online.

    Speaking of categories and tags: when you’re running a site that has a lot of ‘brand’ based reviews or news on it, would you tag related to brands, or just do general categories? (Rough example: tech site that has stories on everything from the latest blackberry to a new Nvidia card that’s out. Would you add tags so you could get in tagging for the brands, or just let your title tags and content itself take care of that and keep it filed under categories like “video cards” and “pdas”)

  2. g1smd Says:

    I am in the middle of looking at a CMS, where the root page can be returned for more than 60 different URLs.

    The normal content pages have some 850 ways, or more, to reach each one.

    The sections have some 50 different possible URLs for every active section index page.

    The contact page also has 44 URLs that can reach it.

    So, take a site with the following features:

    1 x root page
    5 x section pages
    50 x pages (10 pages per section, whatever)
    1 x contact page

    That’s a site with 57 pages, and supposedly 57 URLs.

    In this CMS it exposes: (1 x 60) + (5 x 50) + (50 x 850) + (1 x 44) = 42 854 URLs.

    It is easily fixed with about 20, or so, block redirects to herd everything towards one format for each type of page, and a set of 5 rewrites to connect all of those URLs to the internal filesystem filenames. The much longer job is going to be modifying the CMS to make sure that everything works the same way.
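
    The arithmetic in this comment can be checked with a few lines of Python. This is only a sketch: the page counts and URLs-per-page figures are the ones quoted in the comment, while the `page_types` table and variable names are made up for illustration.

```python
# (page type, number of pages, URLs that can reach each page),
# using the figures quoted in the comment above.
page_types = [
    ("root", 1, 60),
    ("section", 5, 50),
    ("content", 50, 850),
    ("contact", 1, 44),
]

# One URL per page is what the site owner expects.
total_pages = sum(count for _, count, _ in page_types)

# What the CMS actually exposes to a crawler.
total_urls = sum(count * urls for _, count, urls in page_types)

if __name__ == "__main__":
    print(f"{total_pages} pages, {total_urls} URLs, "
          f"about {total_urls / total_pages:.0f} URLs per page")
```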

  3. brewgin Says:

    Just wanted to say thanks for the great example.

  4. Michael Gray Says:

    @Reese: yes, categories for things like “video cards” and “pdas”, and then tags for the brand names like “blackberry” and “nvidia”.

  5. Michael Martine aka Remarkablogger Says:

    Most cogent explanation of this I’ve ever read. Thanks, Michael. Sphunn and stumbled. :)

    Blogs I’ve been working on lately have had this Permalink structure: domain/categoryname/postname.html and I’ve set the All-in-one SEO pack plugin to allow crawling of categories but not the other archives. Categories become important keywords in this case.

  6. » Yet more duplicate content fuzz | Pagespank.com - web 0.0 Says:

    [...] Read this post from Google on duplicate content, then read (and pay more attention to) Graywolf’s post. [...]

  7. steaprok Says:

    Thanks for the post. It really is a very informative and concise explanation. sphunn and stumbled as well.
    Glad to see you posting with more frequency, I always appreciate your POV

  8. Marty Martin Says:

    @graywolf: Awesome post. Well thought out and concisely written. I appreciate it because I think so many people out there (myself included in the past) haven’t completely understood the dup content penalties as such. Your post includes a lot of common sense usability, but hey, sometimes we need someone to help us all along with common sense ;)

    Cheers!

  9. Lorraine Ball Says:

    This is a great strategy for small business owners trying to incorporate blogging into their marketing mix. It makes it much easier to find a few categories and post, post, post, rather than splitting hairs between topics such as marketing, marketing research, marketing branding, marketing promotion, etc, etc, etc.

    While I won’t be going back and redoing the hundreds of posts I have already written, I will be taking a more streamlined approach to future posts.
    Thanks!

  10. Trophaeum Says:

    I have to disagree with using robots.txt to block those duplicate pages in the other navigation methods. Use meta robots noindex,follow on them instead. They WILL get links dropped on them from time to time, and if you robots them out you’re losing some of the juice that they get handed; if you noindex,follow them, it should flow further into the site than it would otherwise. Just my 2c.

  11. Michael Gray Says:

    @Trophaeum: if you block them, Google won’t index them or send any search traffic to the pages. Sure, the odd person here and there may link to them, but it’s better to pick a solution that solves most of your problems than one that leaves most of them unsolved.

  12. George Says:

    Great points. I run into the temptation to copy and paste in a lot of what I do. Taking the patience to sit down and create new content, no matter what it relates to, is the difference between search engine marketing and spam IMO.

  13. Eric Itzkowitz Says:

    Michael, ‘Disallow: /terms-of-service/’ is listed twice in your robots.txt file.

  14. Michael Gray Says:

    @Eric Itzkowitz: must have been put there by the department of the redundancy department :-)

  15. Eric Itzkowitz Says:

    @Michael I wish the department of the redundancy department was working the Broncos/Chargers game. ):

  16. John H. Gohde Says:

    Google doesn’t like to give clear instructions, for a reason. Thank goodness that you did.

  17. Adam Lefever Says:

    Another dupe-content-related tip is to be sure your URLs are canonicalized. For instance, having the same page load for /page/index.asp and /page/ can be considered duplicate content, but more importantly you’re making crawling more difficult if they’ve found both (usually depending on who is linking to you and how, and on how any linkers have navigated to your site). In my experience, fixing this issue has boosted search engine rankings within a week or two.
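
    One way to picture the canonicalization step is a small Python helper that maps the variants of a URL onto a single form. This is a minimal sketch under stated assumptions: the `canonical_url` function, the `INDEX_FILES` list, and the choice of a trailing-slash form are all hypothetical, and on a live site the mapping would normally be enforced with redirects rather than in a script.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of default documents that duplicate the bare directory URL.
INDEX_FILES = {"index.asp", "index.html", "index.htm", "index.php"}


def canonical_url(url):
    """Map URL variants of the same page onto one trailing-slash form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_FILES:
        # /page/index.asp -> /page/
        path = head + "/"
    elif "." not in last and not path.endswith("/"):
        # /page -> /page/ (files such as /feed.xml are left alone)
        path += "/"
    # Lowercase scheme and host, keep the query, drop any #fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       path, parts.query, ""))


if __name__ == "__main__":
    for url in ("http://Example.com/page/index.asp",
                "http://example.com/page",
                "http://example.com/page/"):
        print(url, "->", canonical_url(url))
```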

  18. tmongan Says:

    What if you are putting together a page for your website that focuses on local attractions, restaurants, campgrounds, etc. to try to get local listing results? So for the list you put
    XXXXname of restaurant(linked of course!)
    xxxxxshort blurb on restaurant copied from their website

    would the copied content be considered duplicate content?

  19. Michael Gray Says:

    @tmongan: yes, if you have a page made up of snippets that all exist elsewhere, you have a problem.

    Unless of course you are an old crusty trusted authoritative domain like yellowpages.com, because then the duplicate content rules don’t matter as much.

  20. Duplicate Content - What is Duplicate Content and How to Fix it Says:

    [...] up to the Search Engine to determine which the main source is out of all the options available. As Michael Gray stated When you leave it up to Google you are hoping they guess that’s what you wanted, while [...]

  21. sweetievale Says:

    Nice job to this post! You help me understand about more SEO topics. Keep it up! Hope to read more samples. Thanks!

  22. seo company Says:

    I am so sick of people copying my clients’ information. Google should get tougher on these scammers!