One of the things that a lot of people don’t get about the new Google is the algorithm’s over-reliance on trust. It’s this over-reliance on trust that puts Wikipedia at the top of so many searches. However, if you have this trust, it’s like having a laminated get-out-of-jail-free card for what you can get away with in Google. Don’t believe me? OK, here’s a nice example.
Back in October Christopher Elliott wrote a nice post entitled “Dump this! 7 things airlines should jettison from their planes now”. It was a good piece and it made the rounds on the popular social sites. Mr Elliott is also a well-known travel writer and has existing syndication deals with websites like MSNBC and Frommers. Now Google thinks they have duplicate content like this licked; they said as much on their official blog:
Generally, we can differentiate between two major scenarios for issues related to duplicate content:
* Within-your-domain-duplicate-content, i.e. identical content which (often unintentionally) appears in more than one place on your site
* Cross-domain-duplicate-content, i.e. identical content of your site which appears (again, often unintentionally) on different external sites
and
To conclude, I’d like to point out that in the majority of cases, having duplicate content does not have negative effects on your site’s presence in the Google index. It simply gets filtered out.
Well Google, I’m calling BS on your ability to filter out duplicate content, and I’m prepared to prove it. Let’s take a look at the SERP for the title of Mr Elliott’s post, [7 things airlines should jettison from their planes now], screenshot below (yes Google, it’s going to get worse, much worse).

So what do we have? 7 of the 10 pages are, in the words of My Cousin Vinny’s prosecuting attorney Mr Trotter (wait for it) … eye-dentical. Don’t believe me? Check them out:
- http://www.frommers.com/articles/5569.html
- http://www.elliott.org/the-travel-critic/dump-this-7-things-airlines-should-jettison-from-their-planes-now/
- http://www.tripso.com/columns/dump-this-7-things-airlines-should-jettison-from-their-planes-now/
- http://www.msnbc.msn.com/id/26794290/
- http://www.newsvine.com/_news/2008/09/22/1895972-dump-this-7-things-airlines-should-jettison
- http://www.iconocast.com/B000000000000079/K6/News2.htm
- http://www.capecodonline.com/apps/pbcs.dll/article?AID=/20081109/LIFE06/81106018/-1/LIFE
Way to go Google. I’d say that was a failure of epic proportions as far as filtering out duplicate content is concerned.
Now Gadling at least went to the trouble of rewriting the post, but not to worry: the Digg submission, which contains another eye-dentical (heh) copy of the first paragraph, got listed twice. Lastly, the Reddit post has no content or comments but has a title that is eye-dentical to the titles of the other posts.
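For anyone who wants to check the eye-dentical claim rather than eyeball it, here is a minimal sketch in Python of one textbook way to compare two pieces of text: break each into overlapping runs of words (shingles) and measure how many they share (Jaccard similarity). This is an illustration only, not a description of whatever Google actually runs. The function names, the 5-word shingle size and the sample strings are all assumptions made up for this example.

```python
import re


def shingles(text, k=5):
    """Return the set of overlapping k-word windows (shingles) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than one window: treat the whole thing as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b, k=5):
    """Share of shingles two texts have in common: 1.0 = identical, 0.0 = disjoint."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty texts count as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical sample strings, not the real article text.
original = "seven things airlines should jettison from their planes right now"
syndicated = original  # a verbatim syndicated copy
rewritten = "a handful of items every carrier ought to throw off the aircraft today"

print(jaccard(original, syndicated))  # 1.0 for a verbatim copy
print(jaccard(original, rewritten))   # 0.0, no 5-word run in common
```

A syndicated copy wrapped in a different site template would score somewhere below 1.0, so a real checker would need a cut-off; where to put it is a judgment call this sketch does not try to settle.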
If the people who are responsible for programming the duplicate content part of the algo worked for me, I’d make them wear these shirts till they got it fixed.

Perhaps I’m being a little harsh. Maybe it’s not the people who worked on the duplicate content part of the algo who should be swimming with the fail whale; maybe it’s the folks who built the over-dependence on trust into the algo.
If that’s the case then Google’s problem stems directly from the top, since Eric Schmidt was quoted as saying:
Internet is a “cesspool,” a festering sea of bad information, said Google’s CEO, Eric Schmidt, yesterday while speaking to a group of visiting magazine executives at the company’s Mountain View, California Campus during the American Magazine Conference. Schmidt suggested that “brands” are more important than ever and key solution for this problem is brands. “Brands are the solution, not the problem”
Sorry Mr Schmidt, brands aren’t the solution. They are just as much of a cesspool as the rest of the web; they just happen to have bigger PPC budgets that line your pockets.
Oh, and for the pesky Google search engineers at their desks muttering “but but this is an isolated incident, caused by a long multi-word long-tail search, this isn’t a systemic problem”: as soon as you finish your trip down the river called de-nial, check out [Palin pipeline terms curbed bids] and [Arab diva's necklace sold] and get back to me. Maybe if you’re lucky you guys can get a bulk discount on those shirts if you order now.
{ 17 comments }
That was an interesting thing, man. So Google isn’t that sure who wrote it “first”? Ha, great!
I don’t think there is any filter that will send non-original articles into supplemental directly.
It seems to me that when Google finds multiple versions, it would (almost always) give weight to various seo factors to decide which result to show on top, regardless of whether that is the original or syndicated version.
I found this to be true even when the syndicated version had a link back to the original site
Michael, it could be as simple as they don’t care what the SERPs look like for a query like “7 things airlines should jettison from their planes now”. Meaning that showing duplicate stories for this low, low, lowly competitive phrase is not a big deal and might even be ideal in their eyes.
Michael, if I were Mr. Schmidt, you would have been working for Google long ago, for at least two reasons:
1. To shut your mouth;
2. To implement your ideas in the wild, i.e. make the Google search engine SEO-wise.
Cheers for the nice post!
Duplicate content is always on my mind. The man who manages how Google ranks posts says that duplicate content does not actually hurt your blog. However, if you don’t have duplicate content, you will be ranked better. So in this sense it does actually hurt you.
I loved the eye-dentical stuff. Happy memories from SMX Advanced!
I think it’s fair to assume that if I search the phrase [7 things airlines should jettison from their planes now] then I am specifically looking for that article – and the subsequent conversation surrounding it. So in that respect it makes a lot of sense for Google to return a number of different sites on which the article appears, each (possibly) offering different comments and commentaries. Still, I agree that it would have been good to have Elliott.org return in the No1 slot.
Excellent. I see this all the time. Wasn’t there talk about an “authority” at some point, where the article can say it is the first, and hence be put at the top of the SERPs?
I have heard so many different opinions on duplicate content I just don’t know what to believe anymore. I will avoid duplicating content regardless where it is posted at all costs just to be safe.
On long tail keyword searches this is common and the correct result. Owning an article site or two, I see it all the time when checking article submissions for duplicate content – someone will submit an article and I’ll see it is already on 15 different sites (well my automated dupe content checker does). I like this information because it saves me Copyscape fees. While exact articles will come up for a 5 – 15 word search, only a few will show up for the money 2 or 3 word phrases and those are normally sites that got the article up and indexed first.
Interesting post. As I see it, Google lets the site that satisfies all the SEO factors rank first, so a site with duplicate content will surely fall behind. Thanks.
This is just an unfair attitude of Google towards small sites. No wonder I got penalized so easily last month for just one duplicate post in a forum.
Google’s inability to determine the original source of a document is a huge problem. At least in this case, the original source turns up, though because you were searching for the actual title, this probably explains why it turns up at all. If you had been searching for anything useful for which this article should rank, my guess is that the duplicates would show up and the original article would be lost in oblivion. What is even worse about Google, is that if enough of your content is stolen, it will toss out your entire site or entire sections of your site. Things that ranked well will suddenly disappear for no reason, even if they themselves had not been stolen. It’s pathetic, really. It’s bad enough that Google can’t determine the original source, but then it uses its faulty logic to make judgments about entire sites. It’s like going to the cops after you’ve been mugged and then having the cops take everything else the mugger might have missed. Do no evil. Ha.
I do not believe that Google knows which content is original. I have seen original pages from some of my websites show up in Google search results below pages on other sites that were copied wholesale from mine. What a pity: people assume that Google knows original content, but no…
My main worry with Google is if they personalize search results. I have fought like a dog to get to the top ranks under organic search results, and I don’t want to lose those standings. As to duplicate content, I had two different websites with similar content on both, and when I got rid of one of them, the other one instantly showed better Google rankings. Duplicate content does matter to Google.
Hi Michael, I tend to agree with Jaan, because in many cases this is not 100% dupe content. Also, taking a look at these sites, they are active sites and they have a huge amount of unique, in many cases user-generated, content, like 200-300 words of comments, other recent articles etc.
The duplicate content issue does exist, however, when you want to copy 1000 articles on acne, fitness or web development and build a website solely out of them, maybe adding some simple banners / affiliate offers and getting a handful of links.
This is a constant problem. Even in partially competitive areas. I have a client that has about 8 domain names, none resolve to the marketed domain name (creating 8 exactly the same websites) and regularly they have 4-7 listings in the top 10. That’s for 3-4 word phrases. That tells me there is a problem. Heck, if I can see that they are all the same IP, why can’t our “friend” the Googlebot?
But hey, I’ve learned to work with the system, not against it. When they fix the issue I’ll consolidate, until then… why bother? I believe Google are as responsible for the “cesspool” that has been created. When they clean up their act the rest of us will be forced to follow.