Using Sitemaps to Combat Duplicate Content
November 21st, 2006 by Michael Gray in Google, SEO
If you talked to anyone at the recent PubCon, you realized one of the things that’s on everyone’s mind is duplicate content, scrapers, splogs and AdSense. While I didn’t get a chance to spend any time with anyone on the Google Sitemaps team, sometimes it takes a few hours on a plane for an idea to gel.
Now that the big three are supporting a universal sitemaps protocol, how about kickin’ it into gear and using this tool to address a real problem instead of just paying it lip service? I know lots of people have a problem with this kind of thinking, but in my experience, if you want people to change or do something, providing a positive incentive is a great starting point. So when a webmaster pings the sitemaps services with a notification of new content, make that first discovery a claim of content ownership.
Sure, it’s not a perfect solution, but it’s definitely a step in the right direction. Should it be an absolute, irrevocable claim? Absolutely not, but it should carry more weight. The most obvious problem with this solution is someone in the sitemaps program “stealing” content from someone who’s not in the program. The solution is obvious: get the original content owner involved in the sitemaps program. Now a clever search engineer might even come to the conclusion that a website that was constantly pinging and trying to claim content that had been previously claimed might be an indicator of low quality.
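To make the idea concrete, here is a rough sketch of what a first-ping-wins registry might look like. This is only an illustration under my own assumptions: the `ClaimRegistry` class, its `ping` method, the whitespace-and-case fingerprint, and the example site names are all hypothetical, and nothing here reflects how any search engine or the sitemaps protocol actually works.

```python
import hashlib
from collections import Counter


class ClaimRegistry:
    """Toy first-ping-wins registry; every name here is hypothetical."""

    def __init__(self):
        self.owners = {}              # content fingerprint -> (site, ping time)
        self.late_claims = Counter()  # site -> pings for content someone else claimed first

    @staticmethod
    def fingerprint(text):
        # Normalize case and whitespace so trivial edits do not dodge the match.
        normalized = " ".join(text.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ping(self, site, text, ping_time):
        """Record a ping and return the site currently credited with the content."""
        key = self.fingerprint(text)
        if key not in self.owners:
            # First discovery: treat it as the ownership claim.
            self.owners[key] = (site, ping_time)
        elif self.owners[key][0] != site:
            # Someone else claimed this first: count it against the late pinger.
            self.late_claims[site] += 1
        return self.owners[key][0]


registry = ClaimRegistry()
first = registry.ping("original.example", "My new post about sitemaps.", 100)
second = registry.ping("scraper.example", "my new post  about SITEMAPS.", 250)
print(first, second, registry.late_claims["scraper.example"])
```

The `late_claims` counter is the "constantly pinging already-claimed content" signal. In keeping with the point above, a real system would presumably treat the first ping as one weighted factor rather than an irrevocable verdict.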
What do you think, good idea or not?
November 21st, 2006 at 6:29 am
On topic, the other main problem here is that a lot of people (mainly non-blogs) are creating sitemaps manually and updating them only infrequently.
Your idea would work fine for those with automated pinging services, but for all others it might actually be counter-productive, I think.
November 21st, 2006 at 11:27 am
This would be open season for spammers to steal and claim all the content they want. You say to get the real webmaster involved but SOOOO many people will never get involved in sitemaps for many different reasons that content theft will become more of a problem than ever.
November 21st, 2006 at 11:38 am
Instead of reducing duplicate content issues, it will increase them, to a level where there won’t be any solution.
November 21st, 2006 at 3:26 pm
What JeremyL said. A blackhat could steal someone’s content and lay claim to it before the rightful owner bothered to claim ownership.
November 21st, 2006 at 3:42 pm
I don’t understand. If I ping you first with the content, why would you assign it to someone who pings it after me? If I ping you second then it’s my fault, kind of like not filing an official copyright and expecting it to be legally defensible. Secondly, if people knew that getting involved with sitemaps would assure they get credit for their content, it’s a strong motivator to use it.
November 21st, 2006 at 4:29 pm
You don’t have to file for a copyright on written material (in print, on the web or otherwise). The author automatically holds the copyright. Most of the best stuff on the web is written and uploaded by people who have never even heard of sitemaps. You can’t expect them to beat some blackhat with three laptops and a BlackBerry if all that is needed is a quick XML ping to the sitemaps program.
November 21st, 2006 at 4:36 pm
Correct, you don’t have to file one, but if you think you’ll be able to defend it, you’ll be in for a very expensive legal battle without a really strong case.
I see more and more CMSs with sitemaps integration built in.
November 21st, 2006 at 5:05 pm
It’s a wonderful partnership. It would be the only solution to content piracy. And it was…
I know many webmasters who write good content, but their content is stolen and their sites are lost in the SERPs.
I think it’s the end of content theft!
November 21st, 2006 at 5:51 pm
I agree with Michael and Matt. doh
I believe that the first to claim it owns it. When I publish my articles with my WordPress setup, I’m always sure to check the sitemap.xml file that’s submitted to Google. I don’t want to get pinched for having dup content again.
November 22nd, 2006 at 1:12 pm
The “whoever smelt it, dealt it” theory sounds too ripe for abuse. Wide-scale adoption (except by pro SEOs, aka spammers) is the issue here.
November 22nd, 2006 at 10:24 pm
Nice idea, but it’ll require an efficient sitemap and pinging script on the publisher’s site, or else anyone can ping it first.
Anyway, I still think it could be a good (and fair) factor to add to the duplicate content algorithm. It would make life easier, especially for those who aren’t aware of this kind of trouble.
November 24th, 2006 at 5:57 am
I’m not sure that this would help. If you submit your sitemap one day and Google downloads it several days later, what’s the real date of your URL for Google? If it is the date Google downloads it, what would happen if someone else’s sitemap is downloaded first?
Thanks for this blog, Greywolf
Dict
March 26th, 2007 at 10:50 am
Excellent article.
Yes, a lot of CMS programs are now integrating .xml into their software.
I don’t like the way Google downloads it several days later, though.
Stephen