SEO Scrapers over at cocoonworks.com
Posted on May 30th, 2006 by Michael Gray in Grayhat SEO
If you’re going to be a scraper news aggregator, I don’t know that I’d start with the high profile people who could make things difficult for you:

I also don’t think it’s an especially good idea to pass off entire posts as having been written by someone other than the original author, and NOT attributing the content to the original owner http://seo.cocoonworks.com/?cat=6 (no link love for you).

May 30th, 2006 at 11:29 am
An issue for two reasons:
1. Copyrights
2. Ethics
I’m with you on this one, Graywolf. Sometimes people need to avoid the temptation to “aggregate” just to build quick content for no purpose other than building traffic.
May 30th, 2006 at 11:36 am
I have wondered about this for some time now but it appears that as long as there is a “link back” Google knows the originator.
Hitslog (an aggregator) uses all my content on seobuzzbox.com, and everything they aggregate appears at the top of Google’s SERPs. I was pissed off until Matt Cutts hinted that it is a kind of backlink and helps new sites. I also use FeedBurner to show partials.
So, if Michael enables partial feeds, then Matt and Adam v.2.0 will have to visit his blog the old-fashioned way…hehehe
May 30th, 2006 at 1:26 pm
A great way to find people copying your posts is http://www.copyscape.com
May 30th, 2006 at 1:46 pm
Yep, Copyscape does work great. However, I’m thinking about programming my own WordPress plugin to harness the power of RSS to check for me automatically every time I update my feeds.
May 30th, 2006 at 2:30 pm
That would be cool, but the easiest way is to just cut and paste an entire sentence from a previous post into Google.
http://www.google.com/search?q=All+right+I%E2%80%99ve+looked+around+can%E2%80%99t+seem+to+find+the+answer%2C+so+hopefully+one+of+my+nice+readers+knows+the+answer+and+will+share.&start=0&ie=utf-8&oe=utf-8&client=firefox-a&rls=org.mozilla:en-US:official
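For anyone who would rather script that check than paste by hand, here is a minimal Python sketch that builds the same kind of search URL from a sentence. The function name is made up for illustration, and wrapping the sentence in double quotes to ask for an exact phrase match is my own assumption (the URL above searches without them).

```python
from urllib.parse import quote_plus


def exact_phrase_search_url(sentence: str) -> str:
    """Build a Google search URL for one sentence taken from a post.

    Hypothetical helper: the double quotes around the sentence are an
    assumption, added so the search looks for the exact phrase.
    """
    # quote_plus percent-encodes punctuation and turns spaces into '+'.
    return "http://www.google.com/search?q=" + quote_plus(f'"{sentence}"')


if __name__ == "__main__":
    print(exact_phrase_search_url("hopefully one of my nice readers knows the answer"))
```

Pick a sentence that is unlikely to appear anywhere else, otherwise the results fill up with unrelated pages.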
A report tool would be a good addition to a plugin for the real scrapers who do not link. That is stealing and all it takes is for googlebot to hit their site first to make your content “their” freakin’ content.
May 30th, 2006 at 2:57 pm
It may be a few weeks, but trust me, it will be much better and way easier than that. I’ll be bringing button-pushing automation to the white hats.
May 31st, 2006 at 7:56 am
I don’t think he likes linking to you either. Below is from that site, without live links.
“All right I’ve looked around can’t seem to find the answer, so hopefully one of my nice readers knows the answer and will share. I’m getting a bunch of inbound links that look like this
http://www.wolf-howl.com/%3Fp=324
http://www.wolf-howl.com/%3Fp=325
http://www.wolf-howl.com/%3Fp=326
%3F is the URL-encoded character for a [?]. I could do something manual but I’m looking for a slicker, more programmatic way to rewrite the %3f character to a ? and keep the remaining query string intact.”
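The rewrite asked about in that quoted post comes down to a small string transformation. A rough Python sketch of the idea follows; the function name is hypothetical, and in practice this would more likely live in a server rewrite rule or a redirect handler than in a standalone script.

```python
import re


def fix_encoded_query(url: str) -> str:
    """Turn the first URL-encoded '?' (%3F or %3f) back into a literal '?'.

    Hypothetical helper: everything after the marker is left untouched,
    so the remaining query string stays intact.
    """
    # If the URL already has a real query string, leave it alone.
    if "?" in url:
        return url
    # Replace only the first occurrence, in either letter case.
    return re.sub(r"%3[Ff]", "?", url, count=1)


if __name__ == "__main__":
    for broken in ("http://www.wolf-howl.com/%3Fp=324",
                   "http://www.wolf-howl.com/%3fp=325"):
        print(broken, "->", fix_encoded_query(broken))
```

Redirecting the broken URL to the fixed one, rather than quietly serving the same page at both addresses, would presumably be the way to keep the inbound links pointed at a single URL.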
May 31st, 2006 at 8:38 am
Let’s just say I’ve given him a few days to mend his ways before I start to rain hellfire down on his world in a biiiiggg way.
May 31st, 2006 at 10:51 am
Hehe don’t know how much they’d be able to aggregate off Earl these days
June 1st, 2006 at 9:25 am
There’s a very specialized aggregation site that I’ve wanted to power with WordPress. But the good plugins that I’m aware of only publish the full post so I’ve set it aside until I can limit the number of words excerpted.
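The word limit itself is the easy part. Here is a minimal Python sketch of a word-count excerpt, assuming plain text input; the function name and the default of 50 words are made up for illustration, and a real WordPress plugin would be written in PHP and would need to strip HTML first.

```python
def excerpt(text: str, max_words: int = 50, ellipsis: str = "...") -> str:
    """Return at most max_words words of text, marking any truncation.

    Hypothetical helper: splits on whitespace only, so runs of spaces
    and line breaks collapse to single spaces in the result.
    """
    words = text.split()
    if len(words) <= max_words:
        # Short posts pass through without an ellipsis.
        return " ".join(words)
    return " ".join(words[:max_words]) + ellipsis


if __name__ == "__main__":
    print(excerpt("If you are going to be a scraper news aggregator, link back.", 6))
```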
June 2nd, 2006 at 10:26 pm
It would also help if he could COUNT. He named six people, not five. (God, I’m a raving bitch)
June 4th, 2006 at 9:49 pm
Hi there.
I’m the guy that can’t count.
There happens to be a lot of debate on the copyright law regarding splogs, scrapers, RSS aggregators, whatever you want to call them, so assuming the SEO community would learn of my little project fast, I figured it would be a good topic to test the law on. Surely, Gray Wolf was the one to step forward in 8 weeks online, and I have gladly removed all of his posts.
However, I’ve got to say, I believe RSS is a great advertising medium for bloggers, and by offering RSS feeds on your blog, you are, in my opinion, offering an implied license for others to scrape away. REMOVE YOUR RSS FEED DUDE!
A good article on the topic here:
http://blogs.law.harvard.edu/palfrey/2006/01/17#a1039
Thanks for helping me learn how to count.
I think I’ll now go create a “Copyright Law Splog” and see if I can get it into the Supreme Court, to test this law.
I mean.. come on, how is my splog much different than what this company is doing?!
http://www.toptensources.com/
Cheers,
NORTH
h**p://seo.cocoonworks.com
June 4th, 2006 at 9:55 pm
>>I also don’t think it’s an especially
>>good idea pass off entire posts as having
>>been written by someone other than the
>>original author, and NOT attributing the
>>content to the original owner
>>http://seo.cocoonworks.com/?cat=6 (no link
>>love for you).
A mute point, but every article I’ve scraped links to the source blog, and states the original author… which is why I think you made a mistake by asking me to remove your posts… well… if my site was popular, you’d be making a mistake as you’d deny all my traffic. (link love back please?)
On that note, I wonder.. if my site was generating you a ton of traffic, would you still ask me to remove your articles?
PS: Thanks for the write up, this is a good discussion.
Cheers,
NORTH
h**p://seo.cocoonworks.com
June 12th, 2006 at 2:28 am
CORRECTION: A “moot” point.
I can’t spell either.