If you have been blogging for a while, chances are you are familiar with content scrapers. Content scrapers are websites that steal your content for their own blogs without your permission. Some content scrapers will just copy the content off of your blog, but most use automated software that takes the content from your RSS feed and posts your content to their site like it is a new post.
In this post, we are going to look at some potential link building benefits to content scrapers, how to find out what sites are scraping your content, and what you can do if you want to either benefit from the linking standpoint or have them take it down.
Linking Benefits of Content Scrapers
Last week, I was happy to see that I was listed in ProBlogger’s 20 Bloggers to Watch in 2012. Within 24 hours, I received a notification in my WordPress dashboard that a page on my blog had been linked to in the post on ProBlogger’s site.
After receiving the original notification from the ProBlogger post, I also received another 18 trackbacks from sites that had stolen the content in their post verbatim. Trackbacks are WordPress’ way of letting you know that another website has linked to a post on your blog. In this case, these 18 sites had posted the content exactly like the original post – with the links back to my blog still intact.
It was then that I started contemplating the potential link building benefits of content scrapers. These are not by any means quality links – the highest Google PageRank was a PR 2 domain, many were stealing content in a variety of languages, and one even had the nerve to use some kind of redirection script to take away the link juice of outgoing links! So while these links didn’t have the same authority that the original post had, they still count as links.
How to Catch Content Scrapers
Unfortunately, unless you want to continuously search for your post titles in Google, you’ll only be able to easily track down sites that keep your in-content links active. If you want to know what websites are scraping your content, here are a few tips to sniff them out.
Copyscape is a simple search engine that allows you to enter the URL of your content to find out if there are duplicates of it on the Internet. You can get a few results using their free search, or you can pay for a premium account to check up to 10,000 pages on your site and more.
The first way is through your trackbacks in WordPress (as shown in the image above). Many of these will show up in the spam folder if you use Akismet. The key to getting trackbacks to appear from content scrapers is to always include links to other posts in your content. Be sure those links have great anchor text too, if you’re going for a little extra link juice. And even if you are not, internal linking with strong anchor text is good for your on-site optimization too!
The next way to catch them is in Webmaster Tools. Simply go to your site in Webmaster Tools, and look under Your Site on the Web > Links to Your Site. Then sort by the Linked Pages column.
Anyone thinking about link building benefits at this point is probably noting the sheer volume of links from these sites, some of which are content scrapers. Essentially any site that is linking to a lot of your posts that isn’t a social network, social bookmarking site, or a die-hard fan who just loves linking to you is potentially a content scraper. You’ll have to go to their website to be sure. To find your links on their site, click on one of the domains to see the details of what pages on your site they are linking to specifically.
Then, click on one of your links to see which pages on their site is linking to yours.
You can see here that they are just blatantly copying my posts titles. When I visited one of the links, sure enough, they are copying my entire posts in their full glory onto their site.
If you don’t post often or want to keep up with any mentions of your top blog posts on other websites, you can create a Google Alert using the exact match for your post’s title by putting the title in quotation marks.
I deliver all of my Google Alerts to an RSS feed so I can manage them in Google Reader, but you can also have them delivered regularly by email. You’ll even get an instant preview of the types of results you will get.
How to Get Credit for Scraped Posts
If you use WordPress, then you definitely want to try out the RSS footer plugin. This plugin allows you to place a custom piece of text at the top or bottom of your RSS feed content.
The result is this simple line on my blog posts when viewed through a RSS feed.
As you can see, even if you aren’t using it for the purpose of getting credit back to your posts when content thieves steal it, you can still use it for a little extra bit of advertising with the possible benefit of people who subscribe to your RSS feed clicking through to your website or social profiles. And when someone does scrape your content from your RSS feed, it shows up there too.
So in the event that someone finds your scraped content, they will hopefully notice the credit before assuming it was created by the blog that stole it. If you don’t have WordPress, you can simply include a note at the top or bottom of your content that includes the same information.
How to Stop Content Scrapers
If you’re not interested in anyone copying your content, then you have a few options to choose from. You can start by contacting the site that is stealing your content and sending them a notice that you want all of your content removed immediately. You can do this through the site’s contact form, email address, or post it to any social accounts they list.
If there is no contact information on the website stealing your content, you can do a Whois Lookup to (hopefully) find out who owns the domain.
If it is not privately registered, you should find an administrative contact’s email address. If not, you should at least see the domain registrar which, in this case, is GoDaddy and/or the hosting company for the website which, in this case, is HostGator. You can try to contact both companies (HostGator has a DMCA formand GoDaddy has an email) and let them know that the domain in question is stealing copyrighted content in hopes that the website will be suspended or removed.
You can also visit the DMCA and use their takedown services to remove anyone who is copying your photos, video, audio, blog, or other content. They even offer a WordPress plugin to incorporate a DMCA protected badge on your site to warn potential thieves.
Have you ever dealt with content scrapers and thieves? Do you leave it alone for the link benefits, or do you fight back? What other tools, services, or other preventative tactics do you use to block content scrapers? Please share your thoughts and experiences in the comments!
Tags: content scraper