6.May.2008 at 7:14 pm | Sunny
Who Is Stealing Your Mojo? RSS Feed Scraping, Splogs and How to Fight Them
If your site gets any traffic at all, chances are your RSS feed is being scraped and misused by someone trying to rank in search results (SERPs) with the work you toiled over. This very blog is constantly ripped off by sploggers who either have no regard for intellectual property rights or are just plain too dumb to realize they are stealing. Here are three sites where you can find the entire content of our blog scraped, ripped, stolen and posted as their own. Of course, I did not link to them, because I really don’t want you to visit them:
- myblogmix.com (this m^*&*0 f$%*r even uses one of our themes on his splog!)
- frenzygraphics.com
- urlfan.com
How do you stop them?
Jonathan Bailey (Plagiarism Today), a security expert who also writes on Blog Herald, lists 6 steps to identify and fight content theft. We have religiously followed his suggestions for months now (that post on BH is from last November), but none of them have worked so far. Google has yet to acknowledge our letters (we used his cease-and-desist letter templates, which are a good resource; thanks!). I even took solace in thinking that maybe they have a backlog or will eventually write back, but nothing has happened.
Why is it so hard to stop them? Here are my guesses (I am speculating):
- These sites actually generate more Google AdSense revenue than you do. The economics of contextual advertising is one reason there are so many splogs.
- We are not “big enough” to tip the outcome in our favor. (I know this sounds like one of the content theft myths.)
But here’s the logic: steal content from Yahoo or Reuters and, rest assured, your ass is toast and your site will get banned forever; steal from a blog running on a shoestring budget and, well, you know where I am going. No one cares.
How can we combat it?
With a little effort, we can fight back. I must warn you, this is neither a bullet-proof method nor an easy one. You need to be the admin of the blog and have access to both the template files and other server files. You will need to be comfortable doing one of two things:
- Edit your theme template files
- Access and edit your .htaccess file
Follow the steps below and you should be on your way to securing your RSS feeds.
Identify the Culprit
The very first thing to do is figure out whether your content is being stolen and, if so, who is stealing it. A simple Google search should tell you: look at the results, and if other sites show up with content unique to yours, open them and check; that just might be your content.
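One way to run that search is to put a sentence that appears only in your post inside double quotes, which asks Google for that exact phrase. The JavaScript sketch below (runnable in Node.js or a browser console) only builds such a search URL; `exactPhraseSearchUrl` is a hypothetical helper, and https://www.google.com/search?q= is simply the ordinary search page, not an official API.

```javascript
// Hypothetical helper: build a Google exact-phrase search URL for a sentence
// copied from one of your posts. Pages other than yours that show up in the
// results are candidates for scraped copies.
function exactPhraseSearchUrl(phrase) {
  // Collapse runs of whitespace so a phrase copied across line breaks still matches
  const cleaned = phrase.trim().replace(/\s+/g, " ");
  // Wrapping the phrase in double quotes asks for an exact match
  return "https://www.google.com/search?q=" + encodeURIComponent('"' + cleaned + '"');
}

console.log(exactPhraseSearchUrl("Who is stealing your mojo"));
// https://www.google.com/search?q=%22Who%20is%20stealing%20your%20mojo%22
```

Pick a sentence distinctive enough that it is unlikely to appear anywhere else, so that every hit other than your own page is worth a look.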
Also, if you use WordPress and Akismet to power and protect your blog like we do, a good place to start is the Akismet page where trackback and pingback spam is held for moderation. Follow the trackbacks to see whether your content has been copied as is; if so, first mark them as spam, then copy the IP address (which looks something like 210.48.152.20) into Notepad or another text editor.
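If you collect addresses from several moderation screens, a tiny script can pull them out of the pasted text and drop duplicates. This is a hypothetical Node.js helper (`extractIPv4s`), assuming you paste plain text containing dotted addresses like the one above; it also rejects impossible values such as 999.1.1.1.

```javascript
// Hypothetical helper: pull IPv4 addresses (like 210.48.152.20) out of text
// pasted from the moderation screen, drop impossible ones, and de-duplicate.
function extractIPv4s(text) {
  // Four groups of 1-3 digits separated by dots, on word boundaries
  const candidates = text.match(/\b(?:\d{1,3}\.){3}\d{1,3}\b/g) || [];
  const seen = new Set();
  const result = [];
  for (const ip of candidates) {
    // Each of the four numbers must be between 0 and 255
    const valid = ip.split(".").every((octet) => Number(octet) <= 255);
    if (valid && !seen.has(ip)) {
      seen.add(ip);
      result.push(ip);
    }
  }
  return result;
}

const pasted = "Trackback from 210.48.152.20 ... again 210.48.152.20, bogus 999.1.1.1, and 11.11.11.11";
console.log(extractIPv4s(pasted)); // [ '210.48.152.20', '11.11.11.11' ]
```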
Another place to look is your feed subscriber list. We use FeedBurner; this fabulous feed service offers a Pro stats package (which is actually free) with an option to view feed subscribers with “uncommon” usage. This list usually includes all your sploggers, so it is an easy way to identify them. However, with this method you will have to use a WHOIS service (http://www.whois.sc/) to look up the IP address behind the site. Either way, you need to find the splogger’s IP address.
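If you have the splog’s domain rather than its IP, a DNS lookup gives you the address the site is hosted on, and an IP WHOIS query on that address tells you which provider owns it. The Node.js sketch below uses the built-in `dns.promises.lookup` call; `resolveSplogIp` is a hypothetical helper. It assumes the splog’s web server is also the machine fetching your feed, which does not always hold, so compare the result with what FeedBurner or Akismet shows you before banning it.

```javascript
// Hypothetical helper: look up the IPv4 address a splog's domain resolves to.
// Caveat: this is the address of the site's web server, which is often, but
// not necessarily, the machine that scrapes your feed.
async function resolveSplogIp(hostname, lookup) {
  // Default to Node's built-in DNS resolver; a fake resolver can be passed in for testing
  const resolve = lookup || ((host) => require("dns").promises.lookup(host, { family: 4 }));
  const { address } = await resolve(hostname);
  return address;
}

// Example (needs network access):
// resolveSplogIp("example.com").then((ip) => console.log(ip));
```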
Create an IP Ban
This is where you prepare the defense system. There are many ways to go about it; we will discuss three methods below.
Using JavaScript to redirect a banned IP when the culprit visits your site:
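- Copy and paste the following JavaScript into a text editor (where you have your IP addresses stored)
    <script type="text/javascript">
    // Block IP address script- By JavaScriptKit.com (http://www.javascriptkit.com)
    // For this and over 400+ free scripts, visit JavaScript Kit- http://www.javascriptkit.com/
    // This notice must stay intact for use.
    //Enter list of banned ips, each separated with a comma:
    var bannedips=["23.23.23.23", "11.11.11.11"]
    // header.php is a PHP file, so let PHP print the visitor's IP address here
    // (a server-side include tag is not processed inside a PHP template)
    var ip = '<?php echo $_SERVER["REMOTE_ADDR"]; ?>'
    // Escape the dots and match whole octets only, so "23.23.23.23" does not also
    // catch 123.23.23.23, while a prefix such as "23.23.23" still bans 23.23.23.x
    var escaped = []
    for (var i = 0; i < bannedips.length; i++) escaped.push(bannedips[i].replace(/\./g, "\\."))
    var handleips = new RegExp("^(" + escaped.join("|") + ")(\\.|$)")
    if (ip.search(handleips) != -1) {
      alert("Your IP has been banned from this site. Redirecting…")
      window.location.replace("http://www.google.com")
    }
    </script>
Source: JavaScript Kit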
- Change the variable bannedips to include the IP you need to ban (so change the 23.23.23.23 and 11.11.11.11 to the IPs you want blocked)
- Open your theme’s header.php file, or edit the Header template from WP Admin
- Paste this JavaScript anywhere between the <head> and </head> tags
- Save and close the file
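What this essentially does is redirect the culprit to Google whenever they try to access your site. It need not be Google; you could redirect them back to their own site. Keep in mind that this only affects someone viewing your pages in a browser with JavaScript enabled: a scraper pulling your RSS feed never loads header.php or runs the script, so against feed scraping it is the server-side .htaccess methods that count.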
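When you fill in bannedips, it is worth checking what each entry will actually match. A pattern made by simply joining the list with | treats every dot as “any character” and can match anywhere in the address, so 23.23.23.23 would also catch 123.23.23.23. The sketch below (a hypothetical `isBanned` helper, runnable in Node.js or a browser console) shows the difference and matches whole octets only, so full addresses and shortened prefixes both behave as intended.

```javascript
// Hypothetical helper: true if `ip` matches an entry in `bannedList`. An entry is
// either a full address ("23.23.23.23") or a shorter prefix ("23.23.23"), and it
// matches on whole octets only, so "23.23.23" catches 23.23.23.7 but not 23.23.230.7.
function isBanned(ip, bannedList) {
  return bannedList.some((entry) => ip === entry || ip.startsWith(entry + "."));
}

// A naive pattern built by joining the list with "|": dots are wildcards and
// the match can start anywhere in the string.
const naive = new RegExp(["23.23.23.23"].join("|"));
console.log(naive.test("123.23.23.23"));                // true  (false positive)
console.log(isBanned("123.23.23.23", ["23.23.23.23"])); // false
console.log(isBanned("23.23.23.5", ["23.23.23"]));      // true
console.log(isBanned("23.23.230.5", ["23.23.23"]));     // false
```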
Using deny command for specific IP in .htaccess:
This is where you create a simple allow/deny rule on the server side to block specific IPs.
- Copy and paste the following code to your text editor
    order allow,deny
    deny from 23.23.23.23
    deny from 11.11.11.11
    allow from all
- Change the deny from lines to the IPs you need to ban (replace 23.23.23.23 and 11.11.11.11 with the IPs you want to block, one deny from line per IP)
- Connect to your server via FTP (some of you might have to use the File Manager in your cPanel instead) and open the .htaccess file
- If you do not have one, create it by saving a plain text file in your site’s root directory and renaming it .htaccess (note that the file has no name before the dot; the whole filename is .htaccess)
- Paste the allow/deny code into the .htaccess file
- Save and close the file
What this essentially does is deny access to the listed IPs; Apache answers their requests with a 403 Forbidden error.
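If your ban list grows, it can help to generate these lines rather than type them. The sketch below is hypothetical (`denyBlock22` and `denyBlock24` are made-up names): the first function reproduces the Apache 2.2 style used above, and the second produces the Apache 2.4 equivalent, since 2.4 only understands order/allow/deny through its mod_access_compat module. Check which Apache version your host runs before choosing.

```javascript
// Hypothetical helpers: turn a list of banned IPs (or partial IPs such as
// "23.23.23") into an .htaccess block.

// Apache 2.2 style, the same directives as the code above
function denyBlock22(ips) {
  return ["order allow,deny", ...ips.map((ip) => "deny from " + ip), "allow from all"].join("\n");
}

// Apache 2.4 style: grant everyone, then exclude the banned addresses
function denyBlock24(ips) {
  return [
    "<RequireAll>",
    "    Require all granted",
    ...ips.map((ip) => "    Require not ip " + ip),
    "</RequireAll>",
  ].join("\n");
}

console.log(denyBlock22(["23.23.23.23", "11.11.11.11"]));
console.log(denyBlock24(["23.23.23.23", "11.11.11.11"]));
```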
Using redirect rewrite rule for specific IP in .htaccess:
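The third method uses a rewrite rule (Apache’s mod_rewrite module, which must be enabled on your server) to redirect the banned IPs instead of simply blocking them.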
- Copy and paste the following code to your text editor
    RewriteEngine On
    RewriteCond %{REMOTE_ADDR} ^210\.48\.152\.(.*)$
    RewriteRule .* http://www.myblogmix.com/rss.php [R,L]
- Change the RewriteCond pattern from 210\.48\.152\. to the IP you are banning (keep the backslash before each dot). In this example we are redirecting every IP that starts with 210.48.152, and in doing so we risk banning some legitimate bloggers in Malaysia from accessing our blog
- Paste the rewrite code into the .htaccess file
- Save and close the file
What this does is redirect the IP back to its own RSS feed, creating a loop of sorts.
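If you ban more than one address or block this way, each needs its own RewriteCond, and because consecutive conditions are combined with AND by default, all but the last need the [OR] flag. The sketch below generates such a block; `rewriteBlock` and the splog.example target are hypothetical. A prefix ends in an escaped dot, which matches the same addresses as the (.*)$ form above, and [R] on its own sends a temporary (302) redirect.

```javascript
// Hypothetical helper: build mod_rewrite rules that redirect several banned
// IPs or IP prefixes to `target`.
function escapeDots(s) {
  return s.replace(/\./g, "\\.");
}

function rewriteBlock(entries, target) {
  const conds = entries.map((entry, i) => {
    const isFull = entry.split(".").length === 4;
    // Full address: match it exactly; prefix: require a dot after it (whole octets)
    const pattern = isFull ? "^" + escapeDots(entry) + "$" : "^" + escapeDots(entry) + "\\.";
    // Conditions are ANDed by default, so every one but the last gets [OR]
    const flag = i < entries.length - 1 ? " [OR]" : "";
    return "RewriteCond %{REMOTE_ADDR} " + pattern + flag;
  });
  return ["RewriteEngine On", ...conds, "RewriteRule .* " + target + " [R,L]"].join("\n");
}

console.log(rewriteBlock(["210.48.152", "23.23.23.23"], "http://splog.example/rss.php"));
// RewriteEngine On
// RewriteCond %{REMOTE_ADDR} ^210\.48\.152\. [OR]
// RewriteCond %{REMOTE_ADDR} ^23\.23\.23\.23$
// RewriteRule .* http://splog.example/rss.php [R,L]
```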
Note that all the methods described here work only if the splogger uses a static IP, which is not necessarily the case; hence the not-bullet-proof disclaimer. One way to widen the ban is to drop the last number of the IP: for example, shorten 23.23.23.23 to 23.23.23. This, however, bans every address from 23.23.23.0 to 23.23.23.255, which may include innocent customers of the same service provider.
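The arithmetic behind a shortened IP is simple: each dropped number (octet) stands for 256 possible values, so dropping one bans 256 addresses and dropping two bans 256 × 256 = 65,536. A small hypothetical helper, `prefixRange`, makes this concrete:

```javascript
// Hypothetical helper: show which addresses a partial-IP ban covers.
// Each missing octet multiplies the number of banned addresses by 256.
function prefixRange(prefix) {
  const octets = prefix.split(".");
  const missing = 4 - octets.length;
  const first = octets.concat(Array(missing).fill("0")).join(".");
  const last = octets.concat(Array(missing).fill("255")).join(".");
  return { first, last, count: 256 ** missing };
}

console.log(prefixRange("23.23.23")); // { first: '23.23.23.0', last: '23.23.23.255', count: 256 }
console.log(prefixRange("23.23"));    // { first: '23.23.0.0', last: '23.23.255.255', count: 65536 }
```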
Check Back and Repeat
This final step is as important as the first: keep checking whether your site’s content is being stolen, and repeat the banning process.
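Each round of checking produces a fresh batch of addresses, and only the ones you have not banned yet need new lines. The hypothetical `newOffenders` helper below does that comparison; it assumes both lists hold full addresses and compares them exactly, so addresses already covered by a shortened prefix still need a manual check.

```javascript
// Hypothetical helper for the "check back and repeat" step: given the IPs found
// this round and the list already banned, return only the new ones (deduplicated,
// in the order they were found). Compares whole addresses exactly.
function newOffenders(found, alreadyBanned) {
  const banned = new Set(alreadyBanned);
  return [...new Set(found)].filter((ip) => !banned.has(ip));
}

console.log(newOffenders(["210.48.152.20", "11.11.11.11", "210.48.152.20", "5.6.7.8"], ["11.11.11.11"]));
// [ '210.48.152.20', '5.6.7.8' ]
```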
So has it worked on our site?
The answer is not a simple yes. It has worked to some degree, but this is obviously a work in progress; it is a matter of who adapts to the changes faster, and unfortunately, in our case, that is the sploggers. One method we did not discuss is the use of partial feeds, which we will save for a future post. For more information about splogs and how to combat them, visit Plagiarism Today.
1. Robin | May 7, 2008
Hi, I know some splogs are stealing my content even though I don’t have much traffic. Would the techniques above have the same effect if I simply banned the splog’s IP with a plugin like WP-Ban? Any help and advice would be highly appreciated.