Recently, I wrote that billionaire Sam Zell's inflammatory remarks about Google stealing the content of online newspapers seemed unfair and inaccurate:
“If all of the newspapers in America did not allow Google to steal their content for nothing, what would Google do?” he asked. “We have a situation today where effectively the content is being paid for by the newspapers and stolen by Google, etcetera. That can last for a short time, but it can’t last forever. I think Google and the boys understand that.”
- Sam Zell, Stanford Business Daily
Google isn't stealing this material. The newspapers have left the content wide open - and simply haven't asked Google not to use it. Google News appropriates Fair Use materials (captions and image thumbnails) and subsequently drives significant traffic through to online news sites that might not otherwise receive page views or revenues from these readers (at this time, Google News does not even earn revenue from this activity). A reporter from the Seattle Post-Intelligencer recently said that 10 percent of their site's traffic comes from Google News. Online news aggregators like Google News and NewsCloud are good for the online newspaper business.
Yet, I feel bad for Mr. Zell, having spent $8 billion on the Los Angeles Times and Chicago Tribune while barely having any technical knowledge of how the Internets work. To help him out, I've written an easy how-to guide for stopping "theft" of your online newspaper content ... but it might as well be called "How to relegate your online newspaper to obscurity and minimize your subscriber base" or "Minimizing the bandwidth usage of your online newspapers" or "My Secrets of Search Engine De-optimization". Mr. Zell needs to understand that the way these sites are operating today is essentially like leaving your garage door open every day with a sign that says "Community Tool Lending Library". You're just asking for someone to use your stuff.
Here are my nine easy steps to stop the Google Boys from driving traffic to your business:
1. Tell Google not to index your online newspaper
Google won't crawl your site and drive you all that bandwidth-eating, revenue-generating traffic if you don't want them to. "Google News obeys standard web protocols for robots.txt files and robots meta tags. For more detailed information about creating robots.txt files and robots meta tags, please visit http://www.robotstxt.org/wc/exclusion.html"
Simply put a file called robots.txt in your root directory with the following content to block Google:
User-agent: Googlebot
Disallow: /
or block all search engines:
User-agent: *
Disallow: /
or put this inside the <HEAD></HEAD> section of any HTML page you want to exclude:
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
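If you want to check that the rules really do slam the door, here is a minimal sketch of how a well-behaved crawler might read them. It is an illustration only, not Googlebot's actual parser: the function names are made up, and it assumes plain prefix matching on User-agent and Disallow lines, ignoring Allow lines and wildcards.

```javascript
// Minimal sketch of how a crawler might read the robots.txt rules above.
// Assumption: only User-agent and Disallow lines are handled, with plain
// prefix matching; Allow lines, wildcards, and Sitemap entries are ignored.
function parseRobots(text) {
  const groups = []; // each group: { agents: [...], disallow: [...] }
  let current = null;
  let lastWasAgent = false;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.replace(/#.*$/, "").trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group of rules.
      if (!lastWasAgent) {
        current = { agents: [], disallow: [] };
        groups.push(current);
      }
      current.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else {
      if (field === "disallow" && current) current.disallow.push(value);
      lastWasAgent = false;
    }
  }
  return groups;
}

function isAllowed(robotsText, userAgent, path) {
  const groups = parseRobots(robotsText);
  const ua = userAgent.toLowerCase();
  // Prefer a group naming this crawler; otherwise fall back to "*".
  const group =
    groups.find((g) => g.agents.some((a) => a !== "*" && ua.includes(a))) ||
    groups.find((g) => g.agents.includes("*"));
  if (!group) return true;
  // An empty Disallow value blocks nothing.
  return !group.disallow.some((rule) => rule !== "" && path.startsWith(rule));
}

const blockGoogle = "User-agent: Googlebot\nDisallow: /";
console.log(isAllowed(blockGoogle, "Googlebot", "/news/story.html"));    // false
console.log(isAllowed(blockGoogle, "SomeOtherBot", "/news/story.html")); // true
```

With the first robots.txt variant only Googlebot is turned away; the `User-agent: *` variant turns away every crawler that honors the file.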
2. Don't allow anyone (especially those pesky buzz-building bloggers) to link to your content
Use Apache's mod_rewrite and .htaccess to redirect all incoming traffic back to your home page. Force new visitors to search for the content they want on your site - that'll be a good experience for them. Put this code in your .htaccess file:
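RewriteEngine On
# Skip requests with no Referer (typed URLs, bookmarks)
RewriteCond %{HTTP_REFERER} !^$
# Skip requests that came from our own pages
RewriteCond %{HTTP_REFERER} !^http://latimes\.com [NC]
RewriteCond %{HTTP_REFERER} !^http://www\.latimes\.com [NC]
# Skip the home page itself so the redirect cannot loop
RewriteCond %{REQUEST_URI} !^/$
RewriteRule ^.*$ http://www.latimes.com/ [R,L]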
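For anyone who finds mod_rewrite conditions hard to read, the Referer checks above boil down to roughly the following. This is a sketch in plain JavaScript, not something Apache runs; the function name is hypothetical.

```javascript
// Hypothetical helper mirroring the Referer conditions above:
// redirect only when a Referer is present and it is not one of our own hosts.
function shouldRedirectToHome(referer) {
  if (!referer) return false; // empty Referer: typed URL or bookmark, let it through
  const ownSite = /^http:\/\/(www\.)?latimes\.com/i; // case-insensitive, like [NC]
  return !ownSite.test(referer);
}

console.log(shouldRedirectToHome(""));                                 // false
console.log(shouldRedirectToHome("http://www.latimes.com/news/"));     // false
console.log(shouldRedirectToHome("http://news.google.com/"));          // true
console.log(shouldRedirectToHome("http://someblog.example/my-post/")); // true
```

Every visitor arriving from a search result or a blog link lands on the home page instead of the article they asked for.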
3. Don't allow people to hot link your images
The above solution works great but this solution will put an unfriendly error graphic of your choice on the offending site:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(.+\.)?mysite\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^$
RewriteRule .*\.(jpe?g|gif|bmp|png)$ /images/nohotlink.jpe [L]
A List Apart has some suggestions as well, but that's only for people who care about their readers' experience. Those ALA guys are a bit lefty if you ask me.
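The hotlink rule is easier to follow as a plain function. The sketch below only illustrates the matching logic, with a made-up function name; mysite.com and /images/nohotlink.jpe come from the snippet above. It also suggests why the replacement graphic uses the odd .jpe extension: the image pattern does not match it, so the replacement is never itself rewritten.

```javascript
// Sketch of the hotlink rule as a plain function (the name is hypothetical).
function resolveImageRequest(path, referer) {
  // Same pattern as the RewriteRule; case-sensitive because it has no [NC].
  const isImage = /\.(jpe?g|gif|bmp|png)$/.test(path);
  // Same pattern as the first RewriteCond; [NC] becomes the i flag.
  const fromOwnSite = /^http:\/\/(.+\.)?mysite\.com\//i.test(referer);
  const hasReferer = referer !== "";
  if (isImage && hasReferer && !fromOwnSite) return "/images/nohotlink.jpe";
  return path;
}

console.log(resolveImageRequest("/photos/front.jpg", "http://someblog.example/post"));
// /images/nohotlink.jpe
console.log(resolveImageRequest("/photos/front.jpg", "http://www.mysite.com/news/"));
// /photos/front.jpg
console.log(resolveImageRequest("/images/nohotlink.jpe", "http://someblog.example/post"));
// /images/nohotlink.jpe (served as-is: ".jpe" does not match the image pattern)
```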
4. Don't let anyone frame your content
Minimize the revenue outside aggregators and news readers generate for you. Put annoying anti-framing JavaScript in place. Anytime someone tries to frame a page on your site, it'll jolt them out of that convenient browsing experience:
<script language="JavaScript" type="text/javascript">
<!--
function framebreakout()
{
// Generated by thesitewizard Frame Breakout JavaScript Wizard 2.0
// Visit http://www.thesitewizard.com/ to get your own
// frame breakout script FREE!
if (top.location != location) {
top.location.href = document.location.href ;
}
}
//-->
</script>
<body onload="framebreakout()">
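If you would rather not rely on a script running in the reader's browser, a server-side variation is to send response headers asking the browser not to render the page in a frame at all. The sketch below assumes a Node-style response object; X-Frame-Options and the Content-Security-Policy frame-ancestors directive are standard headers, while the helper name and the stand-in response object are hypothetical. Unlike the script above, this refuses to display the framed page rather than breaking out of the frame.

```javascript
// Sketch: ask browsers not to render the page inside a frame at all.
// The helper name is made up; the two headers are standard.
function addAntiFramingHeaders(res) {
  res.setHeader("X-Frame-Options", "DENY");
  res.setHeader("Content-Security-Policy", "frame-ancestors 'none'");
  return res;
}

// Stand-in for a Node http.ServerResponse, so the sketch runs without a server.
const fakeResponse = {
  headers: {},
  setHeader(name, value) {
    this.headers[name] = value;
  },
};

addAntiFramingHeaders(fakeResponse);
console.log(fakeResponse.headers);
```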
5. Turn off your RSS feeds
If you don't want people stealing your titles and captions, shouldn't you stop publishing them in a ready-made machine-readable format? Besides, the only reason you have RSS feeds is because your geek engineers thought it would be cool and help build traffic...but really, they just have too much time on their hands. Maybe you should fire them and outsource future work overseas.
If you have fired them already, just delete any lines that look like this from your Web pages:
<link rel="alternate" type="application/rss+xml" title="Los Angeles Times Welcomes Feed Readers" href="http://www.latimes.com/rss.xml" />
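With the engineers gone, nobody is left to find those lines by hand, so here is a rough sketch of a script that strips them. It assumes the simple one-tag-per-line markup shown above and uses a regular expression rather than a real HTML parser; the function name and the sample page are made up for illustration.

```javascript
// Sketch: strip RSS/Atom autodiscovery <link> tags from a page's HTML.
// Assumption: a regex is good enough for simple markup like the tag above;
// a real HTML parser would be more robust.
function stripFeedLinks(html) {
  const feedLink =
    /[ \t]*<link\b[^>]*\btype=["']application\/(rss|atom)\+xml["'][^>]*>[ \t]*\r?\n?/gi;
  return html.replace(feedLink, "");
}

const page = [
  "<head>",
  "<title>Los Angeles Times</title>",
  '<link rel="alternate" type="application/rss+xml" title="Los Angeles Times Welcomes Feed Readers" href="http://www.latimes.com/rss.xml" />',
  "</head>",
].join("\n");

console.log(stripFeedLinks(page));
// <head>
// <title>Los Angeles Times</title>
// </head>
```

Feed readers can no longer auto-discover the feed from the page, though the feed file itself stays reachable until it is removed from the server.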
6. Force people to pay for your content
The Wall Street Journal requires readers to pay for access. The New York Times requires you to pay to read its Op-Ed contributors such as Maureen Dowd, Paul Krugman, and Thomas Friedman. What? Haven't heard of them? That's probably because the online readership of their content has dropped significantly since The New York Times put them behind a subscription firewall.
But that's probably for the best; you don't want your contributors to have influence or impact. That might just increase your bandwidth costs. To implement a subscription firewall, look into a free open source user authentication solution like Rampart/UMA.
7. Don't let people copy text from your Web site
Pesky bloggers are notorious for copying excerpts of your articles and pasting them all over their sites. They often evade the law by only copying small amounts of text (they call it fair use). Well, let's close that loophole. The following script on your Web pages will prevent them from selecting and copying text:
<!-- Paste this code into an external JavaScript file named: disableSelect.js -->
/* This script and many more are available free online at
The JavaScript Source :: http://javascript.internet.com
Created by: James Nisbet (morBandit) :: http://www.bandit.co.nz/ */
window.onload = function() {
document.onselectstart = function() {return false;} // ie
document.onmousedown = function() {return false;} // mozilla
}
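/* You can attach the events to any element. In the following example
I'll disable selecting text in an element with the id 'content'.
Use this handler instead of the one above, not alongside it: a second
window.onload assignment replaces the first. */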
window.onload = function() {
var element = document.getElementById('content');
element.onselectstart = function () { return false; } // ie
element.onmousedown = function () { return false; } // mozilla
}
<!-- Paste this code into the HEAD section of your HTML document.
You may need to change the path of the file. -->
<script type="text/javascript" src="disableSelect.js"></script>
8. Don't let anyone print articles from your Web site
Another risk of the Internets is that people often print articles and share them with their friends. This might only lead people to talk about your newspaper and, again, increase future bandwidth costs. Put this code in your style sheet and your readers will print only blank pages. Ha, the joke's on them:
/* disable print */
@media print {body {display:none;}}
9. Turn off those APIs and Web Services
APIs and Web Services just make it easy for people to steal your content. Just turn them off ... and fire anyone who even mentions the word API.
In closing, preventing "theft" is up to you. Open source developers (and others) have created the technology to protect you. But in this personal-responsibility world, it's up to you, Sam. Go ahead, indulge your Luddite ambitions.

