
May. 3, 2007 at 8:24am Eastern by Danny Sullivan

Belgian Papers Back In Google; Begin Using Standards For Blocking

Belgian newspapers that sued Google to be removed from its index are now back in, having agreed to use the commonly accepted blocking standards that they initially rejected as not being legal. Google and the group representing the papers, Copiepresse, have issued a joint statement. That statement is below, along with a look at why this is a victory for Google, which has had to settle a series of similar lawsuits through agreements.

Let's start with the joint statement:

Internet users interested in Belgian news and users of Google’s search engine may have noticed today that the websites of the Belgian French and German-language daily newspapers, members of Copiepresse, are again referenced on the search engine.

This decision was made jointly by Google Inc. and Copiepresse as part of the constructive dialogue that has resumed between the two organizations.

The websites of the Belgian French and German-language daily press will now appear without a “cached” link in the search results of Google’s search engine, thanks to their use of the “noarchive”* tag.

The Belgian French and German-language daily press publishers and Google Inc. intend to use a quiet period in the court dispute to continue their efforts to identify tangible ways to collaborate in the long term.

* The « noarchive » tag is a tool for publishers to control how their website is referenced on the web.

Now let's go back to the history of the dispute, then analyze today's move.

  • March 2006, Suit Filed: Copiepresse started legal action against Google, arguing that Google's reliance on the widely accepted and respected robots.txt and meta robots standards as the way for publishers to opt out of indexing was somehow an attempt to impose copyright rules of Google's own making on content owners. This was despite the fact that robots.txt existed before Google did. It is not a standard of Google's making; it's one that many search engines support.
     
  • September 2006, Court Ruling, Papers Removed: The case went to court in Belgium on August 29, without Google being represented. The company had been sent a summons but failed to act upon it. A ruling on September 5 required that Google remove the sites and post a notice on its web site.
     
  • November 2006, Court Case Heard Again: In November, Google's case was reheard (NOT an appeal but a rehearing of the original case, since Google didn't defend itself the first time). At the same time, Google reached a content agreement with Sofam and Scam, two Belgian groups that cover photographic and audio/visual content. Those groups had joined the Copiepresse case in October, and the agreement got them to drop out of it.
     
  • February 2007, Court Ruling, Google Loses: The Belgian court found that Google did violate copyright when including material from several Belgian newspapers in its search index. However, the initial fine it was charged was reduced, and the final amount remains under appeal. Each side told a different story about the ruling. Google's spin was that it applied narrowly to the particular publishers and that Google could continue to operate as normal, on an opt-out basis, with others. The publishers argued that other Belgian courts could potentially use the ruling as guidance.
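For readers unfamiliar with the robots.txt standard at the center of the suit, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The robots.txt content and the example.be URL are hypothetical; they illustrate how a publisher could have kept Google's crawler out of an entire site while leaving other crawlers free to index it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Google's crawler from the whole site,
# while an empty Disallow line leaves every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines instead of fetching it.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.be/article.html"  # hypothetical URL
    for agent in ("Googlebot", "Slurp"):
        print(agent, "allowed:", can_crawl(ROBOTS_TXT, agent, page))
```

Run as a script, it reports Googlebot as blocked and Yahoo's Slurp crawler as allowed. The point is that the opt-out is entirely in the publisher's hands; no lawsuit is needed to trigger it.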

For a more detailed look at the case over time, see my Google Loses In Belgium Newspaper Case article.

One of the most contentious issues in the case was Google's cached copies, where Google makes a copy of a page available to people on the Google web site itself. Several of the publications were understandably upset that people could use the cache option to find content that, after initial publication, had been removed and made available only to paid or registered members.

It makes for a powerful argument that Google is reprinting content without permission (as are all the other major search engines that provide cached copies). It is one reason I wish Google would take the lead and stop caching pages (my Search Engines, Permissions & Moving Forward In Copyright Battles article goes into depth about issues here).

However, the situation is easily avoided. Publishers simply need to use a noarchive tag on pages they don't want cached. Indeed, as I and others have long argued, the entire Belgian newspaper case was unnecessary if the real goal was to stay out of Google or keep cached pages out of Google. Existing standards give publishers automatic means to do both.

Now the Belgian papers are doing exactly what could have been done at the start. You can see this in action at Le Soir. The home page has this on it:

<meta name="robots" content="index, follow, noarchive" />

The key part is the noarchive command. That tells Google not to cache the page (the index and follow parts, by the way, are entirely unnecessary, since those are the defaults. See my Meta Robots Tag 101: Blocking Spiders, Cached Pages & More article for much more guidance about how the tag works).
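To make the tag's effect concrete, here is a minimal Python sketch that reads the directives out of a page's meta robots tag using the standard-library `html.parser`. The function names are hypothetical, and the cache check deliberately looks only for noarchive in a tag named "robots"; real crawlers also honor crawler-specific tags (such as one named "googlebot") and other directives.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; a self-closing
        # <meta ... /> also lands here via handle_startendtag's default.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return every directive found in the page's meta robots tags."""
    parser = MetaRobotsParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def allows_cached_copy(html: str) -> bool:
    """Hypothetical check: only looks for the noarchive directive."""
    return "noarchive" not in robots_directives(html)


if __name__ == "__main__":
    # The tag quoted from Le Soir's home page.
    page = '<head><meta name="robots" content="index, follow, noarchive" /></head>'
    print(sorted(robots_directives(page)))
    print("cached copy allowed:", allows_cached_copy(page))
```

Fed the Le Soir tag, the parser finds index, follow and noarchive, so `allows_cached_copy` returns False: the page can be listed, but no cached copy should be offered.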

If you search for site:lesoir.be, you can also see how pages from Le Soir are listed in Google but without any cached links (if you don't know what those are, see my Squeezing The Search Loaf: Finding Search Engine Freshness & Crawl Dates for a detailed explanation).

The change means that the Belgian papers will once again receive traffic from Google, traffic they lost after suing to get out of Google News. That lawsuit resulted in them being removed not just from Google News but from Google entirely. The traffic drop must have been painful. A new report from Hitwise shows that, at least in the US, newspapers get 25 percent of their traffic from search engines.

The pain might have been worth it if the group had been able to force Google into a licensing agreement, which many publishers seem to believe showers some publishers with riches from Google. The reality is that no one knows what is in these agreements. They remain relatively few, and they've primarily seemed to be a way for Google to find a face-saving solution for both sides when parties threaten to sue it or actually do. My AFP & Google Settle Over Google News Copyright Case article covers the most recent one and recaps others.

Today's move does NOT involve an agreement at all; I asked Google specifically about this. As the statement notes, the two sides are continuing to talk during a quiet period in the court dispute. I suspect we'll see Google ultimately find a way to work more closely with the papers, perhaps even an agreement for extended use of content beyond what Google would consider fair use. But it's notable that Google didn't have to follow the agreement pattern we've seen to date.

I think a key reason is that Google was able to demonstrate the power of its traffic. With the AFP and AP agreements, it would be incredibly difficult for Google to remove AFP and AP content from its search engine, since hundreds of member publications post this material. With the Belgian papers, there were far fewer sites to remove. In addition, the court ruling gave Google something of a gift. It could yank out the publications and let them discover how much their traffic dropped, yet not seem vindictive, since Google was, after all, only complying with a court ruling.

More recently, the editor of the Daily Telegraph has tried to play the "Google's ripping us off" card. He said at a recent industry summit:

"Our ability to protect content is under consistent attack from those such as Google and Yahoo who wish to access it for free. These companies are seeking to build a business model on the back of our own investment without recognition. All media companies need to be on guard for this. Success in the digital age, as we have seen in our own company, is going to require massive investment... [this needs] effective legal protection for our content, in such a way that allows us to invest for the future."

The reality remains that if the Telegraph doesn't want to be in search engines, it has existing ways to stay out right now. If it goes the legal route, perhaps it will get an agreement, if it thinks it's important enough. But it might also find that Google decides there are other sources of content and cuts off the traffic flow to the paper.

I have to add that I find the Telegraph statement ironic, given that I know the paper has had SEO work done in the past, work intended to get traffic from search engines for free. Rarely (if ever) do you hear any of the newspapers complaining about Google also suggest that they themselves should pay for inclusion.

Overall, I cannot help but see the move as a victory for Google. I'd still like to see caching move to an opt-in system, as I've said, which I think would make search engines overall seem less like content leeches to some. And I want to see further indexing controls handed to publishers, such as the partial page blocking via robots-nocontent that Yahoo rolled out yesterday. These types of options are essential to ensure that search engines see publishers as partners in the indexing process, rather than subjects.
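Yahoo's robots-nocontent is applied as a class value on an HTML element. As a rough illustration of what partial page blocking means for a crawler, here is a minimal Python sketch that collects a page's text while skipping anything inside an element carrying that class. It is a hypothetical emulation, not Yahoo's implementation; it assumes well-formed markup in which every non-void element is explicitly closed, and it does not filter out script or style contents.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not change depth.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class IndexableTextParser(HTMLParser):
    """Gather text, skipping elements with class="robots-nocontent"."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a robots-nocontent element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            # Track nesting so we know when the blocked element closes.
            self.skip_depth += 1
            return
        classes = dict(attrs).get("class") or ""
        if "robots-nocontent" in classes.split():
            self.skip_depth = 1

    def handle_startendtag(self, tag, attrs):
        # A self-closing tag such as <br/> opens and closes at once,
        # so it never affects nesting depth.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html: str) -> str:
    """Return the page text a crawler honoring robots-nocontent might index."""
    parser = IndexableTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    page = ('<div><p>Story text.</p>'
            '<div class="nav robots-nocontent"><p>Menu<br>links</p></div>'
            '<p>More story.</p></div>')
    print(indexable_text(page))
```

On the sample page, the navigation block marked robots-nocontent is dropped while the story paragraphs around it are kept, which is the kind of fine-grained control the tag gives publishers.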




Reader Comments

"Internet users interested in Belgian news...have noticed today that the websites...are again referenced on the search engine."

And I'm sure both those Belgian-news-loving Internet users are thrilled.

Comment by Winooski | May 3, 2007 1:18 PM
