Belgian newspapers that sued Google to be removed from its index are now back in, having agreed to use the commonly-accepted blocking standards that they initially rejected as not being legal. Google and the group representing the papers, Copiepresse, have issued a joint statement. That’s below, along with a look at how this is a victory for Google, which has had to settle a series of similar lawsuits through agreements.
Let’s start with the joint statement:
Internet users interested in Belgian news and users of Google’s search engine may have noticed today that the websites of the Belgian French and German-language daily newspapers, members of Copiepresse, are again referenced on the search engine.
This decision was made jointly by Google Inc. and Copiepresse as part of the constructive dialogue that has resumed between the two organizations.
The websites of the Belgian French and German-language daily press will now appear without a “cached” link in the search results of Google’s search engine, thanks to their use of the “noarchive”* tag.
The Belgian French and German-language daily press publishers and Google Inc. intend to use a quiet period in the court dispute to continue their efforts to identify tangible ways to collaborate in the long term.
* The « noarchive » tag is a tool for publishers to control how their website is referenced on the web.
Now let’s go back to the history of the dispute, then analyze today’s move.
- March 2006, Suit Filed: Copiepresse started legal action against Google, arguing that Google’s use of the widely accepted and respected robots.txt or meta robots standards as a way of opting out of indexing was somehow trying to impose copyright rules of Google’s own making on content owners. This is regardless of the fact that robots.txt existed before Google did. It is not a standard of Google’s making; it’s one many search engines support.
- September 2006, Court Ruling, Papers Removed: The case went to court in Belgium on August 29, without Google being represented. The company had been sent a summons but failed to act upon it. A ruling on September 5 required that Google remove the sites and post a notice on its web site.
- November 2006, Court Case Heard Again: In November, Google’s case was reheard (NOT an appeal but a rehearing of the original case, since Google didn’t defend itself the first time). At the same time, Google reached a content agreement with Sofam and Scam, two Belgian groups that cover photographic and audio/visual content. Those groups joined the Copiepresse case in October. The agreement got them to drop out of the case.
- February 2007, Court Ruling, Google Loses: The Belgian court found that Google did violate copyright when including material from several Belgian newspapers in its search index. However, the initial fine it was charged was reduced, and the case remains in appeal over how much the fines will ultimately be. Both sides have different stories on the ruling. Google’s spin was that it was narrow to the particular publishers and that it could continue to operate as normal, on an opt-out basis, with others. The publishers argued that potentially, the ruling could be used by other Belgian courts as guidance.
For a more detailed look at the case over time, see my Google Loses In Belgium Newspaper Case article.
One of the most contentious issues in the case was over Google’s cached copies, where Google makes available a copy of a page to people on the Google web site itself. Several of the publications were understandably upset that people could use the cache option to find content that, after initially being published, was removed and only offered to paying or registered members.
It makes for a powerful argument, that Google is reprinting without permission (as are all the other major search engines that provide cached copies). It is one reason I wish Google would take the lead and stop caching pages (my Search Engines, Permissions & Moving Forward In Copyright Battles article goes into depth about issues here).
However, the situation is easily avoided. Publishers simply need to use a noarchive tag on pages they don’t want cached. Indeed, as I and others have long argued, the entire Belgian newspaper case was unnecessary if the real goal was to stay out of Google or keep cached pages out of Google. Existing standards give publishers automatic means to do this.
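To make the opt-out concrete, here is a minimal sketch in Python of how a crawler that honors robots.txt decides what it may fetch. The rules, the /archives/ path, the example.com URLs and the helper names are all hypothetical, chosen only for illustration; they are not taken from any of the Belgian papers’ actual files. The sketch relies on the standard library’s urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Googlebot out of an archive section,
# leave everything else open to every crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archives/

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text (already downloaded) into a rule set."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser, user_agent, url):
    """Return True if the given crawler is allowed to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for agent, url in [
        ("Googlebot", "http://www.example.com/archives/2006/story.html"),
        ("Googlebot", "http://www.example.com/actu/today.html"),
        ("Slurp", "http://www.example.com/archives/2006/story.html"),
    ]:
        print(agent, url, may_crawl(rules, agent, url))
```

A publisher wanting out of a search engine entirely would disallow everything for that crawler; one that only wants cached copies gone would leave robots.txt alone and use the noarchive meta tag instead.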
Now the Belgian papers are doing that, exactly what could have been done at the start. You can see this in action at Le Soir. The home page has this on it:
<meta name="robots" content="index, follow, noarchive" />
The key part is the noarchive command. That tells Google not to cache the page. The index and follow parts, by the way, are entirely unnecessary; see my Meta Robots Tag 101: Blocking Spiders, Cached Pages & More article for much more guidance about how the tag works.
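For anyone who wants to check their own pages, the following is a rough sketch in Python of how the tag can be read. It is not how Google processes the tag; the class and function names are made up for this example, and it only uses the standard library’s html.parser module.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag seen."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of meta robots directives found in the HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


def allows_cached_copy(html):
    """A page opts out of caching only if it carries noarchive."""
    return "noarchive" not in robots_directives(html)


if __name__ == "__main__":
    page = '<meta name="robots" content="index, follow, noarchive" />'
    print(sorted(robots_directives(page)))
    print(allows_cached_copy(page))
```

Note that a page with no meta robots tag at all is treated as cacheable here, which mirrors the opt-out nature of the standard: silence means permission.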
If you do this search, site:lesoir.be, you can also see how pages from Le Soir are listed in Google but without having any cached links present (if you don’t know what those are, see my Squeezing The Search Loaf: Finding Search Engine Freshness & Crawl Dates for a detailed explanation).
The change means that the Belgian papers will now again begin receiving traffic from Google, something which they lost after suing to get out of Google News. That lawsuit resulted in them being taken not just out of Google News but Google entirely. The traffic drop had to have been painful. A new report from Hitwise shows that at least for the US, newspapers get 25 percent of their traffic from search engines.
The pain might have been worth it if the group had been able to force Google into a licensing agreement, which many publishers seem to believe bestows riches from Google on some publishers. The reality is no one knows what is in these agreements. They remain relatively few, and they’ve primarily seemed to be a way for Google to work with parties threatening or actually suing it to find some face-saving solution for both sides. My AFP & Google Settle Over Google News Copyright Case article covers the most recent one and recaps others.
Today’s move does NOT involve an agreement at all. I asked Google specifically about this. As the statement notes, both sides remain talking about the pending appeal. I suspect we’ll see Google ultimately find a way to work more closely with the papers, perhaps even an agreement for extended use of content beyond what Google would consider fair use. But it’s notable that Google didn’t have to follow the agreement pattern we’ve seen to date.
I think a key reason is that Google was able to demonstrate the power of its traffic. With the AFP and AP agreements, it’s incredibly difficult for Google to remove AFP and AP content from its search engine, since hundreds of member publications post this material. With the Belgian papers, there were far fewer sites to deal with. In addition, Google was to some degree given a gift by the court ruling. It could yank out the publications and let them discover how their traffic dropped, yet not seem vindictive since Google was, after all, only complying with a court ruling.
More recently, the editor of the Daily Telegraph has tried to play the "Google’s ripping us off" card. He said at an industry summit:
"Our ability to protect content is under consistent attack from those such as Google and Yahoo who wish to access it for free. These companies are seeking to build a business model on the back of our own investment without recognition. All media companies need to be on guard for this. Success in the digital age, as we have seen in our own company, is going to require massive investment… [this needs] effective legal protection for our content, in such a way that allows us to invest for the future."
The reality remains that if the Telegraph doesn’t want to be in search engines, it has existing ways to stay out and keep out right now. If it tries to go the legal route, perhaps it will get an agreement, if it thinks it’s important enough. But it might also find that Google could decide there are other sources of content and cut off the traffic flow to the paper.
I have to add I find the Telegraph statement ironic given that I know they’ve had SEO work done in the past, work done to try and get traffic from search engines for free. Rarely (if ever) do you hear any of the newspapers complaining about Google also suggesting that they themselves should pay for inclusion.
Overall, I cannot help but find the move a victory for Google. I’d still like to see caching move to an opt-in system, as I’ve said, which I think would make search engines overall seem less like content leeches to some. And I want to continue to see further indexing controls be handed to publishers, such as the partial page blocking via robots-nocontent that Yahoo rolled out yesterday. These types of options are essential to ensure that search engines see publishers as partners in the indexing process, rather than subjects.