Belgian Papers Back In Google; Begin Using Standards For Blocking
Belgian newspapers that sued Google to be removed from its index are now back
in, having agreed to use the commonly-accepted blocking standards that they
initially rejected as not being legal. Google and the group representing the
papers, Copiepresse, have issued a
joint statement. That’s below, along with a look at how this is a victory for
Google, which has had to settle a series of similar lawsuits through agreements.
Let’s start with the joint statement:
Internet users interested in Belgian news and users of Google’s search
engine may have noticed today that the websites of the Belgian French and
German-language daily newspapers, members of Copiepresse, are again referenced
on the search engine.
This decision was made jointly by Google Inc. and Copiepresse as part of
the constructive dialogue that has resumed between the two organizations.
The websites of the Belgian French and German-language daily press will now
appear without a “cached” link in the search results of Google’s search
engine, thanks to their use of the “noarchive”* tag.
The Belgian French and German-language daily press publishers and Google
Inc. intend to use a quiet period in the court dispute to continue their
efforts to identify tangible ways to collaborate in the long term.
* The « noarchive » tag is a tool for publishers to control how their
website is referenced on the web.
Now let’s go back to the history of the dispute, then analyze today’s move.
- March 2006, Suit Filed: Copiepresse started legal action against
Google, arguing that Google’s use of the widely accepted and respected
robots.txt or meta robots
standards as a way of opting-out of indexing was somehow trying to impose
copyright rules of Google’s own making on content owners. This is regardless
of the fact that robots.txt existed before Google did. It is not a standard of
Google’s making; it’s one many search engines support.
- September 2006, Court Ruling, Papers Removed: The case went to
court in Belgium on August 29, without Google being represented. The company
had been sent a summons but failed to act upon it. A ruling on September 5
required that Google remove the sites and post a notice on its web site.
- November 2006, Court Case Heard Again: In November, Google’s case
was reheard (NOT an appeal but a rehearing of the original case, since Google
didn’t defend itself the first time). At the same time, Google
reached a content agreement with Sofam
and Scam, two Belgian groups that cover
photographic and audio/visual content. Those groups joined the
Copiepresse case in October. The
agreement got them to drop out of the case.
- February 2007, Court Ruling, Google: The Belgian court
found that Google
did violate copyright when including material from several Belgian newspapers
in its search index. However, the initial fine it was charged was reduced, and
it remains in appeal over how much the fines will ultimately be. Both sides have
different stories on the ruling. Google’s spin was that it was narrow to the
particular publishers and that it could continue to operate as normal, on an
opt-out basis, with others. The publishers argued that potentially, the ruling
could be used by other Belgian courts as guidance.
For a more detailed look at the case over time, see my
Google Loses In Belgium
Newspaper Case article.
One of the most contentious issues in the case was over Google’s cached
copies, where Google makes available a copy of a page to people on the Google
web site itself. Several of the publications were understandably upset that
people could use the cache option to find content that after initially being
published was removed and only offered to paying or registered members.
It makes for a powerful argument, that Google is reprinting without
permission (as are all the other major search engines that provide cached
copies). It is one reason I wish Google would take the lead and stop caching
pages (my Search Engines,
Permissions & Moving Forward In Copyright Battles article goes into depth
about issues here).
However, the situation is easily avoided. Publishers simply need to use a
noarchive tag on pages they don’t want cached. Indeed, as I and others have long
argued, the entire Belgian newspaper case was unnecessary if the real goal was
to stay out of Google or keep cached pages out of Google. Existing standards
give publishers automatic means to do this.
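To make the robots.txt side of that concrete, here is a minimal sketch in Python of how a crawler that honors the standard would read an opt-out. The robots.txt content, the /archives/ path and the example.com URLs are all made up for illustration; they are not taken from any of the Belgian papers, and a real crawler may handle edge cases differently than Python's standard-library parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a newspaper site (illustrative only).
# Googlebot is kept out of /archives/; every other crawler is allowed everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archives/

User-agent: *
Disallow:
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Googlebot", "http://www.example.com/archives/2006/story.html"),
        ("Googlebot", "http://www.example.com/news/today.html"),
        ("OtherBot", "http://www.example.com/archives/2006/story.html"),
    ]:
        print(agent, url, is_allowed(ROBOTS_TXT, agent, url))
```

The point of the sketch is that the publisher, not the search engine, writes the rules. The crawler only reads them and stays away from whatever is disallowed.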
Now the Belgian papers are doing that, exactly what could have been done at
the start. You can see this in action at Le Soir.
The home page has this on it:
<meta name="robots" content="index, follow, noarchive" />
The key part is the noarchive command. That tells Google not to
cache the page. (The index and follow parts, by the way, are entirely
unnecessary. See my Meta
Robots Tag 101: Blocking Spiders, Cached Pages & More article for much more
guidance about how the tag works.)
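For anyone who wants to check their own pages, here is a rough sketch in Python of reading the directives out of a meta robots tag. The class and function names are my own inventions for illustration, and this is not how Google or any other search engine actually parses the tag; it only shows that noarchive is one comma-separated value among others in the content attribute.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of meta robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


if __name__ == "__main__":
    page = '<meta name="robots" content="index, follow, noarchive" />'
    found = robots_directives(page)
    print("noarchive" in found)
```

Run against the Le Soir tag quoted above, the check finds noarchive alongside index and follow.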
If you do this search,
site:lesoir.be, you can also see how pages from Le Soir are listed in Google
but without having any cached links present (if you don’t know what those are,
see my Squeezing The
Search Loaf: Finding Search Engine Freshness & Crawl Dates for a detailed
explanation).
The change means that the Belgian papers will now again begin receiving
traffic from Google, something which they lost after suing to get out of Google
News. That lawsuit resulted in them being taken not just out of Google News but
Google entirely. The traffic drop had to have been painful. A new report from Hitwise shows that at least for the US, newspapers get 25 percent of their traffic from search engines.
The pain might have been worth it if the group had been able to force Google into
a licensing agreement, which many publishers seem to believe bestows riches
from Google on some publishers. The reality is no one knows what is in these
agreements. They remain relatively few, and they’ve primarily seemed to be a way
for Google to work with parties threatening them or actually suing them to find
some face-saving solution for both sides. My
AFP & Google Settle Over
Google News Copyright Case article covers the most recent one and recaps
the earlier ones.
Today’s move does NOT involve an agreement at all. I asked Google
specifically about this. As the statement notes, both sides are still talking about
the pending appeal. I suspect we’ll see Google ultimately find a way to work
more closely with the papers, perhaps even an agreement for extended use of
content beyond what Google would consider fair use. But it’s notable that Google
didn’t have to follow the agreement pattern we’ve seen to date.
I think a key reason is that Google was able to demonstrate the power of its
traffic. With the AFP and AP agreements, it’s incredibly difficult for Google to
remove AFP and AP content from its search engine, since hundreds of member
publications post this material. With the Belgian papers, there were fewer
sites to remove. In addition, Google was to some degree given a gift by the court
ruling. It could yank out the publications and let them discover how their traffic
dropped, yet not seem vindictive since Google was, after all, only complying with
a court ruling.
The editor of the Daily Telegraph has also tried to play the
"Google’s ripping us off" card. He
said at an industry summit recently:
"Our ability to protect content is under consistent attack from those such as
Google and Yahoo who wish to access it for free. These companies are seeking to
build a business model on the back of our own investment without recognition.
All media companies need to be on guard for this. Success in the digital age, as
we have seen in our own company, is going to require massive investment… [this
needs] effective legal protection for our content, in such a way that allows us
to invest for the future."
The reality remains that if the Telegraph doesn’t want to be in search
engines, it has
existing ways to stay out and keep out right now. If it tries to go the
legal route, perhaps it will get an agreement, if it thinks it’s important
enough. But it might also find that Google could decide there are other sources
of content and cut off the traffic flow to the paper.
I have to add I find the Telegraph statement ironic given that I know they’ve
had SEO work done in the past, work done to try and get traffic from search
engines for free. Rarely (if ever) do you hear any of the newspapers complaining
about Google also suggesting that they themselves should pay for inclusion.
Overall, I cannot help but find the move a victory for Google. I’d still like
to see caching move to an opt-in system, as I’ve said, which I think would make
search engines overall seem less like content leeches to some. And I want to
continue to see further indexing controls be handed to publishers, such as the
partial page blocking
via robots-nocontent that Yahoo rolled out yesterday. These types of options
are essential to ensure that search engines see publishers as partners in the
indexing process, rather than subjects.