Josh Cohen Of Google News On Paywalls, Partnerships & Working With Publishers
Want to do a paywall with no “first click free?” That’s fine with Google, says business product manager Josh Cohen. Want to do micropayments? Google will be “flexible” in considering support of new business models like this. But if you charge, expect less traffic, and also expect that your competitors will be “ecstatic” to pick up your loss, he said. Cohen’s comments on paywall issues were part of a wide-ranging interview I had with him about Google and its news service.
In the interview, Cohen also repeatedly stressed that publishers are free to deal with Google as they like. And if they wanted to exclude Google in favor of a competitor like Microsoft, they have that choice — though Google would prefer to work with everyone. Indeed, Google’s inclusion of many diverse sources is one reason he thinks Google might take hits from certain publishers that don’t criticize Yahoo News, which is a far more popular news service than Google News.
Below, you’ll find Cohen’s comments on these and other issues, along with a summary of how Google handles free, registration and subscription-based content. I spoke with Cohen in early October (a busy month in search has kept me from getting this posted until now). His comments are even more relevant given that talk of news publishers perhaps blocking Google has only ramped up in the past weeks.
Paywalls & Google Are Not Mutually Exclusive
As publishers increasingly consider putting up paywalls (a barrier to reading a story unless you’ve paid for it in some way), I also see commentary from others arguing such moves would be dumb, that they’ll be the final nail in killing some publications. What’s a confused publisher to do? For me, I find it puzzling publishers believe they have to make a choice. They can have their paywall AND Google traffic combined, via Google’s First Click Free program. Are there many publishers who simply aren’t aware of this program? Cohen responded:
Yep …. I’m often surprised, and maybe this means I’m not doing my job particularly well, but there are some basic questions on First Click Free and support for subscription content or the rankings or what we try and do.
Some of that are the challenges that you can’t make everything widely available. But we do try to be fairly transparent with what we’re trying to do within our algorithm on the Google News side of it, because we recognize that it’s important to different publishers. We want to be in a situation where the best content wins, not the best SEOed site. So if we can put that out there as much as possible and in essence give all publishers a level playing field, then the user wins.
If a publisher understands that quality, original content does best and therefore tries to create more, great. But there is still a lot of those discussions that take place where people will say … ‘I have to make this content free or Google won’t index it,’ and that’s not the case.
First Click Free is only one example of the ways that publishers can make subscription content available. They can do previews, they can block it in different ways. I think there are a lot of those questions about the nuts and bolts of how you can work with us, subscriptions just being one of them.
Paywalls Don’t Require That The First Click Is Free
Of course, one concern with First Click Free is that it can allow a savvy person to effectively read your publication for free, as can happen with the Wall Street Journal. In fact, News Corporation raised this as an issue this week (My Would Someone Please Explain To News Corp How Google Works? article looks at News Corp’s comments in more depth).
That’s the first time I’ve seen a major publisher talk about issues with the First Click Free “backdoor,” but it was something I was expecting to come up, so it was on my list of questions I’d put to Cohen. Can publishers be listed in Google News and still protect their paywall? Cohen said yes:
You can allow us to crawl content and show a preview to the user and label it as a subscription. So that happens today. You can do that in Google News today …. as long as you’re not cloaking [showing Google something different than what a visitor would see, when they finally arrive at the page they’ve paid for].
You can also show a preview and we can index it, as long as it’s a consistent experience, where if you’re showing us a preview, and we can index that, and that’s what you show the user, that’s fine too.
POSTSCRIPT: Google Modifies “First Click Free” Policy To Accommodate Publishers Gating Their Content from Dec. 1, 2009 covers how Google slightly closed the “backdoor” since this interview was conducted.
How Google Handles Free & Paid News Content
In summary, here are the basic ways Google News handles news content that is free, requires free registration or requires payment to view. These are also described in Google’s help pages:
Free Content: Content is free. Google can index an entire story to make it searchable. People can find the story in Google and read the entire thing for free.
First Click Free: Content is behind a paywall. Google is allowed past and can index the entire story to make it searchable. People can find it in Google and read the entire thing for free. From that story, people cannot click to read other stories at the same publication for free (hence, the “first click free” name). However, people can potentially go back to Google, find another article from the same site, click to it from Google and read that.
Subscription: Content is behind a paywall [or requires free registration to read]. Google is allowed past and can index the entire story to make it searchable. People can find it in Google. They can only read the entire thing if they pay [or register].
Preview: Content is behind a paywall [or requires free registration to read]. Google is NOT allowed past to index the entire story to make it searchable. People can find it in Google only based on the preview content. They can only read the entire thing if they pay [or register].
And because I like charts:
| Listing Method | Google Sees… | Visitor Sees… |
|---|---|---|
| Free Content | Full story | Full story, for free |
| First Click Free | Full story | Full story, for free, if they click from Google |
| Subscription | Full story | Summary. Must pay or register to see full story |
| Preview | Preview / summary | Summary. Must pay or register to see full story |
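The four listing methods can also be modeled in a few lines of code. What follows is a minimal Python sketch of the chart above, not Google's or any publisher's actual implementation. The names `ListingMethod`, `Article`, `crawler_view` and `visitor_view` are hypothetical, and how a site decides that a click came from Google is reduced to a simple boolean assumption.

```python
from dataclasses import dataclass
from enum import Enum


class ListingMethod(Enum):
    """The four listing methods from the chart (names are illustrative)."""
    FREE = "free"
    FIRST_CLICK_FREE = "first_click_free"
    SUBSCRIPTION = "subscription"
    PREVIEW = "preview"


@dataclass
class Article:
    full_text: str
    summary: str


def crawler_view(article: Article, method: ListingMethod) -> str:
    """What the search crawler is given to index."""
    # Only the Preview method keeps the crawler out of the full story.
    if method is ListingMethod.PREVIEW:
        return article.summary
    return article.full_text


def visitor_view(
    article: Article,
    method: ListingMethod,
    came_from_google: bool = False,  # assumption: the site can detect this
    has_access: bool = False,        # visitor has paid or registered
) -> str:
    """What a human visitor is shown."""
    if has_access or method is ListingMethod.FREE:
        return article.full_text
    # First Click Free: the full story, but only on a click from Google.
    if method is ListingMethod.FIRST_CLICK_FREE and came_from_google:
        return article.full_text
    return article.summary


story = Article(full_text="Full story text", summary="Short summary")
print(visitor_view(story, ListingMethod.FIRST_CLICK_FREE, came_from_google=True))
print(visitor_view(story, ListingMethod.SUBSCRIPTION, came_from_google=True))
```

In this sketch the First Click Free "backdoor" is visible directly: every call with `came_from_google=True` returns the full story, no matter how many times the same visitor goes back to Google first.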
Clearly, there’s room for improvement. Flagging content that can be viewed through free registration as the same as content requiring payment doesn’t make much sense. It’s also unclear to me if the “subscription” label means much to someone viewing search results. Perhaps “Pay Per View” speaks better to what they can expect.
The Preview option may be hard to comprehend for some publishers. It seems designed for those who are absolutely paranoid that they don’t want Google to index their content, even though giving Google only a summary of a story degrades the chance of performing as well in search results. The subscription option increases their findability yet protects their content from the general public just as well.
No Subscription Option For Regular Google Search
Note that the options above also are only for Google News. In Google Web Search, the Free and First Click Free options are allowed. Publishers can manually do Previews on their own. But there’s no option to show subscription-based content in Google Web Search, with the exception of Google Scholar content.
That’s a big deal. Newspapers get tons of traffic from regular Google web search. If they go subscription-only, they’ll lose that traffic. That might be fine for News Corp, which argues that the traffic it receives isn’t that valuable (despite, oddly, also purchasing ads on Google to gain more traffic). But the difficulty for Google in allowing subscription content into regular web search is that if the top results get flooded with it, users may become dissatisfied and take out their frustration on Google.
Open To Listing Arrangements
Some of how things currently work may change, however, especially as the news industry itself is attempting new business models. Said Cohen:
If people are putting more and more behind a paywall, it’s in both parties’ interest to be as flexible as possible around that, provided you can maintain a good user experience.
For us, we obviously want to be able to index that content in a reasonable way for our users. If we can find ways to be flexible in supporting whatever models come up next, whether it’s micropayments or whatever else that may be on the horizon, that’s good for us.
For [publishers], there is still a recognition that discovery is really important. I would argue even more important if you’re putting content behind a paywall, because all of a sudden, depending on your model, again, you’re potentially shrinking your potential base of users. So you want to increase the size of that funnel, you don’t want to restrict it even further.
That’s why we have those discussions with publishers. That’s why we work with the Wall Street Journal, the FT [Financial Times] and others with their subscription content because again, both of us are trying to do the same thing, which is make sure their content can get found.
Subscription Content Doesn’t Rank As Well
Of course, while publishers are free to go subscription-only within Google News, they risk having lower visibility if they do so. Cohen explained:
The reason that subscription content won’t do particularly well in search results is just the user behavior. I’m not saying all information wants to be free and has to be free, but the user behavior is by and large that people don’t pay for a lot of that content.
If you have subscription content, the user response to it will in effect tell the algorithm this isn’t a relevant result, I’m not clicking on this. By making it free or by in essence saying it’s paid but Google treats it as free [because of First Click Free], there’s a significant advantage to them, because all their content is indexed, and I think at the end of the day probably helps the results. People are more likely to link to it and all the different ways it can be beneficial.
It’s not that the Google News algorithm treats subscription content as second-class just because it is flagged subscription. Instead, Cohen clarified, it’s that the algorithm tries to mirror what users like. Since they largely bypass subscription content, less of this is surfaced. (See Under The Hood: Google News & Ranking Stories, the third part of this interview, for more about how Google News ranks content.)
The Many Shades Of No
Cohen also stressed again that the choices remain with publishers. Beyond the Free / First Click Free / Subscription options, they can opt-out entirely from Google News or Google itself. They can also say no in a far more granular way than some like those backing ACAP (a proposed next-generation blocking and access system for search engines) would have you believe:
There are discussions saying ‘You’re stealing my content,’ but publishers have complete control on whether that content goes online in the first place. The publisher’s in complete control about the business model. If they want to put up a paywall, again, publishers can put up a paywall. We don’t force you to make it free. In fact, we work with a number of publishers today who charge for content.
The other part of the extreme is even if you’re online, that doesn’t mean that we can come in and force you to index your content with us. And this is the whole robots stuff, where if you don’t want to put it in Google, or even just in Google News, you can block it, you can segment it or if you don’t want to show snippets. If you don’t want to show images, you can do that too. The publisher has complete control about whether that content is displayed. [By “robots stuff,” Cohen is referring to the robots exclusion protocol, explained further below in this interview.]
We certainly hope people don’t opt out. We think there’s high-quality content, and we want to be able to index it, but at the end of the day, the publisher has control over that. So this sense that they have no choice but to be in Google, that they’re forced, that we’re breaking into their house and taking that print, digitizing it, putting it online and forcing it to be free …. There are so many steps along the way where they can say ‘Stop’ and we will respect that, 100%.
Build Your Paywall & Others May Go Elsewhere
How about the idea that newspapers all need permission to collude, to discuss openly banding together and blocking all their content, because if they don’t all do it at once (or if one publisher stays out and offers material for free), then collectively they all lose? From that comes a suggestion that they really don’t have a choice, that they have to be in Google. Cohen disagrees with this.
They do have a choice. I think the reason publishers want to be in Google is because of the value we deliver. There are a number of sources of information competing on the web today, so making sure your content is discoverable is by and large a good thing.
This idea that ‘Oh, so nobody’s going to read any other source other than what’s in a newspaper.’ Think about that for a second …. you have a number of different sources out there that are non-newspapers who are probably just ecstatic at the prospect of a lot of paywalls going up in a lot of different categories.
You know, pick a category. CNN, general interest news, for example, I’ve got to think, and I don’t know, I don’t know anything, I’ve got no insight into CNN’s thought process and maybe I’m wrong, but they probably get a ton of traffic and do a fairly healthy business on the online side of things.
So if all these newspapers go behind a paywall, I would have to think that somebody like them, who’s in a strong position right now, is probably going to take a different position.
Want To Exclude Only Google? That’s Fine
That led me to the other solution that gets floated out there, a sense that there needs to be either an improved “hot news” law or tighter restrictions on fair use, so that people cannot so easily summarize stories (such as when a blogger does highlights of a news story or when a mainstream news source summarizes a story from another mainstream publication).
When the interview happened, the AP had just suggested it might give its stories in advance to certain portals, which was widely interpreted to mean that Bing might get a head start with news stories over Google. What did Cohen think of those lobbying for fair use changes and the AP’s push?
Cohen didn’t really answer the fair use portion, though Google’s been pretty clear that they feel what they do falls under fair use. As for exclusivity, he didn’t seem bothered if someone wanted to partner with people other than Google.
There’s a challenge over these existing business models. I think some of it is, these are businesses that had 20, 30 percent margins. And that’s probably not going to be the case going forward. I don’t see that just on the business side, forget both the cyclical and the secular changes to it.
Again the reality of it is that the publishers have complete control over their content on the web. Whether or not you digitalize it, whether or not it’s paid or free, and whether or not Google in particular has access to it. With robots, you can specify by user agent.
If you want Yahoo and Microsoft, and you want to do a deal with Microsoft, and only have Microsoft have access to your content and say Google, don’t index my content, you can do that.
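To make the "specify by user agent" point concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `example.com` URL and the helper name `allowed` are illustrative assumptions; `Googlebot` is Google's crawler name, while `msnbot` and `Slurp` stand in for Microsoft's and Yahoo's crawlers. The result reflects how this particular parser reads the file, which is not necessarily identical to how each search engine interprets it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out Google's crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://example.com/news/story.html"  # placeholder URL
for agent in ("Googlebot", "msnbot", "Slurp"):
    print(agent, allowed(ROBOTS_TXT, agent, url))
```

A publisher wanting the exclusive arrangement Cohen describes would simply flip the rules: disallow `*` and allow only the one crawler it has a deal with.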
The day after my interview with Cohen, Google CEO Eric Schmidt got asked about the AP news, as well, saying that Google had to be careful not to favor one publication over the other in terms of speed or latency.
Excluding Google News Vs. Excluding Google Web Search
One issue that came up with some Italian publishers recently was the suggestion that if you opt out of Google News, you can’t be included in Google Web Search, which generally seems to send more traffic to publishers over time than Google News does.
So, the allegation was, if you object to being in Google’s news portal that you view as a competitor, you also have to exclude yourself from also being in web search results that you might not view as so much a competitor (see Debunking The Italian Newspapers’ Antitrust Allegations Against Google for more on this).
Not true. The reality is that you can opt out of Google News but still be included in Google Web Search. However, you can’t do this automatically. You have to request being dropped from Google News (which, by the way, also doesn’t automatically include anyone. Some human reviewer at Google decides to include a source, or sources have to manually request inclusion if they’re not already listed).
Shouldn’t the Robots Exclusion Protocol options (robots.txt files or the meta robots tag) used to signal automatic exclusion from indexing allow you to say no to Google News but yes to other Google search properties, such as Google Web Search? That’s not easy, Cohen said, but it could come:
We recognize with Google News, it’s very important to publishers to give them that option, if they want to opt-out specifically from Google News. We allow them to do that, simply by telling us to remove them. But as you apply to a whole set of different services within Google, it just gets hard to define what is the use that’s intended for a given service. That’s the main issue from our side.
But at the end of the day, we want to give as much control to publishers over their content as possible. So if that’s something that we can do in a way that works for users in general, there’s not a business reason why we wouldn’t want to give that control.
POSTSCRIPT: Google Adds Googlebot-News User Agent To Allow Blocking Google News from Dec. 2, 2009 covers how Google now allows for automatic blocking of Google News.
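As a rough illustration of what that automatic blocking looks like, the sketch below again uses Python's standard `urllib.robotparser`. The `Googlebot-News` and `Googlebot` user agent names come from Google; the robots.txt content, the placeholder URL and the helper name `can_crawl` are assumptions for the example, and the outcome shown is this parser's reading of the file rather than a statement of how Google itself processes it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of Google News only.
NEWS_OPT_OUT = """\
User-agent: Googlebot-News
Disallow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "http://example.com/news/story.html"  # placeholder URL
print("Googlebot-News:", can_crawl(NEWS_OPT_OUT, "Googlebot-News", url))
print("Googlebot:", can_crawl(NEWS_OPT_OUT, "Googlebot", url))
```

With only the news crawler named in the file, the regular web search crawler has no rule that applies to it and stays allowed, which is the split the Italian publishers were asking for.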
Google’s Inclusion Of Many News Sources May Draw Attacks
I also wondered why Cohen felt publishers seem to attack Google more when Yahoo News still outdistances Google as a leading news site by more than 3 to 1, not to mention that, unlike Google News, Yahoo News produces original content of its own. In part, he seemed to suggest that ironically, it’s because Google’s trying to show a diverse and level playing field of sources. It includes a lot of publications and sends traffic to many of them, rather than a select few.
I don’t know that it is Google News versus Yahoo News. Probably, they view it more broadly as Google overall and Yahoo overall and getting back to size questions. But I can’t answer that. I’m not in a position to do so. It’s more of a question from the publisher side of it.
From the news side of it, this is not a value judgment, just Yahoo News and Google News have just different approaches and different business models. We’re both aggregators, but they’re more of a portal. Their focus is on creating a Yahoo-branded experience, editorial voice, business model [unlike Google News, Yahoo News employs journalists to produce original content]. They do send traffic to partners but a large part of it, that experience takes place on Yahoo’s site.
What we’re trying to do within Google News is around having as many different perspectives as possible on a given story and a diversity of sources. Also, again keeping with how we tend to operate as a company, our business model is about directing that traffic back out to the publisher’s site. It’s their business model, it’s their look-and-feel, their editorial voice. Again, I honestly don’t mean that as any sort of a value judgment. It’s just different approaches to the space.
Google News & Content Partnerships
Would it help if Google partnered more with content owners for licensing agreements in the way that Yahoo apparently does? Cohen replied that Google has “thousands” of partnerships. But to me, most of these are down to AdSense, agreements to place Google’s ad serving code on partner web sites, where the “partnership” involves little more than someone filling out a self-serve form. What about partnerships that cover the use of material on Google’s site itself, such as how Yahoo licenses wire content?
Cohen said that Google has 11 partnerships like this of its own, at the moment. After the interview, I was sent the full list:
- UK Press Association
- Canadian Press
A deal with the AP was made in 2007, followed by agreements with AFP, UK Press Association and Canadian Press in 2008 (see Google News Now Hosting Wire Stories & Promises Better Variety In Results for more). The latter seven listed are all members of the European Pressphoto Agency and signed an agreement with Google in March 2009 (see also here).
Belgian papers that fought a much publicized battle against Google in 2006 were reincluded in 2007. At the time, this was not due to a financial settlement (see Belgian Papers Back In Google; Begin Using Standards For Blocking). As they don’t appear on the list of formal partners I was sent, I assume there still remains no formal agreement for their inclusion. I also don’t see on the list a deal with Sofam & Scam, two Belgian photo and A/V services that joined the case after it started, got an agreement, then dropped out of the suit. So I’m not sure if this is still active — I’m checking on this.
Why Do Deals? Reducing Duplicate Stories One Reason
The content deals have been struck for various reasons, Cohen said. With both the AP and AFP, those deals were supposedly designed in part to help with the problem of duplication of wire content on Google News. The same wire story might appear in various newspapers at the same time, which can confuse Google into thinking the stories are all different, when they’re the same. Said Cohen:
If you’re trying to show different perspectives, having 50 copies of the same story doesn’t make sense.
Continuing, he also talked about why as part of the deal, Google agreed to host AP articles on Google itself:
They have a different business model where the focus [of the AP’s web site] is primarily on the B2B space. They don’t have a [consumer] destination. They don’t want to, or they can’t, because of the challenges that might cause for them with sales.
By sales, Cohen refers to the issue that the AP both gives its own original content to member publications and takes stories in from them to distribute to all AP members. If the AP creates a news portal of its own, it potentially competes with member publications for readers, using content from some of those members.
So with the AP, the existing deal was intended to help solve dual issues, that of Google wanting to reduce duplicate stories and for AP to have Google host its stories in a somewhat “neutral” environment that might be more acceptable to its members, since the AP couldn’t host the news itself.
AP’s Ranking Boost Quest
Interestingly, one of the AP’s top concerns with a new deal with Google appears to be that it wants to rank better in Google’s results. Unfortunately, the AP shows its ineptness in understanding search when it speaks like this. There is no way — no way — that Google’s going to guarantee the AP a ranking boost over other news sites.
Really, what the AP seems to want is to ensure that if one of its stories is managing to get into the top results, the AP itself gets the spot, not a submission of the story over at Digg, not a summary of the story over at the Huffington Post, not a copy of its story on one of its many member publications. That’s as best I can tell. I’ve not had luck getting the AP to talk to me directly, but I’ll be trying again.
That’s actually more reasonable. In fact, SEOs have long been lobbying Google for ways to ensure that original source documents show up ahead of pages that simply reference those documents with little value-add (IE: news flash AP, this isn’t just your problem, and people have been actively working long before you to help solve it). One solution that came this year was the canonical tag, which is about to expand with cross-domain support.
Another solution remains with the AP itself. By not having its own news portal, by having stories that can disappear after 30 days, it constantly shoots itself in the foot to gain the links that would let it naturally rank better in Google.
Consider an AP story from October published with the AP’s cooperation at Google News, covering how the AP itself might want to charge some portals for early content access. Techmeme featured the story, as did several blogs like ReadWriteWeb and Mashable. But try to read the story they all linked to, and it’s gone.
For more on the issues addressed above, see these articles:
- What The Associated Press is saying to Google, Microsoft, and Yahoo
- Sorry, Tom Curley: Don’t Expect A Google Ranking Boost For The AP
- Hey AP! How About Running A Real News Web Site?
- How The AP Fails To Get Search & SEO (Again)
But Duplicates Still Get Through
Back to the existing Google-AP deal. There, the de-duplication aspect hasn’t panned out as well as promised. It’s still possible to search on Google and encounter the same AP story being hosted by different newspapers. In addition, the theory was that if there was an AP story, it was the AP story hosted at Google that was supposed to get top billing, not the same story at members. Cohen acknowledged there are still issues:
We’re still indexing everything and showing the duplicates, but trying in the default results to show the canonical page [the AP story on Google itself].
Continuing, he explained there are challenges, in that what may seem like the same AP story in different papers might not be the exact same story.
“If they [an AP member publication] edit it or add original quotes, it begins to change,” he said. “There’s a gray area of trying to get that right and capture changes. We want to capture [and show] substantial changes but not have someone tweak a headline or byline and get listed as if it’s a different story.”
Still, it’s been over two years since Google began offering hosted wire stories. You’d expect these problems to be sorted out by now.
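To see why the gray area Cohen describes is hard, consider a deliberately naive sketch. It is not how Google News detects duplicates; it swaps in a simple word-overlap (Jaccard) score, compares only the body text so that a tweaked headline or byline has no effect, and uses a threshold of 0.8 that is an arbitrary assumption. The sample sentences and the function names are made up for illustration.

```python
import re


def word_set(text: str) -> set[str]:
    """Lowercase the text and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common (0.0 to 1.0)."""
    words_a, words_b = word_set(a), word_set(b)
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)


def same_story(body_a: str, body_b: str, threshold: float = 0.8) -> bool:
    """Treat two bodies as one story when their overlap meets the threshold.

    Headlines and bylines are deliberately not passed in, so changing
    them alone never makes a copy look like a new story.
    """
    return jaccard(body_a, body_b) >= threshold


# Made-up sample text, not a real wire story.
wire_body = "The council approved the budget on Tuesday after a long debate."
member_copy = wire_body  # same body, perhaps under a different headline
member_edit = wire_body + (
    ' "We fought hard for this," said a local official,'
    " who promised more spending on schools."
)

print(same_story(wire_body, member_copy))
print(same_story(wire_body, member_edit))
```

Any fixed threshold like this will misjudge some stories, since a short but important added quote barely moves the score while heavy cosmetic rewording moves it a lot, which is one way to read Cohen's point about "substantial changes."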
Not Said: Deals Stop Lawsuits
What the deals have been most successful at, at least with the AP and AFP, has been defusing lawsuits. The AFP did sue Google; the AP threatened. Google said what it was doing with news stories fell under fair use and that the deals it cut were specifically for “new” and “extensive” uses of wire content that went beyond fair use (see AFP & Google Settle Over Google News Copyright Case for more).
Cohen reiterated this when I asked specifically about it during our interview. Google completely disagrees that listing a headline and summary of a story, with a link to that story, violates copyright laws as some publishers have contended. Nevertheless, the agreements got some wire services off Google’s back.
The AP Pushes At Google
The fires are being stoked again, however. The AP agreement is being renegotiated, and the organization has sent conflicting messages about how and when it considers listing articles to go beyond fair use.
The AP is also launching a “news registry” it says will allow it to track usages of AP content, in part as a way to ensure content is properly licensed (according to the AP’s view; others may disagree). Confusingly, on Friday, there came a report from Business Insider that the AP may want Google to maintain this registry (Google said they had no comment about this).
Meanwhile, an earlier project that the AP backed, the aforementioned ACAP, moves along with its own system of automatically transmitting licensing information, not that anyone is currently trying to actually license material in the way that ACAP backers hope. Also on Friday, there came a report from TechCrunch that Microsoft might try to woo publishers into blocking Google and getting behind ACAP.
I’d previously talked with Google’s Schmidt about the business dealings with the AP, so see my interview with him for more on that (Google CEO Eric Schmidt On Newspapers & Journalism). With Cohen, I focused on more technical aspects.
Robots.txt Works Fine; ACAP Needs Progress
In particular, Google’s primary way of dealing with publishers remains the Robots Exclusion Protocol (REP). With the AP having pushed two alternatives to this, is there a problem with REP? Will Google get behind ACAP?
The AP stuff [the news registry] is still so vague that I can’t talk too much about it. The ACAP thing, there’s a challenge with some of the specific protocols. A good chunk of it can already be done with robots [REP], and it already works.
Continuing, Cohen explained further:
That’s not to say there isn’t more that can be done [with REP]. To keep that moving forward makes sense. Some things they [ACAP] want to do, it’s not a question of being bad for Google but bad for the web that opens the door to a lot of spam. For example, a directive of what the snippet should be, that this is what you have to show. One guy said the whole spam thing is overstated.
You have to think about any protocol to work for the common web, not just news publishers. We’ve had these discussions. Unfortunately, we haven’t seen much progress there.
The lack of considering the “common web” resonates with me. As my Search Engines, Permissions & Moving Forward In Copyright Battles article explains, a weakness to me with ACAP has long been that it was designed by news publishers, for news publishers, while search engines deal with more content than that. In that article, I wrote:
A new system to be developed with the search engines and a broad range of publishers for online indexing. That’s not ACAP, in the sense that ACAP had no specific solutions when it rolled out. Moreover, ACAP really represents the interests of a minority of publishers on the web, news publishers. Web publishers are online merchants and small bloggers and forum owners and those with personal home pages and B2B businesses and Fortune 1000 sites and local merchants with single pages and more. No, every constituency can’t be represented. But any new system needs more broad-based participation.
ACAP recently rolled out an updated specification (PDF); I’ll be looking at the system and how it weighs up against REP in a future article.
Don’t Blame Us; Blame The Internet
During the interview, I also remarked how I find it personally odd to see publishers upset that Google and other “aggregators” are supposedly taking away their visitors. For me and for others, these places are our newspapers. And while publishers might prefer we start our day with them, it seems unlikely for many that this will change. But it also doesn’t have to be a negative, because these same aggregators accused of robbing visitors also seem, to me, to be exposing news content that many would never have seen otherwise.
Cohen’s comment was similar to what Schmidt said in my earlier interview with him: Google gets blamed for disruption that was really caused by the internet itself as a new communications medium.
Eric touched on this, in how Google is often seen as synonymous with the internet. And so anything good that happens, in the space, Google did it, Google did something, and we may have had nothing to do with whatever innovation was out there. And anything bad that happens is Google’s fault.
So this idea, the issues that publishers have around Google News. I mean if Google News didn’t exist or even broader, if Google didn’t exist, it’s not going to change the basic disruption that’s taken place within digital media. And in fact, if anything — and I certainly see the disruption that’s there and I can recognize that — but I think you also have to recognize that the train left the station before Google came to fruition. And Google is just a tool to help you navigate that. And the different properties are there that help you find that information in an increasingly fragmented space.
NOTE: Google’s News Experiments & The Quest To Solve The “Read State” Issue is the second part of this interview that deals with Google’s experimentation with news products; Under The Hood: Google News & Ranking Stories is the third part that looks more deeply into how Google News determines what to show visitors.