News SEO and generative AI: Inside a ‘parasitical relationship’

A look at how news publishers and media outlets can retain traffic, revenue and relevancy in the generative AI era.

As reports circulate that AI research lab OpenAI uses news stories from media outlets like the Wall Street Journal and CNN to train its ChatGPT chatbot, an even greater challenge emerges: How do media outlets retain traffic, revenue and relevancy in the generative AI era?

AI-generated news has long inspired fear among journalists. In 2016, for example, the U.K.’s Press Association signaled its intent to use AI for some sports and election stories.

We’ve seen more recent examples in the U.S., like this NHL roundup from the Associated Press compiled with technology from sports content automation firm Data Skrive.

The CEO of media company Axel Springer, which owns titles like Business Insider and Politico, recently said AI has the potential to replace journalists altogether. “Only those who create the best original content will survive,” he reportedly wrote in a letter to employees.

The issue of copyright, and the potential legal trouble that comes with it, has already surfaced in France and Spain.

“If OpenAI is going to enhance its model with up-to-date content without sending any traffic [to the original source, it will] spark a debate [over] who owns the rights for the content,” said Marcus Tober, senior vice president of enterprise solutions at marketing platform Semrush.

OpenAI has already seen some copyright lawsuits, and Dan Smullen, head of SEO at sports gambling platform Betsperts Media and Technology Group, said we could expect more shortly.

“In fact, despite hearing that some publishers have begun to adopt AI-assisted content in the newsroom, the editorial teams I have spoken to are uncomfortable using the outputs from OpenAI due to the unknown copyright issues,” Smullen added.

OpenAI has taken steps to address these concerns, such as allowing publishers to opt out of having their content used, he noted. The AI research lab has also agreed to provide attribution when its algorithms scrape information from news sites.

“Still, SEOs in the media industry worry this system may not adequately protect against copyright and intellectual property issues,” Smullen added. “As such, news organizations should continue to monitor OpenAI’s use of news data and ensure that their content is being used responsibly.”

One easy solution would be to add footnotes linking to sources, similar to what ChatGPT does in Bing.

“We expect something similar with [Google’s conversational AI service] Bard,” Smullen added.


‘Truth decay’

Ultimately, AI’s push into news threatens to upend media consumption all over again.

According to Ben Poulton, SEO consultant and founder of the SEO agency Intellar, AI companies using scraped data “threatens the curated control that news organizations have had for decades.”

The result could be further degradation of journalistic integrity.

Smullen noted that a lack of publisher compensation for training data could lead to a future in which publishers block OpenAI and its counterparts, leaving high-authority news sites uncrawled. That, in turn, could make the fake news problem even worse, with inaccurate or biased information circulating more widely while masquerading as fact.

As such, Smullen called for publishers to be compensated for the critical role they play – and Cameron Conaway, a former investigative journalist who leads a growth marketing team at tech giant Cisco and teaches digital marketing at the University of San Francisco, agreed.

“Could this deepen truth decay and society’s distrust of legitimate news sources?” he asked. “What impact might it have on democracy if most information is source-less, and who (or what) will then hold the power?”
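
For publishers weighing the crawler block Smullen describes, the usual mechanism is a robots.txt rule aimed at a specific user agent. The sketch below is illustrative only: the user-agent token "GPTBot" and the news.example.com URL are assumptions not drawn from the reporting above, and a crawler's actual token should be confirmed with its operator. The sketch uses Python's standard-library robots.txt parser to check that such a rule shuts out one crawler while leaving others untouched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a news site. "GPTBot" is an assumed user-agent
# token for an AI crawler; confirm the real token with the crawler's operator.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder story URL on a placeholder domain.
STORY_URL = "https://news.example.com/story"

ai_crawler_allowed = is_allowed(ROBOTS_TXT, "GPTBot", STORY_URL)
search_crawler_allowed = is_allowed(ROBOTS_TXT, "Googlebot", STORY_URL)
```

A rule like this only works if the crawler chooses to honor robots.txt, which is one reason the sources quoted here also call for compensation and attribution.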

‘Disastrous implications’

There’s even concern about OpenAI eventually automating news production altogether. Still, Barry Adams, a specialized SEO consultant at SEO firm Polemic Digital, noted generative AI systems can’t predict the news, so he doesn’t foresee any immediate issues.

“AI will not replace journalism when it comes to reporting the news, investigating stories and holding power to account,” he added.

Then again, AI could reword local news stories without citation as it spits out its own versions. This, in turn, would siphon traffic and related revenue from news sites, which would hit local outlets hardest because they rely heavily on display ad traffic, Conaway said.

And rewording has the potential to change the original meaning of the reporting.

“The combination of scrappy and financially vulnerable local newsrooms, general media avoidance and distrust and the rise of AI as a primary source could have disastrous implications,” he added.

But it’s not all – wait for it – bad news.

“On the plus side for news organizations, people will always consume news. It’s just the medium which changes,” Poulton said. “If ChatGPT can summarize five stories on the same topic from five different outlets in five seconds, is that not a good product? Maybe the likes of ChatGPT could be used on news sites to help users break down and find information they want quickly.”

‘A parasitical relationship’

First, however, the parties must address the issue of traffic and revenue.

Adams said the lack of attribution with early iterations of Bing ChatGPT and Google’s Language Model for Dialogue Applications, or LaMDA, concerns him most here.

“This undermines a fundamental contract of the web, where search engines and content websites exist in a symbiotic state,” he said. “Generative AI turns this symbiosis into a parasitical relationship, where the search engines take everything from the content creators (i.e., the content needed to train [large language models (LLMs)] on) and give nothing back in return.”

Google-owned YouTube, however, already uses a more symbiotic model in which content creators share in the revenue generated by the platform.

“There is no reason why a similar model couldn’t be adopted for search engines and the web, except that it would make Google less of a money-printing machine and lose some shareholder value,” Adams added.

Smullen agreed the solution is to pay publishers for training data. 

“Similar to Google, [OpenAI] will abuse its dominance until governments step up and question the legality of its business model from a copyright standpoint,” Smullen said. “It’s only fair that publishers be compensated for their role in making the next generation of AI possible.”

Adams agreed it’s unlikely Google will voluntarily reduce its own profits.

“They won’t care that they used the combined knowledge of humanity shared on the web to build these generative AI systems and are now discarding these creators without attribution,” he added. “If they can get away with it, they will.”

‘Remain vigilant’

Some news organizations have already responded with stricter licensing agreements, strengthened data collection and usage rules, and use of copyright protection software, according to Julian Scott, content strategist at social media management and automation tool Socialbu.

“However, these measures may not be enough to fully protect their content from being used without attribution,” he added.

Media industry SEOs are calling for better tools within OpenAI’s model that would ensure proper credit, noted Daniel Chabert, CEO and founder of web and software development agency PurpleFire.

“They hope OpenAI will increase its transparency regarding the use of news data and be more proactive in alerting authors and publishers when their content is being used,” he added. 

Meanwhile, news organizations would be wise to invest in better monitoring systems to detect errors or biases in the data generated by OpenAI’s models. 

“News organizations must remain vigilant about OpenAI’s use of news data and take the necessary steps to protect their content and ensure accuracy and quality,” Chabert added.

‘A first-stop destination’

There’s also one tried-and-true online marketing tactic that is particularly relevant here.

Adams noted websites need to start thinking about a “post-Google future” and build strong brands that tie their audiences directly to them.

“Some publishers are quite successful at this and have built brands that are almost immune to the whims of search engines,” he added. “The goal is to become a first-stop destination for your audience, with readers directly visiting your website without the intermediary of a Google or Facebook.”

As the impetus to click through to original sources lessens, Matt Greenwood, SEO manager at search agency Reflect Digital, agreed websites should be “looking to provide information and experiences that are more valuable than can be condensed into a few lines of auto-generated text, to give consumers a reason to still visit our sites and read our original content.”


Opinions expressed in this article are those of the guest author and not necessarily those of Search Engine Land.


About the author

Lisa Lacy
Contributor
Lisa Lacy is a reporter who has covered advertising, technology and retail for publications like Adweek, TechCrunch, Digiday, CMO.com and VentureBeat.
