AI search engines often make up citations and answers: Study

I read the news today, oh boy. AI search engines give wrong answers more than 60% of the time and often cite fabricated or broken URLs.


AI search engines and chatbots often provide wrong answers and make up article citations, according to a new study from Columbia Journalism Review.

Why we care. AI search tools have ramped up the scraping of your content so they can serve answers to their users, often resulting in no clicks to your website. Also, click-through rates from AI search engines and chatbots are much lower than those from Google Search, according to a separate study. Hallucinated citations make an already bad situation even worse.

By the numbers. More than half of the responses from Gemini and Grok 3 cited fabricated or broken URLs that led to error pages. Also, according to the study:

  • Overall, chatbots provided incorrect answers to more than 60% of queries:
    • Grok 3 had the highest error rate, answering 94% of queries incorrectly.
    • Gemini provided a completely correct response only once in 10 attempts.
    • Perplexity, which had the lowest error rate, answered 37% of queries incorrectly.

What they’re saying. The study authors, Klaudia Jaźwińska and Aisvarya Chandrasekar, also noted that “multiple chatbots seemed to bypass Robot Exclusion Protocol preferences” (the robots.txt convention publishers use to opt out of crawling; see the sketch after the quote). They summed up this way:

“The findings of this study align closely with those outlined in our previous ChatGPT study, published in November 2024, which revealed consistent patterns across chatbots: confident presentations of incorrect information, misleading attributions to syndicated content, and inconsistent information retrieval practices. Critics of generative search like Chirag Shah and Emily M. Bender have raised substantive concerns about using large language models for search, noting that they ‘take away transparency and user agency, further amplify the problems associated with bias in [information access] systems, and often provide ungrounded and/or toxic answers that may go unchecked by a typical user.’”
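For context, here is a minimal sketch of how a compliant crawler honors the Robot Exclusion Protocol before fetching a page, using Python’s standard-library urllib.robotparser. The crawler name and URLs are hypothetical placeholders; the study does not describe the chatbots’ actual retrieval code.

```python
# Minimal sketch: checking robots.txt before crawling, per the
# Robot Exclusion Protocol. Uses only the Python standard library.
from urllib import robotparser

AGENT = "ExampleAIBot"  # hypothetical crawler user-agent
PAGE = "https://example.com/articles/some-story"  # hypothetical URL

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

if rp.can_fetch(AGENT, PAGE):
    print(f"{AGENT} may fetch {PAGE}")
else:
    # A compliant crawler stops here. The study suggests some
    # chatbots' retrieval pipelines skip or ignore this check.
    print(f"{AGENT} is disallowed from fetching {PAGE}")
```

A crawler that skips this check, or ignores its result, is what the authors appear to mean by bypassing publishers’ preferences.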

About the comparison. The analysis tested eight generative AI search tools (ChatGPT search, Perplexity, Perplexity Pro, DeepSeek search, Microsoft Copilot, xAI’s Grok-2 and Grok-3 search, and Google Gemini) on their ability to identify an article’s headline, original publisher, publication date, and URL from a direct excerpt. The researchers used 10 articles chosen at random from each of 20 publishers: 200 excerpts run through eight chatbots, for 1,600 queries in total.

The study. AI Search Has A Citation Problem


About the author

Danny Goodwin
Danny Goodwin is Editorial Director of Search Engine Land & Search Marketing Expo - SMX. He joined Search Engine Land in 2022 as Senior Editor. In addition to reporting on the latest search marketing news, he manages Search Engine Land’s SME (Subject Matter Expert) program. He also helps program U.S. SMX events.

Goodwin has been editing and writing about the latest developments and trends in search and digital marketing since 2007. He was previously Executive Editor of Search Engine Journal (from 2017 to 2022), managing editor of Momentology (from 2014 to 2016) and editor of Search Engine Watch (from 2007 to 2014). He has spoken at many major search conferences and virtual events, and has been sourced for his expertise by a wide range of publications and podcasts.
