Google Researchers Introduce System To Rank Web Pages On Facts, Not Links
Will Google someday rank web pages based on how accurate they are? A new paper suggests it might.
Close your eyes and imagine a world where web pages are ranked not only on popularity — i.e., the links that point to them — but also by the accuracy of information they contain. That world may not be too far off.
As New Scientist recently reported, a team of research scientists at Google has published a paper (PDF) explaining the idea of Knowledge-Based Trust (KBT), an alternate way of determining the quality of web pages by looking at how accurate they are.
From the paper's abstract: “The quality of web sources has been traditionally evaluated using exogenous signals such as the hyperlink structure of the graph. We propose a new approach that relies on endogenous signals, namely, the correctness of factual information provided by the source. A source that has few false facts is considered to be trustworthy.”
The paper goes on to describe how Google could use an extraction process to compare the facts it finds on web pages to facts that are stored in a knowledge base (think Knowledge Graph/Knowledge Vault), and reward pages that are found to be more accurate. In cases where a single web page doesn’t have enough facts, the paper suggests relying on other pages from the same website to determine trustworthiness.
Google has been building a massive database of known facts for years, and in 2012 introduced its Knowledge Graph. That’s the source of the information boxes that appear, primarily on the right side of Google search results, for searches involving people, places and other known entities.
The authors say their early tests of Knowledge-Based Trust have been promising. “We applied it to 2.8 billion triples extracted from the web, and were thus able to reliably predict the trustworthiness of 119 million web pages and 5.6 million websites.” (Note: The paper uses “triples” to describe the factual elements found and extracted from web pages.)
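To make the idea concrete, here is a deliberately naive sketch of the comparison described above: check extracted triples against a knowledge base, score each page by the share of checkable facts it gets right, and fall back to the whole site when a page has too few facts. It is not the paper's method, which is a more involved probabilistic model that also accounts for errors made by the extraction process itself. The triple layout (subject, predicate, object), the toy knowledge base, the `MIN_TRIPLES` threshold and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple
from urllib.parse import urlparse


class Triple(NamedTuple):
    """One extracted fact, e.g. (Paris, capital_of, France)."""
    subject: str
    predicate: str
    obj: str


# Hypothetical toy knowledge base: (subject, predicate) -> accepted value.
KNOWLEDGE_BASE = {
    ("Barack Obama", "nationality"): "USA",
    ("Paris", "capital_of"): "France",
    ("Mount Everest", "located_in"): "Nepal",
}

# Assumed cutoff: pages with fewer checkable facts borrow the site-level score.
MIN_TRIPLES = 2


def checkable(triples):
    """Keep only triples whose (subject, predicate) the knowledge base covers."""
    return [t for t in triples if (t.subject, t.predicate) in KNOWLEDGE_BASE]


def accuracy(triples):
    """Share of checkable triples that match the knowledge base, or None."""
    known = checkable(triples)
    if not known:
        return None
    correct = sum(KNOWLEDGE_BASE[(t.subject, t.predicate)] == t.obj for t in known)
    return correct / len(known)


def trust_scores(extracted):
    """Score each page URL, falling back to its site when facts are scarce."""
    by_site = defaultdict(list)
    for url, triples in extracted.items():
        by_site[urlparse(url).netloc].extend(triples)

    scores = {}
    for url, triples in extracted.items():
        if len(checkable(triples)) >= MIN_TRIPLES:
            scores[url] = accuracy(triples)
        else:
            scores[url] = accuracy(by_site[urlparse(url).netloc])
    return scores


# Toy input: page URL -> triples a hypothetical extractor found on it.
pages = {
    "https://example.com/a": [
        Triple("Barack Obama", "nationality", "USA"),
        Triple("Paris", "capital_of", "France"),
    ],
    "https://example.com/b": [
        Triple("Mount Everest", "located_in", "Peru"),
    ],
    "https://example.org/x": [
        Triple("Paris", "capital_of", "Germany"),
        Triple("Barack Obama", "nationality", "Kenya"),
    ],
}

scores = trust_scores(pages)
for url, score in scores.items():
    print(url, score)
```

In this toy run, the page with a single fact is judged on all three facts found across its site, which mirrors the fallback the paper suggests for pages that don't contain enough facts on their own.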
This KBT concept wouldn’t necessarily work uniformly across the internet, since many web pages don’t exist to share facts and aren’t about entities that exist in a Knowledge Graph-style database.
Along those lines, the authors say this way of measuring trustworthiness “provides an additional signal for evaluating the quality of a website,” and could be used “in conjunction with existing signals such as PageRank” — not necessarily as a replacement.