How RankBrain Changes Entity Search
Columnist Kristine Schachinger provides a handy primer on entity search, explaining how it works and how Google is using its RankBrain machine learning system to make it better.
Earlier this week, news broke about Google’s RankBrain, a machine learning system that, along with other algorithm factors, helps to determine what the best results will be for a specific query set.
Specifically, RankBrain appears to be related to query processing and refinement, using pattern recognition to take complex and/or ambiguous search queries and connect them to specific topics.
This allows Google to serve better search results to users, especially in the case of the hundreds of millions of search queries per day that the search engine has never seen before.
RankBrain is not to be taken lightly. Google has said that it is among the most important of the hundreds of ranking signals the algorithm takes into account.
RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.
(Note: RankBrain is more likely a “query processor” than a true “ranking factor.” It is currently unclear how exactly RankBrain functions as a ranking signal, since those are typically tied to content in some way.)
This is not the only major change to search in recent memory, however. In the past few years, Google has made quite a few important changes to how search works, from algorithm updates to search results page layout. Google has grown and changed into a much different animal than it was pre-Penguin and pre-Panda.
These changes don’t stop at search, either. The company has changed how it is structured. With the new and separate “Alphabet” umbrella, Google is no longer one organism, or even the main one.
Even communication from Google to SEOs and Webmasters has largely gone the way of the dodo. Matt Cutts is no longer the “Google go-to,” and reliable information has become difficult to obtain. So many changes in such a short time. It seems that Google is pushing forward.
Yet, RankBrain is much different from previous changes. RankBrain is an effort to refine the query results of Google’s Knowledge Graph-based entity search. While entity search is not new, the addition of a fully rolled-out machine learning algorithm to these results is only about three months old.
So what is entity search? How does this work with RankBrain? Where is Google going?
To understand the context, we need to go back a few years.
The launch of the Hummingbird algorithm was a radical change. It was the overhaul of the entire way Google processed organic queries. Overnight, search went from finding “strings” (i.e., strings of letters in a search query) to finding “things” (i.e., entities).
Where did Hummingbird come from? The new Hummingbird algorithm was born out of Google’s efforts to incorporate semantic search into its search engine.
This was supposed to be Google’s foray into not only machine learning, but the understanding and processing of natural language (or NLP). No more need for those pesky keywords — Google would just understand what you meant by what you typed in the search box.
Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, to generate more relevant results. Semantic search systems consider various points including context of search, location, intent, variation of words, synonyms, generalized and specialized queries, concept matching and natural language queries to provide relevant search results. Major web search engines like Google and Bing incorporate some elements of semantic search.
Yet we’re two years on, and anyone who uses Google knows the dream of semantic search has not been realized. It’s not that Google meets none of the criteria, but Google falls far short of the full definition.
For instance, it does use databases to define and associate entities. However, a semantic engine would understand how context affects words and then be able to assess and interpret meaning.
Google does not have this understanding. In fact, according to some, Google is simply navigational search — and navigational search is not considered by definition to be semantic in nature.
So while Google can understand known entities and relationships via data definitions, distance and machine learning, it cannot yet understand natural (human) language. It also cannot easily interpret attribute association without additional clarification when those relationships in Google’s repository are weakly correlated or nonexistent. This clarification is often a result of additional user input.
Of course, Google can learn many of these definitions and relationships over time if enough people search for a set of terms. This is where machine learning (RankBrain) comes into the mix. Instead of the user refining query sets, the machine makes a best guess based on the user’s perceived intent.
However, even with RankBrain, Google is not able to interpret meaning as a human would, and that is the Natural Language portion of the semantic definition.
So by definition, Google is NOT a semantic search engine. Then what is it?
The Move From “Strings” to “Things”
[W]e’ve been working on an intelligent model — in geek-speak, a “graph” — that understands real-world entities and their relationships to one another: things, not strings.
As mentioned, Google is now very good at surfacing specific data. Need a weather report? Traffic conditions? Restaurant review? Google can provide this information without the need for you to even visit a website, displaying it right on the top of the search results page. Such placements are often based on the Knowledge Graph and are a result of Google’s move from “strings” to “things.”
The move from “strings” to “things” has been great for data-based searches, especially when it places those bits of data in the Knowledge Graph. These bits of data are the ones that typically answer the who, what, where, when, why, and how questions of Google’s self-defined “Micro-Moments.” Google can give users information they may not have even known they wanted at the moment they want it.
However, this push towards entities is not without a downside. While Google has excelled at surfacing straightforward, data-based information, what it hasn’t been doing as well anymore is returning highly relevant answers for complex query sets.
Here, I use “complex queries” to refer simply to queries that do not easily map to an entity, a piece of known data and/or a data attribute — thereby making such queries difficult for Google to “understand.”
As a result, when you search for a set of complex terms, there is a good chance you will get only a few relevant results and not necessarily highly relevant ones. The result is much more a kitchen sink of possibilities than a set of direct answers, but why?
Complex Queries And Their Effect On Search
RankBrain uses artificial intelligence to embed vast amounts of written language into mathematical entities — called vectors — that the computer can understand. If RankBrain sees a word or phrase it isn’t familiar with, the machine can make a guess as to what words or phrases might have a similar meaning and filter the result accordingly, making it more effective at handling never-before-seen search queries.
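To make the idea of "vectors" in the quote above more concrete, here is a minimal sketch of comparing phrases by cosine similarity. Google has not published how RankBrain works, so everything here is an assumption for illustration: the three-dimensional vectors are invented, and `TOY_VECTORS`, `cosine_similarity` and `nearest_known_phrase` are hypothetical names, not part of any Google system.

```python
import math

# Hypothetical 3-dimensional vectors, invented for illustration only.
# A real system would learn vectors with many more dimensions from text.
TOY_VECTORS = {
    "iced tea": [0.9, 0.1, 0.0],
    "sweet tea": [0.8, 0.3, 0.0],
    "chemistry lab": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_known_phrase(query_vector, known=TOY_VECTORS):
    """Pick the known phrase whose vector points in the most similar direction."""
    return max(known, key=lambda phrase: cosine_similarity(query_vector, known[phrase]))


# Assumed vector for a phrase the system has never seen before.
unfamiliar = [0.85, 0.25, 0.05]
print(nearest_known_phrase(unfamiliar))
```

In this toy setup, the unfamiliar phrase lands closest to a tea-related phrase rather than the lab-related one, which is the kind of "guess" at similar meaning the quote describes.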
Want to see complex queries in action? Go type a search into Google as you normally would. Now check the results. If you used an uncommon or unrelated set of terms, you will see Google throws up a kitchen sink of results for the unknown or unmapped items. Why is this?
Google is searching against items known to Google and using machine learning (RankBrain) to create/understand/infer relationships when they are not easily derived. Basically, when the entity or relationship is not known, Google is not able to infer context or meaning very well — so it guesses.
Even when the entity is known, Google’s ability to determine relevance between the searched items decreases when the relationship between them is not already known. Remember the searches where Google showed you the words it did not use in the search? It works like that; we just don’t see those removed search terms anymore.
But don’t take my word for it.
We can see this in action if you type your query again — but as you type, look in the drop-down box and see what results appear. This time, instead of the query you originally searched for, pick one of the drop-down terms that most closely resembles your intent.
Notice how much more accurate the results are when you use Google’s words? Why? Google cannot understand language without knowing how a word is defined, and it cannot understand a relationship unless enough people have told it (or it already knows) that the attributes are correlated.
This is how entities work in search, in simplified terms.
Again, though, just what are entities?
Generally speaking, nouns — or Persons/Places/Ideas/Things — are what we call entities. Entities are known to Google, and their meaning is defined in the databases that Google references.
As we know, Google has become really excellent at telling you all about the weather, the movie, the restaurant and what the score of last night’s game happened to be. It can give you definitions and related terms and even act like a digital encyclopedia. It is great at pulling back data points based around entity understanding.
Therein lies the rub. The things Google returns well are known entities with known, mapped or inferred relationships. However, if the item is not easily mapped or the items are not mapped to each other, Google has difficulty understanding the query. As mentioned previously, Google basically guesses what you meant.
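The difference between mapped and unmapped terms can be sketched as a simple lookup. This is only an illustration under stated assumptions: the `Entity` class, the `REPOSITORY` contents and the `map_terms` function are hypothetical and say nothing about how Google actually stores or matches entities.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A known 'thing' with a defined meaning and optional attributes."""
    name: str
    kind: str
    attributes: dict = field(default_factory=dict)


# Hypothetical entity repository, invented for illustration only.
REPOSITORY = {
    "iced tea": Entity("Iced Tea", "beverage", {"served": "cold"}),
    "lemons": Entity("Lemons", "fruit"),
    "glass": Entity("Glass", "container"),
}


def map_terms(terms):
    """Split query terms into those that map to a known entity and those that do not."""
    mapped, unmapped = [], []
    for term in terms:
        entity = REPOSITORY.get(term.strip().lower())
        if entity is None:
            unmapped.append(term)  # nothing defined for this term, so it must be guessed at
        else:
            mapped.append(entity)
    return mapped, unmapped


mapped, unmapped = map_terms(["Iced Tea", "Dissolved Sugar", "Glass"])
print([entity.name for entity in mapped], unmapped)
```

In this sketch, anything that ends up in `unmapped` is where the guessing described above would begin.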
Google now wants to transform words that appear on a page into entities that mean something and have related attributes. It’s what the human brain does naturally, but for computers, it’s known as Artificial Intelligence.
It’s a challenging task, but the work has already begun. Google is “building a huge, in-house understanding of what an entity is and a repository of what entities are in the world and what should you know about those entities,” said [Google software engineer Amit] Singhal.
So, How Does This Work?
To give an example, “Iced Tea,” “Lemons” and “Glass” are all entities (things), and these entities have a known relationship. This means that when you search for these items — [Iced Tea, Lemons, Glass] — Google can easily pull back many highly relevant results. Google “knows” what you want. The user intent is very clear.
- What if, however, I change the query to…
Iced Tea, Rooibos, Glass
Google still mostly understands this search, but it is not as clear an understanding.
Why? Rooibos is not commonly used for Iced Tea, even though it is a tea.
- Now, what if we change this query to…
Iced Tea, Goji, Glass
Now, Google is starting to throw in the kitchen sink. Some items are dead on. Some items are only relevant to goji tea, not iced tea.
Google is confused.
- Now, if I make a final change to…
Iced tea, Dissolved Sugar, Glass
Google loses almost any understanding of what this query set means. Although these are the ingredients in the recipe for sweet tea, you will see (amidst a few sweet tea recipes) some chemistry-related pages.
Why? Google does not know how to accurately map the relationship.
- But what if I look at the drop-down for other terms that mean the same to me as a human when Google can no longer determine these entities and their relationship? What if I search the drop-down suggested result?
Glass of Sugary Iced Tea
The only meaningful words changed were “sugar” to “sugary,” and the word “dissolved” was dropped. Yet, this leads us to a perfect set of Sweet Tea results.
What Google can do is understand that the entity Iced Tea is, in fact, a thing known as Iced Tea. It can tell that a Glass is indeed a Glass.
However, in the last example, it does not know what to do with the modifier Dissolved in relation to Iced Tea, Sugar and Glass.
Since this query could refer to the sugar in Iced Tea or (in Google’s “mind”) a sugar solution used in a lab, it gives you results that have Iced Tea. It then gives you results that do not have Iced Tea in them but do have Dissolved Sugar. Then, you have some results with both items, but they’re not clearly related to making Iced Tea.
What we see are pages that are most likely the result of RankBrain trying to decipher intent. It tries to determine the relationship but has to return a kitchen sink of probable results because it is not sure of your intent.
So what we have now is a set of query terms that Google must assess against known “things” (entities). Then, the relationship between these things is analyzed against known relationships, at which time it hopes to have a clear understanding of your intent.
When it has a poor understanding of this intent, however, it may use RankBrain to produce the most probable result set for your query. Simply put, when Google cannot match intent to a result, it uses a machine to help refine that query into probabilities.
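The two-step process described above (check the terms, then check the relationships between them, and fall back to a guess when those are weak) can be sketched roughly as follows. This is a toy model, not Google's method: the relationship strengths, the 0.5 threshold and the names `KNOWN_RELATIONSHIPS`, `weakest_link` and `interpret` are all assumptions invented for illustration.

```python
from itertools import combinations

# Hypothetical relationship strengths between entity pairs (0 to 1),
# invented for illustration only. Missing pairs count as unknown (0.0).
KNOWN_RELATIONSHIPS = {
    frozenset({"iced tea", "glass"}): 0.9,
    frozenset({"iced tea", "lemons"}): 0.9,
    frozenset({"lemons", "glass"}): 0.6,
    frozenset({"iced tea", "rooibos"}): 0.4,
    frozenset({"rooibos", "glass"}): 0.3,
}


def weakest_link(terms):
    """Return the weakest pairwise relationship among the query terms."""
    normalized = [term.strip().lower() for term in terms]
    return min(
        (KNOWN_RELATIONSHIPS.get(frozenset(pair), 0.0)
         for pair in combinations(normalized, 2)),
        default=1.0,  # a single term has no pairs to check
    )


def interpret(terms, threshold=0.5):
    """Classify how well the query is understood, using an assumed threshold."""
    score = weakest_link(terms)
    if score >= threshold:
        return "clear intent"
    if score > 0.0:
        return "partial understanding"
    return "best guess"  # the case where a system like RankBrain would step in


for query in (
    ["Iced Tea", "Lemons", "Glass"],
    ["Iced Tea", "Rooibos", "Glass"],
    ["Iced Tea", "Dissolved Sugar", "Glass"],
):
    print(query, "->", interpret(query))
```

Using the weakest pair as the score is a deliberately crude design choice. It reflects the observation above that a single poorly mapped term is enough to turn a clear query into a kitchen sink of results.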
So where is Google going?
While Google has been experimenting with RankBrain, they have lost market share — not a lot, but still, their US numbers are down. In fact, Google has lost approximately three percent of share since Hummingbird launched, so it seems these results were not received as more relevant or improved (and in some cases, you could say they are worse).
Google might have to decide whether it is an answer engine or a search engine, or maybe it will separate these and do both.
Unable to produce a semantic engine, Google built one based on facts. RankBrain has now been added to help refine search results, because entity search requires understanding not only what the nouns in a search mean, but also how they are related.
Over time, RankBrain will get better. It will learn new entities and the likely relationships between them. It will present better results than it does today. However, Google is racing against a ticking clock known as user share.
Only time will tell, but that time is limited.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.