Predicting the value of a search engine ranking signal
Contributor Dave Davies deconstructs a new Google patent that covers how machine learning can predict a ranking signal value when the value is unknown.
Google was recently granted a patent with a wide range of practical applications. The patent covers how Google can use machine learning to predict a ranking signal value when the actual value is unknown.
Given the vast amount of content on the internet and more coming daily, Google needs to find a way to assign value to pages even if they have not been crawled and indexed. How can a page be ranked if Google hasn’t crawled it? How can Google use a new piece of content that doesn’t have any inbound links?
The methods in this patent describe how the Google algorithm may estimate unknown factors and use them to determine where a page ranks.
We’ll discuss the possible implementations Google may be using and a couple of the problems it solves for search engine optimization specialists (SEOs). But before we start, I feel obliged to offer my standard disclaimer.
Just because something is patented does not mean it is incorporated into an algorithm. We need to weigh the probability that the patent, or parts of it, is being used against what we see around us and what makes sense. If nothing else, it gives us a glimpse into what Google is working on.
Given the topic and methods outlined in this patent, I would say it’s highly likely that at least some iteration is in use and likely to be expanded on as machine learning systems evolve.
Let’s dig into the nuts and bolts. If you’re interested in the source, you can find the full patent here, but I’ll be covering the applications from the patent, what they mean and how they can be used.
Let’s begin with an image from the patent that won’t make sense now but will assist in the explanations to come:
Take a look at items 150 and 160 in the image above. These two items are what we’ll be focusing on, because they show how machine learning is used to solve significant search issues SEOs have complained about for years.
While the system we’ll be discussing has a variety of applications, the patent outlines one core issue in section 0008:
The search system can update a search engine index that indexes resources with the generated values of the search engine ranking signals for the resources and the generated values can then be used by the search engine in ranking the resources. Thus, the completeness of the search engine index and, in turn, the accuracy and the efficiency of the search engine can be improved.
Basically, they have identified a significant problem: In the absence of a known ranking signal value, there isn’t a way to rank content, even if the content is best suited for a specific query.
When there are no links
Let’s consider the following simplified set of signals for a new piece of content:
Number of links (signal a) = unknown or unavailable
Relevance of content to “blue widgets” (signal b) = 9.8/10
Domain value passed / Internal PageRank (signal c) = 9.2/10
Based on these signals, we know the relevance of the page and the strength the domain is passing to it; but without knowing the number of links or their weight, how can Google properly rank the page? How can Google rank any page if it doesn’t know how many or what type of inbound links the page has? If the unknown link count is treated as zero, any formula or algorithm that uses link count as a multiplier will zero out.
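To make the “zero out” problem concrete, here is a toy scoring function in Python. The multiplicative formula, the function names and the predicted link value of 3.0 are hypothetical illustrations, not Google’s formula; the point is only to show how a missing multiplier collapses the score and how falling back on a predicted value avoids that.

```python
from typing import Optional


def score(relevance: float, domain_value: float, link_count: Optional[float]) -> float:
    """Toy multiplicative score; an unknown link count (None) is treated as zero."""
    links = link_count if link_count is not None else 0.0
    return relevance * domain_value * links


def score_with_prediction(
    relevance: float,
    domain_value: float,
    link_count: Optional[float],
    predicted_links: float,
) -> float:
    """Same toy score, but falls back to a predicted value when the real one is unknown."""
    links = link_count if link_count is not None else predicted_links
    return relevance * domain_value * links


# Signals from the "blue widgets" example; the link count is unknown.
print(score(9.8, 9.2, None))                       # 0.0: relevance and domain strength are wasted
print(score_with_prediction(9.8, 9.2, None, 3.0))  # nonzero, using a hypothetical predicted value
```

Once the real link count becomes available, both functions return the same score, so the prediction only matters while the signal is missing.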
With an unknown signal value, no calculation can be fully accurate, and Google won’t be able to produce the best results. As SEOs, we have a similar problem: You can’t rank without links, and it’s hard to get links for content that doesn’t rank, even if it’s the best content for the query.
The methods in this patent give the algorithm the ability to predict a value until the actual value is confirmed. This predictive capability might be the most exciting aspect, as it facilitates rapid testing and accelerates the deployment of machine-learned corrections.
While a variety of permutations are discussed in the patent, at its core it comes down to training a machine learning system to generate a likely value for a ranking signal when there isn’t one.
A tale of two indexes
The method outlined in the patent requires two indexes. These should not be confused with the search index we use every day. While the intent may be to apply this to the general index, prior to that Google would use two closed indexes, separate from the general search index.
For illustration purposes, we’ll call them index A and index B.
For index A, the ranking signal values are known and are used to train the algorithm, giving it a starting point. The algorithm is also given pages and their backlinks. Once it has been trained to understand how a web page is structured and how related elements like backlinks contribute, it can assign values, and those signal values are then applied to the second index.
In index B, the signal values are known but are not fed into the machine learning system. Instead, the system trains itself by learning, based on the information from index A, where it assigns the correct weight to a factor and where it does not.
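The train-then-check split can be sketched with the simplest possible model, a one-feature linear regression in plain Python. This is a stand-in under heavy assumptions: the patent describes a machine learning system with encoders and decoders, while the feature, the data points and the names `INDEX_A`, `INDEX_B`, `fit_line` and `mean_abs_error` here are invented for illustration. What it shows is the division of labor: index A supplies training data, while index B’s known values are held back and used only to measure how far off the predictions are.

```python
from statistics import mean
from typing import List, Tuple

# Hypothetical toy data: (page feature, known ranking signal value).
# Index A: values are known and used for training.
INDEX_A: List[Tuple[float, float]] = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]
# Index B: values are known but withheld from training; used only to check predictions.
INDEX_B: List[Tuple[float, float]] = [(5.0, 10.1), (6.0, 11.8)]


def fit_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Ordinary least squares fit of: signal = slope * feature + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def mean_abs_error(model: Tuple[float, float], pairs: List[Tuple[float, float]]) -> float:
    """Average gap between predicted and known signal values."""
    slope, intercept = model
    return mean(abs(slope * x + intercept - y) for x, y in pairs)


model = fit_line(INDEX_A)              # learn only from index A
print(mean_abs_error(model, INDEX_B))  # how far off the predictions are on index B
```

A real system would feed the index B errors back into training; this sketch stops at measuring them.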
It’s in the second index that things become more interesting, because the algorithm also considers additional queries that may apply to the ranking signals. When the algorithm in index B tries to predict a single result, it will probably always be off a bit, but when predicting many results, the predictions become more accurate. Because of the “wisdom of the crowd” phenomenon, index B is allowed to self-correct (that’s the machine learning element at play) and does so by incorporating the additional queries and what it’s learned.
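The “wisdom of the crowd” idea rests on a general statistical effect: when many estimates have independent errors, averaging them tends to cancel those errors. The simulation below illustrates only that general effect, not the patent’s mechanism. The independent Gaussian noise is an assumption, and the function name `crowd_error` is hypothetical; if real prediction errors are correlated, averaging helps less.

```python
import random
from statistics import mean
from typing import Tuple


def crowd_error(true_value: float, n: int, noise_sd: float, seed: int = 0) -> Tuple[float, float]:
    """Compare the typical single-prediction error with the error of the averaged prediction.

    Each prediction is the true value plus independent Gaussian noise (an assumption).
    """
    rng = random.Random(seed)
    predictions = [true_value + rng.gauss(0.0, noise_sd) for _ in range(n)]
    individual_error = mean(abs(p - true_value) for p in predictions)
    aggregate_error = abs(mean(predictions) - true_value)
    return individual_error, aggregate_error


single, averaged = crowd_error(true_value=5.0, n=1000, noise_sd=1.0)
print(f"typical single-prediction error: {single:.3f}")
print(f"error of the averaged prediction: {averaged:.3f}")
```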
If the system in index B can determine a signal value for a number of related queries, this may assist in generating the unknown value for the initial query.
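One simple way to turn known values on related queries into an estimate for the initial query is a similarity-weighted average, sketched below in Python. The patent does not say it uses this technique; the weighting scheme, the similarity scores and the function name `estimate_from_related` are all assumptions made for illustration.

```python
from typing import List, Tuple


def estimate_from_related(related: List[Tuple[float, float]]) -> float:
    """Similarity-weighted average of a signal's known values on related queries.

    `related` holds (similarity to the original query, known signal value) pairs.
    """
    total_weight = sum(similarity for similarity, _ in related)
    if total_weight <= 0:
        raise ValueError("need at least one related query with positive similarity")
    return sum(similarity * value for similarity, value in related) / total_weight


# Hypothetical related queries for "blue widgets" and their known signal values.
related = [(0.9, 7.0), (0.6, 8.0), (0.3, 5.0)]
print(estimate_from_related(related))  # approximately 7.0; closer queries count for more
```

The more related queries with known values the system can draw on, the less any single noisy value can skew the estimate.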
Why is this important?
It’s always valuable to understand how search engines work, but more directly, it’s valuable to understand the system that will enable new sites and new resources to rank quickly.
The two-index system described above has encoders and decoders. The encoders visit a web page and create an encoded representation. While I obviously am not privy to exactly what this would look like on the back end, based on the multiple references to entities in the patent, it’s likely a mapping of the entities within the page and known relationships to other entities in the index or in other resources.
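Here is a toy version of that encoder/decoder split in Python. Everything in it is an assumption: the patent’s actual representation isn’t public, so the entity vocabulary, the normalized-count encoding, the signal names and the hand-set linear decoder weights are hypothetical stand-ins. In a trained system, the decoder weights would be learned (for example, on index A) rather than written by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical entity vocabulary; a real system would use a far larger entity index.
ENTITY_VOCAB: List[str] = ["blue widgets", "widget maker", "manufacturing", "pricing"]

# Hypothetical decoder parameters: (weights over the encoding, bias) per ranking signal.
DECODER: Dict[str, Tuple[List[float], float]] = {
    "link_signal": ([4.0, 2.0, 1.0, 0.5], 0.5),
    "engagement_signal": ([3.0, 1.0, 0.5, 2.0], 1.0),
}


def encode(page_entities: List[str]) -> List[float]:
    """Toy encoder: share of each vocabulary entity among the page's recognized entities."""
    counts = [float(page_entities.count(entity)) for entity in ENTITY_VOCAB]
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def decode(encoding: List[float]) -> Dict[str, float]:
    """Toy decoder: one linear prediction per ranking signal from the shared encoding."""
    return {
        name: sum(w * x for w, x in zip(weights, encoding)) + bias
        for name, (weights, bias) in DECODER.items()
    }


page = ["blue widgets", "blue widgets", "pricing", "unrelated entity"]
print(decode(encode(page)))
```

Because every signal is decoded from the same encoding, adding a new signal in this sketch only means adding another entry to `DECODER`.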
Google has been granted a patent that lets them rank new resources (pages) using likely ranking signals. This same patent will also facilitate the creation of new signals by other engineers or machine learning systems and allow the overall algorithm to rank pages that haven’t yet been assigned a value.
New content or resources can be assigned values based on the links, user behavior metrics and content quality signals they are likely to earn. Or basically, they’ve found a way to predict the search future.
Even more groundbreaking, however, is the fact that the system offers a method to give machine learning systems the ability to generate signals on their own. Humans no longer have to tell the algorithm what is important: Machine learning teaches the algorithm to find, identify and assign a value to signals.
How you can use this patent
While there is little you can directly do to influence machine learning, you can indirectly make a difference by continuing to produce great content and promoting the development of good links.
Look at the content on your site and figure out which types of content generate traffic and links, as these are metrics Google can measure through its analytics and Search Console tools. IMO, these are signals a machine learning system would use.
If your current content is ranking well, generating links, clicks and shares, new content may be predicted to do the same.
Review your analytics and backlinks, make note of what you’re doing right and let that inspire future content and link-building efforts. Conversely, take note of what didn’t go well. Just as the algorithm takes note of successes, it also takes note of failures. If the trend on your site is positive, you will likely be rewarded, and if it’s negative, the opposite may be true.
And if you don’t rank quickly, especially for time-sensitive content, you likely won’t get the signals you need to rank the next piece, either.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.