How Google uses machine learning in its search algorithms
Gary Illyes of Google tells us Google uses machine learning mainly to combine existing signals into new ones for better search quality, and in RankBrain.
One of the biggest buzzwords around Google and the overall technology market is machine learning. Google uses it with RankBrain for search and in other ways. We asked Gary Illyes from Google in part two of our interview how Google uses machine learning with search.
Illyes said that Google uses it mostly for “coming up with new signals and signal aggregations.” So they may take two or more existing signals that are not machine learning-based and test whether using machine learning to combine them improves search rankings and quality.
He also said that RankBrain, “which re-ranks based on historical signals,” is another way they use machine learning. He later explained how RankBrain works and said that Penguin doesn’t use machine learning.
Here is the full transcript:
Danny Sullivan: These days it seems like it’s really cool for people to just say machine learning is being used in everything.
Gary Illyes: And then people freak out.
Danny Sullivan: Yeah. What is it, what are you doing with machine learning? Like, so when you say it’s not being used in the core algorithm. So no one’s getting fired. The machines haven’t taken over the algorithm, you guys are still using an algorithm. You still have people trying to figure out the best way to process signals, and then what do you do with the machine learning; is [it] part of that?
Gary Illyes: They are typically used for coming up with new signals and signal aggregations. So basically, let’s say that this is a random example and [I do] not know if this is real, but let’s say that I would want to see if combining PageRank with Panda and whatever else, I don’t know, token frequency.
If combining those three in some way would result in better ranking, and for that for example, we could easily use machine learning. And then create the new composite signal. That would be one example.
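As a rough illustration of the composite signal Illyes describes here, the sketch below learns weights for combining three signals with a small logistic regression written in plain Python. It is not Google's method. The signal values, the labels, the function names and the choice of logistic regression are all assumptions made for the example, following Illyes' own "random example" of PageRank, Panda and token frequency.

```python
import math

# Hypothetical training rows: (pagerank, panda, token_frequency), each scaled
# to the range 0..1, with a made-up label (1 = good result, 0 = poor result).
TRAINING = [
    ((0.9, 0.8, 0.7), 1),
    ((0.8, 0.9, 0.4), 1),
    ((0.7, 0.6, 0.9), 1),
    ((0.2, 0.3, 0.8), 0),
    ((0.3, 0.1, 0.2), 0),
    ((0.1, 0.4, 0.5), 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def composite_signal(weights, bias, signals):
    """Combine the individual signals into one score between 0 and 1."""
    z = bias + sum(w * s for w, s in zip(weights, signals))
    return sigmoid(z)

def train(rows, epochs=2000, learning_rate=0.5):
    """Fit the weights with stochastic gradient descent on the log loss."""
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for signals, label in rows:
            error = composite_signal(weights, bias, signals) - label
            weights = [w - learning_rate * error * s
                       for w, s in zip(weights, signals)]
            bias -= learning_rate * error
    return weights, bias

weights, bias = train(TRAINING)

# Two unseen pages with the same token frequency but different PageRank/Panda.
strong_page = composite_signal(weights, bias, (0.85, 0.75, 0.6))
weak_page = composite_signal(weights, bias, (0.15, 0.20, 0.6))
print(strong_page > weak_page)
```

The point of the sketch is only that the weighting of the three inputs is learned from examples rather than set by hand, and that the output can then be treated as a single new signal.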
The other example would be RankBrain, where… which re-ranks based on historical signals.
But that also is, if you, if you think about it, it’s also a composite signal.
It’s using several signals to come up with a new multiplier for the results that are already ranked by the core algorithm.
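The re-ranking idea in this answer can be sketched in a few lines: results arrive already scored by a core ranking step, and a per-result multiplier adjusts those scores before the final sort. This is a minimal illustration only. The URLs, scores, multiplier values and the `rerank` function are hypothetical, and nothing here reflects how RankBrain actually computes its multiplier.

```python
def rerank(core_results, multipliers):
    """Apply a per-result multiplier to already-ranked results and re-sort.

    core_results: list of (url, core_score) pairs from the core ranking step.
    multipliers: dict mapping url to a multiplier; missing urls default to 1.0.
    """
    adjusted = [(url, score * multipliers.get(url, 1.0))
                for url, score in core_results]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)

# Hypothetical output of the core algorithm, best first.
core_results = [
    ("example.com/a", 0.90),
    ("example.com/b", 0.80),
    ("example.com/c", 0.50),
]

# Hypothetical multipliers standing in for a model built on historical signals.
multipliers = {"example.com/a": 0.8, "example.com/b": 1.1}

reranked = rerank(core_results, multipliers)
reranked_urls = [url for url, _ in reranked]
print(reranked_urls)
```

With these made-up numbers, the second result overtakes the first because its multiplier raises its score while the first result's multiplier lowers it. The third result has no multiplier and keeps its core score.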
Barry Schwartz: Didn’t you first use it as a query refinement? Right? That’s the main thing?
Gary Illyes: I don’t know that … ?
Barry Schwartz: Wasn’t RankBrain all about some type of query understanding and…
Gary Illyes: Well, making sure that for the query we are the best possible result, basically, it is re-ranking in a way.
Barry Schwartz: Danny, did you understand RankBrain to mean, maybe it was just me, to mean, alright someone searched for X, but RankBrain really makes [it] into Xish? And then the queries would be the results.
Danny Sullivan: When it first came out, my understanding was [that] RankBrain was being used for long-tail queries to correspond them to short answers. So somebody comes along and says, Why is the tide super-high sometimes, when I don’t understand … the moon seemed to be very big, and that’s a very unusual query, right? And Google might be going, OK, there’s a lot going on here. How do [we] unpack this and to where, and then getting the confidence and using typical things where you’d be like, OK, we’ll see if we have all these words you have a link to whatever. Meanwhile, really what the person is saying is why is the tide high when the moon is full. And that is a more common query. And Google probably has much more confidence in what it’s ranking when it deals with that, and my understanding [is that] RankBrain helped Google better understand that these longer queries corresponded basically to the shorter queries where it had a lot of confidence about the answers.
That was then, that was like what, a year ago or so? At this point, Gary, when you start talking [about] re-ranking, is that the kind of re-ranking you’re talking about?
Gary Illyes: Yeah.
Danny Sullivan: OK.
Barry Schwartz: All right. So we shouldn’t be classifying all these things as RankBrain, or should we? Like it could be other machine learning.
Gary Illyes: RankBrain is one component in our ranking system. There are over 200, as we said in the beginning, signals that we use, and each of them might become machine learning-based.
But I don’t expect that any time soon or in the foreseeable future all of them would become machine learning-based. Or that what we call the core algorithm would become machine learning-based. The main reason for that is that debugging machine learning decisions or AI decisions, if you like, is incredibly hard, especially when you have … multiple layers of neural networks. It becomes close to impossible to debug a decision. And that’s very bad for us. And for that we try to develop new ways to track back decisions. But it can easily obfuscate issues, and that would limit our ability to improve search in general.
Barry Schwartz: So when people say Penguin is now an old machine learning-based…
Gary Illyes: Penguin is not ML.
Barry Schwartz: OK, there’s a lot of people saying that Penguin [is] machine learning-based.
Gary Illyes: Of course they do. I mean if you think about it, it’s a very sexy word. Right. And if you publish it…
Danny Sullivan: People use it in bars and online all the time. Like hey, machine learning. Oh yeah.
Gary Illyes: But basically, if you publish an article with a title like machine learning is now in Penguin or Penguin generated by machine learning it’s like…. But if you publish an article with that title it’s much more likely that people could click on that title, and well, probably come up with the idea that you are insane or something like that. But it’s much more likely they would visit your site than if you publish something with a title Penguin has launched.
Note: This article was pre-written and scheduled to be published today.