A Deeper Look At Google’s Search Quality Efforts
Udi Manber, VP Engineering, Search Quality at Google, wrote a fairly long post on the Google Blog introducing the search quality group to the public. In short, Udi describes the various teams within the group and what each one does. Below are the parts I personally found most interesting:
- Google keeps its algorithms secret for competitive reasons and to guard against search spam.
- PageRank is still used in the algorithm, but not as much as it once was.
- Other algorithm components include “language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it’s not just the language, it’s how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing).”
- Google typically conducts evaluations in three ways: (1) automated evaluations every minute, (2) periodic evaluations of overall quality, and (3) evaluations of specific algorithmic improvements.
- All new algorithmic ideas are tested “thoroughly” by a team of statisticians reviewing tons of data.
- In 2007, Google released “450 new improvements, about 9 per week on the average,” to the algorithm alone.
- In January, Google released a major update to the PageRank algorithm (maybe this one, which caused a lot of discussion).
- In the past two years, Google has focused a lot on international search.
- One team is devoted to the user interface, including improving the help documents, and is also responsible for Universal Search, Google Notebook, Custom Search Engines, and iGoogle.
- Google also has the web spam team (hi Matt), which works hand in hand with the Webmaster Central team.
The most important point in Udi’s post is where he said, “this blog post is part of a renewed effort to open up a bit more than we have in the past,” and that “more blog posts will follow.” A more open and transparent Google is a great thing.