Google Gutted Its Search Quality Rating Guidelines For Public Release
As part of today’s big “How Search Works” reveal, Google also took the big step of sharing its Search Quality Rating Guidelines for the first time.
This is the document that Google’s human search quality raters use when grading Google’s search results.
But the new, public document is actually an edited version of the old one that circulated quietly several times amongst webmasters and SEOs. In fact, “gutted” is more accurate than “edited” — where the most recent non-public version of the document was 161 pages, the public document released today is only 43 pages.
What’s Changed
The biggest change, in my opinion, is the complete removal of Parts 3 and 4 — “Page Quality Rating Guidelines” and “Rating Examples.” These sections offered extremely detailed guidance on how to rate pages, how to rate sections of pages, how to judge the reputation of a website, and specific examples of public web pages and how they should be rated. I’ll share more on this below.
Many sections, in fact, are now missing the specific URL examples that were included in the old document, such as URLs that matched the different labels of Google’s rating scale (“vital,” “relevant,” “useless,” etc.). It’s likely that Google doesn’t want all those specific examples of pages and how they’re rated to be public. It’s also possible that Google decided that, because web pages and URLs change regularly, it isn’t efficient to maintain a public list of them.
A substantial amount of material related to local search and location-based queries was removed.
There are also some sections that were expanded to add clarification — a section on “thin affiliates” is one example.
I’ve spent most of the morning comparing the most recent “underground” version of the Search Quality Rating Guidelines with the new document made public today, side-by-side, listing all the relevant (and some you might say aren’t relevant!) changes. Who loves ya, baby?
Ready? Here we go….
34 (At Least!) Changes To Google’s Search Quality Rating Guidelines
1) The old document was version 3.27. The new document is 1.0.
2) Google added this preface to the new document:
“Google relies on raters, working in countries and languages around the world, to help us measure the quality of our search results, ranking, and search experience. These raters perform a variety of different kinds of ‘rating tasks’ designed to give us information about the quality of different kinds of results in response to different kinds of queries. The data they generate is rolled up statistically to give us within the Google search team a view of the quality of our search results and search experience over time, as well as an ability to measure the effect of proposed changes to Google’s search algorithms. Raters’ judgments do not directly impact Google’s search result rankings. While a rater may give a particular URL a score, that score does not directly increase or decrease a given website’s ranking. Instead these scores are used in aggregate to evaluate search quality and make decisions about changes.
This document is a ‘Cliff’s Notes’ version of our search quality rating guidelines. By this, we mean that it is not the entire version that raters use on a daily basis; however, it is a summary of the important topics. The raters’ version includes instruction on using the rating interface, additional rating examples, etc. These guidelines are used as rating specifications for search raters, and this document in particular focuses on a core type of rating task called ‘URL rating.’ In a URL rating task, a rater is shown a search query from their locale (country + language) and a URL that could be returned by a search engine for that query. The raters ‘rate’ the quality of that result for that query, on a scale described within the document. Sounds simple, right? As you’ll see, there are many cases to think through, and this document is used to guide raters on some of those cases and how to look at them.
Our search quality rating guidelines are in constant flux as we learn and search evolves over time. We’ve created this version especially for those individuals who want to understand better how Google thinks about relevance and quality of search results.”
3) Section 1.3 – “The Purpose of Search Quality Rating”
Old text: “Your ratings will be used to evaluate search engine quality around the world. Good search engines give results that are helpful for users in their specific language and location.”
New text (the final sentence is the addition): “Your ratings will be used to evaluate search engine quality around the world. Good search engines give results that are helpful for users in their specific language and location. Please note that your ratings do not directly impact Google’s search result rankings or ranking algorithms.”
That’s something that Google has emphasized in the past, and obviously has reason to emphasize again.
3) Google removed Section 1.6 (“Releasing Tasks”).
This section explained that it’s okay for raters to skip (release) some rating tasks, and described some cases when that’s acceptable, e.g., “You believe that the landing page will be offensive to you.”
4) Section 2.4.1 (“Action Queries – Do”) and 2.4.2 (“Information Queries – Know”)
In these sections, Google edited the charts showing the relationship between a query and a landing page; example URLs of helpful pages were removed, as shown below.
[Images: OLD and NEW versions of the chart]
The example URLs were left in Section 2.4.3 (Navigation Queries – “Go”) and Section 2.4.4 (Queries with Multiple User Intents (Do-Know-Go)).
5) In Section 3.0, “The Language of the Landing Page,” a chart showing examples of landing page language flags was removed.
6) In Section 4.1, “Vital,” which is about the highest rating that a page can receive, Google removed a paragraph directing the reader to a later part of the document.
7) Sections 4.1.1 through 4.1.3 were removed. These were titled:
- 4.1.1 – Examples of English (US) Navigation Queries with Vital Pages for the Task Location
- 4.1.2 – Examples of Entity Queries with Vital Pages
- 4.1.3 – Vital Pages for People Queries
Each section included a chart listing examples of “vital” web pages for different types of queries.
8) Section 4.1.4 (“Other Important Vital Concepts”) was changed to become Section 4.1.1 (“Important Vital Concepts”) and was edited substantially. A chart showing examples of queries that don’t have vital pages was left in, but two charts showing examples of vital pages were removed: one chart showed how there may be more than one vital page for some queries, and the other showed examples of unofficial websites that look official and should not be considered vital.
9) Section 4.1.5 (“Vital Pages and Geographic Location”) was removed completely. This section offered guidance for dealing with websites and web pages that have multiple versions for different languages.
10) Also removed completely are:
- Section 4.2.1 (“Examples of Useful Pages”)
- Section 4.3.1 (“Examples of Relevant Pages”)
- Section 4.4.1 (“Examples of Slightly Relevant Pages”)
- Section 4.5.1 (“Examples of Off-Topic or Useless Pages”)
Each of these sections showed specific web pages matching the labels from Google’s rating scale. The company clearly doesn’t want the general public — and the companies whose pages were used as examples — to see specific examples of how Google assesses their pages. And, it’s also likely that these examples were removed because web pages change and some of the ratings may not have been accurate.
One interesting example is this paragraph from Section 4.4.1 which is now gone:
Please note that not all pages with copied content are considered “low quality”. The website www.answers.com contains content copied from Wikipedia.org and other dictionary and encyclopedia sites, but is not considered to be a low quality site because the content is well-organized and intended to be helpful for users. Similarly, there are pages on medical information sites that contain copied content. If the page is well-organized and appears to be designed to be helpful for users and not just to display ads for users to click on, it should be rated based on how helpful the content would be for users.
I’m guessing that Google wouldn’t want an endorsement of Answers.com like that to be widely distributed.
11) In 4.5 (“Off-Topic or Useless”), this paragraph was added to the new document:
A rating of Off-Topic or Useless also applies when there is lack of attention to an aspect of the query that is important for satisfying user intent.
12) In Section 4.6.1 (“Unratable: Didn’t Load”), some text describing a military site that triggers a message saying its security certificate isn’t trusted was removed, and a chart showing similar examples was edited to remove the URLs of sample pages that deserve the “didn’t load” rating. Another chart showing similar web page-generated messages was removed.
13) Charts with example URLs were removed from:
- Section 5.1 (“User Intent and Page Utility”)
- Section 5.2 (“Location is Important”)
- Section 5.3 (“Language is Important (This section is for Non-English Task Languages)”)
- Section 5.5 (“Specificity of Queries and Landing Pages”)
14) In the “Common Rating Problems” section of the document, Section 5.6.1 (“Dictionary or Encyclopedia Results”) was changed as you see here:
[Images: OLD and NEW versions of Section 5.6.1]
15) Section 5.6.2 (“Action vs. Information Intent”) was removed completely from the new document.
16) Section 5.6.3 (“Queries that Ask for a List”) became 5.6.2, and a chart with example URLs was removed. Ditto for Section 5.6.4 (“Misspelled and Mistyped Queries”).
17) The very detailed Section 5.6.5 (“URL Queries”) was renamed to 5.6.4, but remains very detailed. Some example URLs were changed, and two charts with example URLs were removed.
18) Section 5.6.6 (“New and Old Pages”) becomes 5.6.5 and two charts with example URLs were removed.
19) Section 5.6.7 (“Search Engine Result Pages”) was removed entirely, and probably should’ve been a long time ago. This section told raters that they should rate search engine result pages “just like any other landing page” … even though Google has said for years that it doesn’t want its own search results to include search result pages.
20) Section 5.6.8 (“Video Landing Pages”) was also removed completely.
21) In the “Flags” section of the document, these sections were edited substantially, usually to remove charts with example URLs:
- Section 6.2.1 (“Clear Non-Porn Intent”)
- Section 6.2.2 (“Possible Porn Intent”)
- Section 6.2.3 (“Clear Porn Intent”)
22) Section 6.2.4 (“Reporting Illegal Images”), which discusses child pornography and bestiality and has specific reporting instructions for both Leapforce and Lionbridge employees (the two companies that hire and manage the Quality Raters), was removed.
23) There are significant changes to “Part 2: URL Rating Tasks with User Locations”, where several sections were removed completely:
- Section 1.0 (“Important Definitions”)
- Section 1.1 (“What is the User Location?”)
- Section 1.2 (“Why are the Task Location and User Location important?”)
- Section 1.3 (“User Location, Task Location, and Explicit Location in the query”)
All of those were replaced with a Section 1.0 called “Query Locations” that’s about half as long as the old material.
24) Section 2.0 (“Location-Specific Rating Task Screenshot”) was changed — a chart was moved to Section 1.0 (“Query Locations”).
25) Another major change in this section is the consolidation of 11 pages of local query information down to two. These sections are gone and/or were condensed:
- Section 3.0 (“The Role of User Location in Understanding Query Interpretation/User Intent”)
- Section 3.1 (“Queries with Local Intent”)
- Section 3.2 (“Rating Landing Pages when the task has a User Location”)
- Section 3.3 (“Vital Ratings for Rating Tasks with User Locations”) (unchanged)
- Section 3.4 (“Rating Examples”)
Those are replaced by two sections:
- Section 3.0 (“Assigning a Rating When There is a Query Location”)
- Section 3.1 (“When Does the Query Location Matter?”)
26) Parts 3 and 4 of the old document are completely missing from the new one. Part 3 was called “Page Quality Rating Guidelines” and Part 4 was “Rating Examples.”
They totaled about 50 pages of very detailed instructions and guidelines about identifying the quality of web pages, identifying “main content” and “supplemental content” on a page, how to rate the layout of a page, how to identify the reputation of a website, determining if content is high, medium, or low quality … and so forth.
Some of the material in these two sections was very “inside baseball” and not stuff that the general public would care about, but certainly anyone creating web content would.
27) “Webspam Guidelines” used to be Part 5 of the document, but becomes Part 3 with the removal of the two sections described above.
Some text has been edited to educate the reader about things like PPC ads, and some text has been removed, including a section (2.0) on browser requirements that human raters have to follow, text that linked to screenshot examples showing raters how to check for spam (e.g., “use Ctrl-A to reveal hidden text”), and text-based instructions for doing so, such as disabling JavaScript.
28) In Section 2.2 (“Keyword Stuffing”) of the new document, these two bullet items were added to the list of examples:
- Pages with a large amount of what look like gibberish or random keywords.
- Pages that appear to be programmatically or automatically generated text that doesn’t really make sense.
29) This sentence was added to Section 2.4 (“Cloaking”): “True cloaking is somewhat rare, but spammers do use other methods to show different pages to search engines than to users.”
30) Section 3.1.5 (“Copied Message Boards”) now mentions “copies of Usenet posts” as an example of spam.
31) A section on “Recognizing Copied Content” in the old document was split into two sections in the new document, but the content is the same.
32) The definition of “thin affiliate” has been clarified in the new document; the sentence in bold is new:
A thin affiliate is a site that offers little additional information and does not offer substantial value to users compared to many other sources on the Web. For example, an affiliate that has only copied content from the merchant site is considered a thin affiliate. This is a moneymaking spam technique.
Also in the new document, these two bullet points describing thin affiliates have been removed:
- Click buttons on the page. Click on a “More Information” or “Make a Purchase” button. If you are taken to a merchant on a different domain, it is probably a thin affiliate. You will not be able to make the purchase on the affiliate webpage.
- Check properties of images on the page. Right-click on an image on the page with your mouse and look at “Properties” to see where the image originates. Check to see if the address of the image is the same as the address of the page or if it is the address of a “real” merchant.
33) The instructions for identifying “Pages with Unhelpful Content and PPC Ads” have been expanded. Google says raters should ask themselves these questions:
- Is the content likely to be helpful to users or is it too general, too poorly written, or gibberish?
- Does the page provide substantial value when compared to other pages in search results?
- Does the page have an excessive amount of ads that distract from or interfere with the main content?
- Would you trust the content?
- Would you be comfortable giving your credit card information to the site?
34) In the new document, “Part 6: Using EWOQ,” which had instructions for using the evaluation system where tasks are performed, is removed. Also gone are “Part 7: Quick Guide to URL Rating” and “Part 8: Quick Guide to Webspam Recognition.”
Conclusion
I think it’s great that Google has publicly released this Search Quality Rating Guidelines document. But the changes above strongly suggest that this is not the document that its hired human raters will be using, but more of a watered-down, public-friendly version.