Since at least 2005, Google has been using a large, worldwide focus group to help review its search results and the quality of the web pages that rank well in its algorithm. The people in this program are called Quality Raters and, as you can imagine, the work they do is important to search marketers everywhere.
Google was actually advertising Quality Rater jobs in late 2004, but today the Quality Raters don’t actually work for Google; they work for contractors such as Lionbridge, Leapforce, Butler Hill and possibly others. According to Lionbridge’s Internet Assessors Program job page, it has more than 4,500 people around the world rating search results. Leapforce’s website doesn’t indicate how many are in its program, but the job listings page includes opportunities with names like “Search Engine Evaluator,” “Social Search Engine Evaluator” and “Search Quality Judge.”
The Quality Raters’ work has become more widely known over the years thanks to a couple of occasions when the guideline document that Google provides as part of their work has been leaked online. (See our posts in March 2008 and October 2011.) Webmasters have also noticed unique quality rater referral strings, indicating when one of the evaluators has visited a website.
After Jennifer Ledbetter posted about the program last fall, one current Quality Rater contacted Search Engine Land wanting to explain and clarify some of what’s been written and said about the program. Since then, with a couple breaks for holidays, I’ve traded numerous emails with this person … who, in addition to working for Lionbridge as a Quality Rater, also happens to work for a US-based search marketing agency.
To help ensure that this person, whom I’ve never met, is actually a Quality Rater, I asked for some screenshots from inside the website where the rating work is done. A couple of those are inserted within the interview, and here’s an image of the rating tasks home page showing an empty task queue.
Below, we talk about the hiring process, what Quality Raters look for when they examine websites, details of the different evaluation tasks they do and much more.
Q&A With A Google Search Quality Rater
SEL: Tell me how, when and why you got started with the Quality Rater program.
Quality Rater: I first started with Lionbridge in May of 2011. I was looking for work because my then-current employer had told me I was taking a pay cut, so I needed a way to add income. I began searching all the normal places for job listings and came across one on Craigslist for a Quality Rater. It sounded cool, so I sent them my resume and they got back to me the next day saying they were excited to have me and if I could just pass a few simple tests I would be hired. That was the easy part.
Did the job listing specifically mention Google?
The listing didn’t mention anything about Google but as soon as they contacted me, they said I would be doing work related to Google.
So, you knew it was Google-related. At what point did you know that you’d be rating Google’s search results?
I knew before I got hired.
One thing I think the SEO community is missing is that this program has nothing to do with SEO or rankings. What this program does is help Google refine their algorithm. For example, the Side-by-Side tasks show the results as they are next to the results with the new algorithm change in them. Google doesn’t hire these raters to rate the web; they hire them to rate how they are doing in matching users’ queries with the best source of information.
Let’s talk about the hiring process. There’s some kind of test. Was it difficult?
I had six days to complete both parts of the test, with the second part opening after I passed the first test.
The tests turned out to be a 24-question, essay-response theoretical test that asked questions based on a PDF they had sent me. The questions were designed to test my ability to take the rules and apply them to situations that weren’t covered in the PDF. One that I vaguely remember was about spam and what to do if the site didn’t show any signs of spam, but it gave off a spammy feeling. It was the hardest test I have ever taken (for a reference point, I’m a Literature major who has taken graduate-level courses).
Only after having passed that test did I get to take the practical exam, which had more than 140 questions. This test had actual results that I had to rate. In order to be hired, I needed to score a 90% or higher in each of the four categories (which were Vital, Useful, Relevant and Off-Topic or Useless). Ideally, these represented the actual tasks that I would receive as a rater.
What were the questions like?
To give you an example of the questions asked:
Query [crispy cream], English (US)
It would then be up to me to visit the page (something that I want to stress, because blogs out there have been saying that a rater can rate the page without visiting it), decide if it fits the query and then assign a rating. It really is up to the rater, but the correct answer here is Useful because of the spelling. If the user had typed “Krispy Kreme,” then this result would be off-topic, but because it is “crispy cream,” and the guitars on this page are called Crispy Cream, this could be the page the user is wanting.
There were 143 just like that. It was good times.
Do you have any direct contact with anyone at Google, or do you only communicate with Lionbridge?
I have no contact with Google; it’s only Lionbridge.
After you get hired, is there some kind of training?
After I got hired there was a weekly, two-hour webinar along with training modules to complete. It was very intense training. During the first four weeks, I was required to comment on every rating I gave. These comments were then reviewed and commented on, giving me feedback on my ratings.
At what point do you get the raters’ handbook?
I got this the moment I got hired. It basically is just a list of tasks we perform along with examples of how to rate them.
How does Lionbridge (or Google) describe the handbook?
They refer to it as the guidelines, not a handbook.
While we are on the subject of guidelines, one thing that really impressed me was how they have more than one rater looking at a site. I believe (I’m not sure, I’m going off the comments left by other raters) that there are about six raters looking at each task. If I rate something as useful but another rater says it’s off-topic, we must come to an agreement (through comments and debate) before the rating is submitted.
How much do you make and how often do you get paid?
I get paid $14.50/hour and I am paid once a month. I’m only able to work a max of 20 hours a week and a total max of 80 hours a month.
In one of the recent articles about the Quality Raters, it says you can only work for a year and then you have to wait three months before you can re-apply. Is that true?
I know they say you can only be a rater for a year, but everyone I’ve talked to says that, as long as they get their hours in and keep up the quality, they are allowed to keep rating.
Is the schedule completely up to you, or do they give you assigned hours?
I schedule my own hours; as long as I get at least 10 but no more than 20, I stay on pretty good terms with them. They are very strict, but allow you to make up hours that you missed. So, if I only did four hours the first week, I could make up the hours by doing 16 hours the next week. Still only allowed 20 hours a week max, so if I miss more hours than I can make up, I’m out of luck.
They also tend to be really strict about their productivity goals. There is a certain number of tasks that I must complete every minute, depending on the task type. If I fall short of those goals, I am put on probation, during which I cannot work. If my quality isn’t up to par, they fire me. It’s a very controlled work environment.
You mentioned there about getting fired “if my quality isn’t up to par.” How do you know if you’re doing a good job? It seems to me that in a lot of cases, rating search results is pretty subjective.
Results are subjective, but they have a quality center that shows your progress over time. They track how many returned results you have, how long it takes you to take care of a troubled rating, etc. While the rating is up to me, it has to be similar to what other raters have said. So, they track quality based on staying within the time period for rating tasks and the number of tasks you have returned to you.
They return tasks to you — what does that mean?
It means that there has been a disagreement on the rating and you have to go back in and come to an agreement with the other raters.
So, the rating of search results is a group project. Is it difficult to come to agreement?
Sometimes it’s harder to agree with raters, especially if they haven’t read the guideline like they should or if they are just starting out. However, after enough exchanges, they have a moderator come in and choose which rating matches it best. This moderator looks at our comments and makes a decision off of that.
How often does that happen in your experience?
Not very often. Most of the time if you give your reasoning for why you rated something one way, the other raters will agree with you. Most of the time, these types of disagreements occur when something is either slightly relevant or off-topic. Once in a while, someone will think that a page is spam that isn’t, or the other way around. I’ve only had a moderator step in once.
What do you know about the moderators? Are they Lionbridge employees?
Yes, they work for Lionbridge. From what I know of them, they used to be raters and then got promoted.
Do you only look at organic results, or are you also grading ads/PPC landing pages?
We look at any type of page on the web. Most of them are organic results, but some of the tasks are geared towards more ad-related topics.
Do you remember an example of an ad-related task?
Not really. Most of them dealt with ad placement on the page, the order in which the ads were presented, which one I would click, etc.
Do you look at Google Places results and other Universal results, like News or Videos?
Yes, we do. I can think of many tasks where it shows the map of what a user was looking at before they typed in a query, and we are then to rate the results of that query based on the map they were looking at. We also rated news based on how current it was, how relevant it was to the query, and if it came from a trustworthy source. As for videos, we had to watch the video to determine if it was a match for the query and rate it Useful, Relevant, Slightly Relevant, or Off-topic.
That part about Maps is really interesting. So, in that task, they were putting you in the middle of some process — you’re not just doing tasks that involve standalone searches, but sometimes taking into account what has happened before? Does that also happen with other searches, too?
Almost all of the tasks given have to do with user experience. Even with just the basic searches, we are given the user’s language and location before we can rate a page. It’s not about if a page fits a query, it’s about if a user would find the page useful. The Maps queries (called local queries) are the only ones that give what the user was looking at before searching, but we are supposed to keep in mind what a user is expecting to see from that query with every task type. For example, if someone was in Seattle and typed in the query “weather,” they would find a page showing the weather in Florida slightly relevant; however, someone in Tampa would find it useful.
Aside from the collective rating that you described above, do you ever have other communication with other raters? Are there official or unofficial places where you can chat back and forth?
There are lots of places — forums and such on the Lionbridge site — where raters can talk to each other, but I never interact with them. I was always stressed getting my hours in for the week, so I didn’t have time to mingle.
Can you share a specific example of one of your recent tasks?
I can’t think of the exact URLs I rated, but the keyword was “Nike Women’s Running Shoes.” It gave me a list of 20 URLs to rate (10 on each side) [Ed. note: he's referring to the "Side-by-Side" tasks mentioned earlier.] and I visited each one in order to determine whether they were vital, useful, relevant, slightly relevant, or useless. With a recognized brand name like that, it wasn’t hard to determine quality. For example, I think the Nike site was one of the options, so that would get a “vital” rating. I remember a couple of sites sold the shoes, so I gave them a “useful” rating, and the Wikipedia entry on Nike was given a rating of “slightly relevant” because I believe not many people searching for Nike Women’s Running Shoes want a history of the company.
Do you click through and review all ten results that show up for a given task?
I always click all the links simply because I’m not good enough to tell what the site is about by just reading its description. No one is good enough; that’s why they give us the links.
When you click through from a Google search result page, what are you looking for on the web page that you visit?
When looking at a site, I always check for spam signals first — keyword stuffing, hidden text, sneaky redirects, and the like. Once I know it’s a good site, I start to look at the page as a person who would type the query in Google and whether or not the content on the page would help me fulfill my needs. There are some tasks that ask about design and layout and the like, but for the normal URL rating or Side-by-Side tasks, I really just look at content and figure out if it would be a worthwhile page for a user to see.
Do you ever look at the source code or anything like that? Are Raters asked or trained to look at source code of the web pages being rated?
There is a quick primer on looking at the source code in the guidelines, nothing in depth. Basically we look for hidden keywords and other spammy tactics discussed in the guidelines.
You mentioned URL rating tasks and Side-by-Side tasks, but also some that involve design and layout. What are those tasks like?
Design tasks ask if the page has a good ratio of main content, supplemental content, and ads. They also ask about the overall design: whether it is easy to read, whether the information is communicated clearly, and the like. It’s not about whether the page is beautiful or amazing, but whether or not the normal user could find what they need on the page without getting lost.
Do they give you a single web page and ask you to rate its design, or are you still going through a page of search results and then rating design?
They are specific tasks, not part of rating a URL.
Are spelling and grammar part of the design-based tasks?
Spelling and grammar are something we look at in all tasks (at least I do) but there’s not a ding for it.
When looking at design and layout, do your criteria change based on the type of site you’re looking at? For example, a web page on a big brand site might be expected to have a more professional design than some small business sites.
Like I said before, it’s more about the layout than the actual design. A company with a simple design would be rated just as well as a big company with a professional design as long as the information is clear and presented in a way that is easy to understand. To give you an example, a page where you can tell what the main content is, with ads taking second place in the design, would get a high rating. A page where the ads are confused with the main content, where you can’t tell the difference between content and ads, would get a low rating.
How many different kinds of tasks are there? The guidelines I’ve seen begin by saying “you will work on many different types of rating projects.”
There are a lot of different tasks but they are all grouped under four main groups: URL, Side-by-Side, Experimental, and Result Review. The big one there is the Experimental tasks which have a ton of different types of tasks in them. I’ve included a picture that lists all the task types and how long they are supposed to take, as well.
What are “Display Block” and “TTR” tasks?
Display Block, if I remember right, is a block of images that we rate as a whole rather than one at a time. TTR stands for Time to Rate, which is the baseline task they use to determine how long it should take to get a task done. It has all the different tasks in it, but instead of looking for accuracy it just cares about time.
Do they try to give you tasks related to topics and things you know about, or do you review pages about things you’re not very familiar with?
If someone types in “Best Dog Food for Puppies,” it’s not very hard to know what they are wanting and most queries have a fairly obvious meaning. However, once in a while I’ll get one that I can’t figure out and that’s when I do research to figure out what they want. For example, if someone queried “Release Liner,” I would need to do some research to figure out that it’s something used in cutting vinyl for signs and the like. At that point, I could determine whether a site is worthwhile or not. Granted, it’s not a perfect system but it works most of the time.
Are there specific industries/niches that show up more than others in your rating tasks?
Not that I have noticed.
How does your work affect Google’s search results — do they tell you anything about that?
They don’t talk about that; however, I know that what it really does is perfect the algorithm instead of changing actual live search results. I gathered this from the way that Side-by-Side tasks are the most important ones, because they show the old algorithm versus a change in the algorithm that they are testing.
Are you an active Rater these days? How long do you think you’ll keep doing it?
I still rate on the weekends. I like doing it, so I’ll keep doing it as long as I can.
Do Lionbridge and/or Google know that you work in the search marketing industry?
No. I got this job after I got the Lionbridge job.
Do you know of any other search marketers who are also Quality Raters?
I don’t know any personally, but I bet there aren’t a lot of us.
What’s your opinion of Google’s search results, and has that opinion changed since you became a Quality Rater?
I’ve always used Google as my “go to” search engine; however, since I became a rater, I’ve started using it more because I can see the behind-the-scenes improvements they are trying to make.
I like the idea that they have an army of actual people working towards bettering their engine. I know some people might think this is wrong or even that raters have a negative effect on their rankings. Well, I can honestly say that they don’t. The whole point behind quality raters is not to rate the actual web, but rather to rate how well Google is doing at providing quality results.
Almost every company has some form of quality control. Do people get upset that McDonald’s has someone check the quality of their food? I don’t see what Google does as any different than wanting to present the best possible product they can to their users.
So, to answer your question, yes, my opinion has changed for the better.