How Google will shift resources to media search and other tidbits from Gary Illyes’ AMA on Reddit
The Google webmaster trends analyst also touched on topics like the possibility of an Indexing API, whether internal over-linking penalties exist and how the clustering of duplicate pages works.
Google webmaster trends analyst Gary Illyes, who has been with Google for over eight years working exclusively on search-related topics, participated in an AMA (ask me anything) on Reddit over the weekend. Over the course of the lengthy discussion thread, he covered everything from robots.txt to RankBrain to behavioral signals to image and video search. We’ll save you the trouble of sifting through it all by sharing all the important stuff here.
Gary Illyes has done many AMAs with us at our SMX events in the past but we believe this is Gary’s first Reddit AMA.
Google respects robots.txt. Googlebot will respect the directives you give it within your robots.txt file, as long as they are valid. If you give Google conflicting directives or invalid information in your robots.txt, then Googlebot can do its own thing. But otherwise, if you use robots.txt correctly, Google will obey.
Gary wrote: “Robots.txt is respected for what it’s meant to do. Period. There’s no such thing as ‘sometimes can be ignored’.”
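To see how valid directives resolve for a given crawler, here is a minimal sketch using Python's standard-library robots.txt parser. It is not Googlebot's own parser and may differ from it in edge cases; the rules, paths and the example.com URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one group for every crawler, one specific to Googlebot.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /drafts/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the standard-library parser lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "OtherBot"):
        for path in ("/drafts/post", "/private/report", "/blog/"):
            url = "https://example.com" + path
            print(agent, path, is_allowed(ROBOTS_TXT, agent, url))
```

In this sketch a crawler that has its own group follows only that group, so the hypothetical Googlebot rules block /drafts/ but not /private/.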
ccTLDs, gTLDs, Search Console setting impact on rankings. Gary said that these settings have an indirect impact on rankings. When it comes to local intent queries, if Google thinks local content is more relevant for the user, Google may rank content within that country higher than other content. These country signals, the ccTLD and/or country setting in Search Console, give Google information that helps it determine this domain is more relevant to people within a specific country.
“The way it affects its ranks is indirect I think. You have lots of gTLDs that are targeted to U.S. in the result set where your .lk domain tries to show up — those results are relevant and, on top of that, they get a slight boost for being “local” (i.e. targeted through search console),” Gary wrote. “Because you can’t get that boost [solely] with your domain for anything other than Sri Lanka, you are starting from a ‘penalty position’ (in the sense of sports).”
RankBrain explanation. RankBrain is Google’s AI-based query interpretation system that helps Google understand the query better and thus rank more relevant pages for that query. We covered it in detail in our FAQs before but here is how Gary Illyes explained it in the AMA:
“RankBrain is a PR-sexy machine-learning ranking component that uses historical search data to predict what would a user most likely click on for a previously unseen query,” he said. “It is a really cool piece of engineering that saved our butts countless times whenever traditional algos were like, e.g. ‘oh look a “not” in the query string! let’s ignore the hell out of it!’, but it’s generally just relying on (sometimes) months-old data about what happened on the results page itself, not on the landing page.”
So it sounds like RankBrain is useful to Google, but because it relies on older data about the results page, it may not always work as well as it could.
UX and behavior signals. One of the more controversial topics around Google ranking factors is whether and how much the search engine uses UX and behavioral signals for ranking. Google has long denied using them as direct ranking signals. In the AMA, Gary said once again that Google doesn’t use them, getting in a dig at one personality in the industry who has claimed the opposite many times over the years.
“Dwell time, CTR, whatever Fishkin’s new theory is, those are generally made up crap. Search is much more simple than people think,” Gary said.
Raters and live tests. Gary then delved into how Google does use click data and other user data — not for direct ranking signals but for evaluating the search results. He talked about the search quality raters and how they grade the Google search results and he talked about live experiments — when Google tests how different scenarios impact searcher behavior. But the core rankings do not get directly influenced by this searcher behavior.
“When we want to launch a new algorithm or an update to the ‘core’ one, we need to test it,” he said. “Same goes for UX features, like changing the color of the green links. For the former, we have two ways to test: (1) With raters, which is detailed painfully in the raters guidelines, (2) With live experiments.”
“1 was already chewed to [the] bone and it’s not relevant here anyway,” Gary added. “2 is when we take a subset of users and force the experiment, ranking and/or UX, on them. Let’s say 1% of users get the update or launch candidate, the rest gets the currently deployed one (base). We run the experiment for some time, sometimes weeks, and then we compare some metrics between the experiment and the base. One of the metrics is how clicks on results differ between the two.”
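The live experiment Gary describes can be sketched roughly as follows. This is an illustration only, not Google's implementation: the hashing scheme, the function names and the click-rate metric are assumptions, and only the 1% split and the experiment-versus-base comparison come from his description.

```python
import hashlib


def assign_arm(user_id: str, experiment: str, fraction: float = 0.01) -> str:
    """Deterministically place a user in the 'experiment' or 'base' arm.

    Hashing the experiment name together with the user id (an assumed scheme)
    keeps a user in the same arm for the whole run of the experiment.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "experiment" if bucket < fraction else "base"


def click_rate(impressions: int, clicks: int) -> float:
    """Clicks per result impression; 0.0 when there were no impressions."""
    return clicks / impressions if impressions else 0.0


def click_rate_difference(experiment: tuple, base: tuple) -> float:
    """Difference in click rate between the two arms.

    Each argument is a hypothetical (impressions, clicks) pair.
    """
    return click_rate(*experiment) - click_rate(*base)
```

A real evaluation would compare several metrics and check statistical significance; the single difference here only shows the shape of the comparison.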
Writing content using machine learning. Typically Google frowns upon having machines and computers write content. In fact, their guidelines have told webmasters to block search engines from indexing auto-generated content. But with machine learning and artificial intelligence, maybe that technology can make your content look even better than human-written content. If so, would Google be okay with that?
Gary implied that the answer was yes, saying, “If you can generate content that’s indistinguishable from that of a human’s, go for it. I’m actually planning to write something up on how to use ML and NLP for SEO (when I have some time).”
Image and video search. Google is also shifting more resources towards image and video search. Gary said he thinks there is a lot of opportunity for SEOs and marketers in this vertical. He won’t pre-announce anything but he said: “We simply know that media search is way too ignored for what it’s capable doing for publishers, so we’re throwing more engineers at it as well as more outreach.”
He repeated this elsewhere in the AMA, saying, “I cannot pre-announce things here, but yes, media search, in general, is something we’re throwing more engineering resources at nowadays. Google Images and Video search is often overlooked, but they have massive potential.”
Hreflang as a ranking benefit. Google has said time and time again that using hreflang markup for your websites does not result in a direct ranking benefit. It just communicates to Google more details about the language and regional targeting of the page.
Gary gave a good example of why this is the case, saying, “This is an interesting question and I think the confusion is more about the internal vs external perception of what’s a ‘ranking benefit’. You will NOT receive a ranking benefit per se, at least not in the internal sense of the term. What you will receive is more targeted traffic. Let me give you an example:
Query: “AmPath” (so we don’t use a real company name ::eyeroll:: )
User country and location: es-ES
Your site has page A for that term in EN and page B in ES, with hreflang link between them.”
“In this case,” he went on, “at least when I (re)implemented hreflang, what would happen is that when we see the query, we’d retrieve A because, let’s say, it has stronger signals, but we see that it has a sibling page B in ES that would be better for that user, so we do a second pass retrieval and present the user page B instead of A, at the location (rank?) of A.”
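The swap Gary describes can be sketched as a second pass over an already ranked list. This is a simplified illustration, not Google's code: the Page fields, the function name and the example URLs are assumptions made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    url: str
    lang: str
    # hreflang siblings of this page, as a language -> URL mapping (assumed shape)
    alternates: dict = field(default_factory=dict)


def swap_for_language(ranked: list, pages_by_url: dict, user_lang: str) -> list:
    """Replace each result with its hreflang sibling in the user's language.

    The sibling takes over the position of the page that was retrieved first,
    so the ordering of the list is unchanged.
    """
    result = []
    for page in ranked:
        sibling_url = page.alternates.get(user_lang)
        if page.lang != user_lang and sibling_url in pages_by_url:
            result.append(pages_by_url[sibling_url])
        else:
            result.append(page)
    return result


if __name__ == "__main__":
    page_a = Page("https://example.com/en/ampath", "en",
                  {"es": "https://example.com/es/ampath"})
    page_b = Page("https://example.com/es/ampath", "es",
                  {"en": "https://example.com/en/ampath"})
    other = Page("https://example.org/ampath", "en")
    index = {p.url: p for p in (page_a, page_b, other)}
    # Page A was retrieved on its stronger signals; the Spanish user sees B in A's slot.
    for page in swap_for_language([page_a, other], index, "es"):
        print(page.url)
```

The sketch matches on language only and does not handle the case where the sibling is already in the list; it is meant to show why the user gets better-targeted traffic at the same position rather than a higher rank.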
Duplicate content clusters. A participant asked Gary an interesting question around page signal consolidation when Google clusters a bunch of pages because of duplication or because they are syndicated. The SEO asked if Google will pass the signals from the other pages in the cluster to the page Google ranks. Gary said he can’t give too much detail around the answer because spammers may try to abuse it but he said, “generally the page in a dup cluster that shows up in the results will see more benefits.”
Here is the full question and answer, because you need to see the exact context here:
Question: “If I syndicate content and the other site has a canonical back or it gets treated as the same page as the original because it is duplicate, do the signals like internal/external links to the content on the other site count for me which seems to be how it would be handled or is this a special case Google looks for and says hey that’s kind of fishy and likely paid, so don’t count those?”
Gary’s answer: “I can’t give a concrete answer here because spammers would have a field day with it, but generally the page in a dup cluster that shows up in the results will see more benefits.”
Folder-level signals. Does Google have any sort of folder-level signals around content? Gary Illyes explained that signals like this are likely more useful when it comes to crawling the content on the site.
“They’re more like crawling patterns in most cases, but they can become their own site ‘chunk’,” he said. “Like, if you have a hosting platform that has URL structures example.com/username/blog, then we’d eventually chunk this site into lots of mini sites that live under example.com.”
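As a rough picture of what chunking a site by URL structure could look like, the sketch below groups URLs by host plus first path segment. The grouping rule and function name are assumptions for illustration; Gary did not describe how Google actually decides where one chunk ends and another begins.

```python
from collections import defaultdict
from urllib.parse import urlparse


def chunk_by_first_segment(urls: list) -> dict:
    """Group URLs into 'mini sites' keyed by host and first path segment."""
    chunks = defaultdict(list)
    for url in urls:
        parsed = urlparse(url)
        segments = [s for s in parsed.path.split("/") if s]
        key = f"{parsed.netloc}/{segments[0]}" if segments else parsed.netloc
        chunks[key].append(url)
    return dict(chunks)


if __name__ == "__main__":
    urls = [
        "https://example.com/alice/blog",
        "https://example.com/alice/blog/first-post",
        "https://example.com/bob/blog",
        "https://example.com/",
    ]
    for key, members in chunk_by_first_segment(urls).items():
        print(key, len(members))
```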
Point of no return. Gary was asked if a domain name can be so damaged that it is not repairable in terms of it being able to rank well again. Gary said no, there is no such thing. But Google has said in the past that some domains are damaged so badly that it may be easier to start with a new domain name.
Internal link penalty. Gary also said that there is no such thing as a penalty for internal link over-optimization. He said, “you can abuse your internal links as much as you want AFAIK.”
Focus on the basics. Gary really stressed that SEOs and webmasters should stop obsessing about such details and focus on the basics, calling things like rank trackers’ made-up terms and “silly updates” a waste of time.
“I really wish SEOs went back to the basics (i.e. MAKE THAT DAMN SITE CRAWLABLE),” he said, “instead of focusing on silly updates and made-up terms by the rank trackers, and that they talked more with the developers of the website once done with the first part of this sentence.”
List ranking factors. He was asked to list additional ranking factors beyond relevance, freshness, popularity. Gary listed: “Country the site is local to, RankBrain, PageRank/links, language, pornyness, etc.”
Indexing API. Google’s indexing API has recently attracted more attention because Bing announced a new API for submitting content and Yoast implied that it would work not only with Bing’s API but also with a Google API of some sort. Gary acknowledged that Google is doing some tests but said he believes Yoast pre-announced something. However, he did explain that Wix does have access to some sort of API for content submission. “As far as I know, yes, they’re the only early testers,” he said. “Though they’re also making stupid ass statements about it…”
In the context of Wix, he said, “Currently we’re testing our own limitations with the indexing API as well as the usefulness of pushed content vs pulled. We don’t really have anything to announce just yet…. As for Yoast, I don’t want to step on anyone’s toes (e.g. our Indexing API PM), but it may be that he over-announced some things.”
Link related pages together. If you have several pages or blog posts on the same topic, does it help to link those pages together? Gary said, “It’s good practice, but I don’t think you’ll see more gain than that from PageRank.” Google generally would tell publishers to make one great page as opposed to multiple semi-okay pages.