Yandex ‘leak’ reveals 1,922 search ranking factors
SEOs have already started analyzing Yandex's search ranking factors, which include PageRank and several other link-related factors
A former employee allegedly leaked a Yandex source code repository, part of which contained more than 1,900 factors used by the search engine for ranking websites in search results.
Why we care. This leak has revealed 1,922 ranking factors Yandex used in its search algorithm, at least as of July 2022. Perhaps Martin MacDonald put it best on Twitter today: “The Yandex hack is probably the most interesting thing to have happened in SEO in years.”
Yandex is not Google. If you plan to read the full list of Yandex ranking factors, keep that in mind. A ranking factor listed by Yandex doesn’t necessarily carry the same weight at Google, and Google may not use all of the 1,922 factors listed. In fact, many of the factors in this leak are deprecated or unused.
That said, many of these ranking factors may be quite similar to signals Google uses for search. So reviewing this document may provide useful insights into how search engines, such as Google, work from a technological standpoint.
The bigger picture. The code appeared as a torrent on a popular hacking forum, as reported by Bleeping Computer:
…the leaker posted a magnet link that they claim are ‘Yandex git sources’ consisting of 44.7 GB of files stolen from the company in July 2022. These code repositories allegedly contain all of the company’s source code besides anti-spam rules.
Yandex calls it a leak. Because the code appeared on a popular hacking forum, it was first thought that Yandex was hacked. Yandex has denied this, and provided the following statement:
“Yandex was not hacked. Our security service found code fragments from an internal repository in the public domain, but the content differs from the current version of the repository used in Yandex services.
A repository is a tool for storing and working with code. Code is used in this way internally by most companies.
Repositories are needed to work with code and are not intended for the storage of personal user data. We are conducting an internal investigation into the reasons for the release of source code fragments to the public, but we do not see any threat to user data or platform performance.”
Dig deeper. You can find more coverage of the leak on Techmeme.
Yandex ranking factors list. MacDonald shared the full list of 1,922 factors here on Web Marketing School. I highly recommend downloading it, as I fully expect Yandex will try to scrub this information from the internet. (Editor’s note: In an earlier version of this article, we had linked to a translated version on Dropbox, but that link quickly went away.)
Early analysis of ranking factors. Alex Buraks created two Twitter threads – first thread, second thread – analyzing the various ranking factors. There’s another interesting Twitter thread here from Michael King.
Dan Taylor also shares some findings in Yandex Data Leak: What We’ve Learned About The Search Algorithms on Russian Search News.
Many of Yandex’s ranking factors are what you’d expect to see:
- PageRank and many link-related factors (e.g., age, relevancy, etc.).
- Text relevancy.
- Content age and freshness.
- End-user behavior signals.
- Host reliability.
- Some sites get preference (e.g., Wikipedia).
Some of the ranking factors SEOs are finding surprising: number of unique visitors, percent of organic traffic and average domain ranking across queries.
And as Taylor pointed out, 244 of the ranking factors were categorized as unused and 988 as deprecated, “meaning that 64% of the document is either not actively used or has been superseded – so it’s more like ~690 potential ranking factors, and a lot of them contain thin descriptions.”
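For readers who want to check Taylor's math, here is a minimal sketch that uses only the counts quoted above. The variable names are illustrative and do not come from the leaked list.

```python
# Counts quoted in the article from Taylor's analysis of the leaked list.
total_factors = 1922
unused = 244
deprecated = 988

# Factors flagged as either unused or deprecated.
inactive = unused + deprecated
inactive_share = inactive / total_factors

# What remains are the potentially active ranking factors.
potentially_active = total_factors - inactive

print(f"Inactive factors: {inactive} ({inactive_share:.0%})")
print(f"Potentially active factors: {potentially_active}")
```

The result matches the roughly 64% and ~690 figures Taylor cites.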
Yandex Search Ranking Factor Explorer. Rob Ousbey has created Yandex Search Ranking Factor Explorer, a tool to search the various ranking factors.
Dig deeper. Michael King has taken a deep dive into the code in Yandex scrapes Google and other SEO learnings from the source code leak here on Search Engine Land. It turns out there are actually 17,854 ranking factors, not 1,922. Some additional discoveries: the initial weighting of ranking factors, the top 5 negatively and positively weighted initial ranking factors, link factors and prioritization and so much more.