In the last 20 years, Google’s search engine has changed a lot. If we take a look at technology and web development as a whole, we can see the pace of change is pretty spectacular.
This website from 1998 was informative, but not very attractive or easy to use:
Modern websites not only look much better, but they are also equipped with powerful features, such as push notifications, working partially offline and loading in the blink of an eye.
At the very beginning, when the World Wide Web was built with websites made up of only static hypertext markup language (HTML), Google had a simple task to complete:
Make a request to the server → get the static HTML response → index the page
I know this is a super-simple description of the process, but I want to show the differences between processing websites back then and processing websites today.
Google solved the issue by trying to render almost all the pages it visits. So now, the process looks more or less like this:
Make a request to the server → get the static HTML response → send it to the indexer → render the page → index and send the extracted links to Googlebot → Googlebot can crawl the next pages.
- Googlebot’s crawling is slowed down. It doesn’t see hyperlinks in the source code of a JS website, so it has to wait for the indexer to render the page and send the extracted URLs back.
The proper approach
A. What’s the scale of the problem?
- Partial JS dependencies. Visit the Angular.io website and switch JS off in the browser — the main navigation doesn’t work (but links are available in the document object model [DOM], which I’ll talk about later).
- Meaningful JS dependencies. Visit AutoZone and switch JS off — the main navigation might not work, and the links might not be available in the DOM.
- Complete JS dependencies. Visit YouTube, switch JS off and notice all of the content disappears!
B. Where is the website built?
Static HTML websites are built on your server. After an initial request, Googlebot (and users, too) receive a static page in response.
C. What limits does Google have?
Some time ago, Google revealed how it renders websites: A shared web rendering service (WRS) is responsible for rendering the pages. Behind it stands a headless browser based on Chrome 41, which was introduced in 2015, so it’s a little out of date. The fact that Google uses a three-year-old browser has a real impact on rendering modern web applications because it doesn’t support all the current features used by modern apps.
Eric Bidelman, an engineer at Google, confirmed that they are aware of the limits Google has with JS. Based on unofficial statements, we can expect that Chrome 41 will be updated to a more recent version at the end of 2018.
To get significant insight into what is supported and not supported, visit Caniuse.com and compare Chrome 41 with the most recent version of Chrome. The list is long:
Timeouts are the next thing that makes JS and SEO a difficult match.
Google needs to manage its processing resources carefully because of the massive amount of data it needs to process. The World Wide Web consists of over a billion websites, and it’s growing every day. The chart below shows that the median size of the desktop version of the pages increased by almost 100 percent in the last five years. The equivalent metric for the mobile version of the website increased by 250 percent!
Preparation and helpful resources
Google knows SEOs and developers are having problems understanding search behavior, and they are trying to give us a helping hand. Here are some resources from Google you should follow and check to help with any JS issues you may have:
- Webmaster trends analyst John Mueller.
- Webmaster trends analyst Gary Illyes.
- Engineer Eric Bidelman.
- Video: “SEO best practices and requirements for modern sites” with John Mueller.
What does Google see?
Three years ago, Google announced that it is able to render and understand websites like modern browsers. But if you look at the articles and the comments on rendering JS websites, you will notice they contain many cautionary words like: “probably,” “generally” and “not always.”
This should highlight the fact that while Google is getting better and better at JS execution, it still has a lot of room for improvement.
Source code vs. DOM
The source code is what Googlebot sees after entering the page. It’s the raw HTML, before any JS has been executed. An important thing to keep in mind is that Googlebot itself does not render the pages.
“Inspect Element” shows the document object model. Rendering is done by the Web Rendering Service, which is a part of Google’s Indexer. Here are some important points to keep in mind:
- Raw HTML is taken into consideration while crawling.
- DOM is taken into consideration while indexing.
- First wave: Google extracts only the metadata and indexes the URL based on this information.
- Second wave: If Google has spare resources, it renders the page to see the content. It can reindex the page and join these two data sources.
However, John Mueller recently said that if Google gets stuck during the rendering of pages, the raw HTML might be used for indexing.
Even if you see that a particular URL is indexed, it doesn’t mean the content was discovered by the indexer. I know that it might be confusing, so here’s a small cheat sheet:
- To see the HTML sent to Googlebot, go to Google Search Console and use the Fetch and Render tool. Here you have access to the raw HTTP response.
- To see the rendered version of the page, you can use the Fetch and Render tool as well.
- To see the DOM built by the web rendering service (WRS) for desktop devices, use the Rich Results Test. For mobile devices, use the Mobile-Friendly test.
Google officially confirmed we can rely on these two methods of checking how Google “sees” the website:
Compare the source code with DOM
Now, it’s time to analyze the code and the DOM.
In the first step, compare them in terms of indexability, and check if the source code contains:
- Meta robots instructions like indexing rules.
- Canonical tags.
- Hreflang tags.
Then see if they are compliant with the rendered version of the website.
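This check can be partly automated. Below is a minimal sketch, assuming you have already saved the raw HTML and the rendered DOM as strings. It uses only Python’s standard library, and the names IndexabilityParser, indexability_signals and compare_signals are my own for illustration, not part of any Google tool.

```python
from html.parser import HTMLParser


class IndexabilityParser(HTMLParser):
    """Collects meta robots, canonical and hreflang values from an HTML string."""

    def __init__(self):
        super().__init__()
        self.signals = {"robots": [], "canonical": [], "hreflang": []}

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; valueless attributes come back as None.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attrs.get("name", "").lower() == "robots":
            self.signals["robots"].append(attrs.get("content", ""))
        elif tag == "link":
            rel = attrs.get("rel", "").lower().split()
            if "canonical" in rel:
                self.signals["canonical"].append(attrs.get("href", ""))
            elif "alternate" in rel and "hreflang" in attrs:
                self.signals["hreflang"].append(
                    (attrs["hreflang"], attrs.get("href", ""))
                )


def indexability_signals(html):
    """Return the indexability signals found in one HTML document."""
    parser = IndexabilityParser()
    parser.feed(html)
    parser.close()
    return parser.signals


def compare_signals(raw_html, rendered_html):
    """Return only the signals that differ, as (raw, rendered) pairs."""
    raw = indexability_signals(raw_html)
    rendered = indexability_signals(rendered_html)
    return {key: (raw[key], rendered[key]) for key in raw if raw[key] != rendered[key]}
```

An empty result means the two versions agree on these three signals. Anything else deserves a closer look, for example a robots value that JS changes from “index” to “noindex” after rendering.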
To spot the differences, you can use a tool like Diff Checker, which will compare text differences between two files.
Using Diff Checker, grab the raw hypertext transfer protocol (HTTP) response from Google Search Console and compare it with the DOM from the tools mentioned in the third point of the cheat sheet above (the Rich Results Test and the Mobile-Friendly test).
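If you prefer to script the comparison instead of pasting into a web tool, Python’s standard difflib module can produce a similar text diff. This is a rough sketch rather than a replacement for Diff Checker; the function name diff_source_and_dom and the file labels are assumptions of mine, and it expects both inputs as strings you have already copied out of the tools.

```python
import difflib


def diff_source_and_dom(raw_html, rendered_dom):
    """Return unified-diff lines between the raw HTTP response body and the DOM."""
    return list(
        difflib.unified_diff(
            raw_html.splitlines(),
            rendered_dom.splitlines(),
            fromfile="raw_http_response",  # label only, not a real file
            tofile="rendered_dom",         # label only, not a real file
            lineterm="",
        )
    )


if __name__ == "__main__":
    raw = "<title>Loading</title>\n<div id=\"app\"></div>"
    dom = "<title>Product page</title>\n<div id=\"app\"><h1>Product</h1></div>"
    for line in diff_source_and_dom(raw, dom):
        print(line)
```

Lines starting with “-” exist only in the raw response, and lines starting with “+” exist only in the rendered DOM. Real pages are rarely formatted line by line, so it may help to pretty-print both versions before diffing.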
Googlebot doesn’t scroll
While looking at the DOM, it’s also worth verifying the elements dependent on events like clicking, scrolling and filling forms.
Two waves of indexing and their consequences
Going back to those two waves I mentioned earlier, Google admits that metadata is taken into consideration only in the first wave of indexing. If the source code doesn’t contain robots instructions, hreflang tags or canonical tags, Google might not discover them.
How does Google see your website?
To check how Google sees the rendered version of your website, go to the Fetch as Google tool in Google Search Console, provide the URL you want to check and click Fetch and Render.
For complex or dynamic websites, it’s not enough to verify if all the elements of the website are in their place.
Google officially says that Chrome 41 is behind the Fetch and Render tool, so it’s best to download and install that exact version of the browser.
I’d like to mention some common and trivial mistakes to avoid:
Be careful while analyzing mega menus. Sometimes they are packed with fancy features which are not always good for SEO. Here is a tip from John Mueller on how to see if the navigation works for Google:
Also be careful with “load more” pagination and infinite scroll. These elements are also tricky. They load additional pieces of content in a smooth way, but it happens after the interaction with the website, which means we won’t find the content in the DOM.
At the Google I/O conference, Tom Greenaway mentioned two acceptable solutions for this issue: You can preload these links and hide them via CSS, or you can provide standard hyperlinks to the subsequent pages, so that the button links to a separate URL with the next content in the sequence.
The next important element is the method of embedding internal links. Googlebot follows only standard hyperlinks, which means you need to see links like this in the code:
<a href="https://www.domain.com">text</a>
If you see OnClick links instead, they look like this and will not be discovered:
<div onclick="location.href='https://www.domain.com'">text</div>
So, while browsing through the source code and the DOM, always check to be sure you are using the proper method on your internal links.
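To check this across a whole page, a small script can separate standard hyperlinks from elements that only navigate through an onclick handler. The sketch below uses Python’s built-in HTML parser and applies the rule described above. It is my own illustration (the class name LinkAudit is hypothetical) and says nothing about how Googlebot’s parser is actually implemented.

```python
from html.parser import HTMLParser


class LinkAudit(HTMLParser):
    """Separates standard <a href> links from elements that rely on onclick."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []        # href values of standard links
        self.onclick_elements = []  # tag names that carry an onclick handler instead

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names are already lowercased by the parser
        if tag == "a" and attrs.get("href"):
            self.hyperlinks.append(attrs["href"])
        elif "onclick" in attrs:
            self.onclick_elements.append(tag)


def audit_links(html):
    """Return (hyperlinks, onclick_elements) for one HTML document."""
    audit = LinkAudit()
    audit.feed(html)
    audit.close()
    return audit.hyperlinks, audit.onclick_elements
```

Run it on both the source code and the DOM. Any destination that shows up only in the onclick list is a candidate for conversion to a standard hyperlink.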
URLs — clean & unique
The fundamental rule to get content indexed is to provide clean and unique URLs for each piece of content.
JS-powered websites often use a hash (#) in the URL. Google has clearly stated that in most cases, this type of URL won’t be discovered by the crawler.
While analyzing the website, check to see that the structure is not built with URLs like these:
Everything after the # sign in the URL will be trimmed and ignored by Google, so the content won’t be indexed!
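The trimming can be simulated with Python’s standard urllib.parse.urldefrag, which splits a URL into the part before the # and the fragment after it. In this sketch the function name and the sample URLs in the usage note are hypothetical; the idea is to group crawled URLs by what is left once the fragment is removed.

```python
from urllib.parse import urldefrag


def urls_relying_on_fragments(urls):
    """Group fragment URLs under the URL that remains once the # part is trimmed."""
    groups = {}
    for url in urls:
        base, fragment = urldefrag(url)
        if fragment:  # URLs without a # part are left out of the report
            groups.setdefault(base, []).append(url)
    return groups
```

For example, https://www.domain.com/#/products and https://www.domain.com/#/contact both end up under https://www.domain.com/, which shows that, once trimmed, they are one and the same URL.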
Unfortunately, diagnosing problems with timeouts is not easy. If we don’t serve the content fast enough, we can fail to get the content indexed.
How can we spot these problems? We can crawl the website with a tool like Screaming Frog with the delays set to 5 seconds. In rendering mode, you can see if everything is fine with the rendered version.
John Mueller suggests we can check if Google rendered the page on time in the Mobile-friendly test, and if the website works it should be OK for indexing.
While analyzing the website, look to see if it implements artificial delays like loaders, which force visitors to wait for the content to be delivered:
There is no reason to set up elements like these; they can have dramatic effects on indexing, because the content behind them may not be discoverable.
You gain nothing if the content is not indexed. It’s the easiest element to check and diagnose and is the most important!
Use the site:domain.com command
The most useful method of checking indexation is the well-known query:
site:domain.com "a few lines of the content from your website"
If you search for a bit of content and find it in the search results, that’s great! But if you don’t find it, roll up your sleeves and get to work. You need to find out why it’s not indexed!
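When you need to check many snippets, it helps to generate the queries instead of typing them. The helper below is a small sketch: site_query and search_url are names I made up, and the google.com/search?q= address is simply the familiar public search URL pattern, not a supported API. Open the resulting links manually rather than fetching them in bulk.

```python
from urllib.parse import urlencode


def site_query(domain, snippet):
    """Build a site: query that searches for an exact phrase on one domain."""
    return f'site:{domain} "{snippet}"'


def search_url(domain, snippet):
    """Build a search URL for the query (assumed URL pattern, see the note above)."""
    return "https://www.google.com/search?" + urlencode({"q": site_query(domain, snippet)})


if __name__ == "__main__":
    print(site_query("domain.com", "a few lines of the content from your website"))
    print(search_url("domain.com", "a few lines of the content from your website"))
```

The snippet should not contain double quotes of its own, since they would break the exact-phrase match.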
If you want to conduct a complex indexation analysis, you need to check the parts of the content from different types of pages available on the domain and from different sections.
Google says there may be issues with loading “lazy” images:
The second option which makes lazy content discoverable to Google is structured data:
Don’t use this article as the only checklist you’ll use for JS websites. While there is a lot of information here, it’s not enough.
This article is meant to be a starting point for deeper analysis. Every website is different, and when you consider the unique frameworks and individual developer creativity, it is impossible to close an audit with merely a checklist.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.