What SEOs need to know about Baidu in 2017

The first half of 2017 was stressful for Baidu, which saw a decline in active advertisers and stagnant revenue. Nonetheless, we see the search giant putting a huge amount of resources into AI and into building China’s web ecosystem.

If you are in the business of inbound marketing to the Chinese market, this article is for you. I have wrapped up the most significant updates and tips officially given by Baidu Webmaster Tools (BWT) in the list below. Ready? Let’s get started.

Baidu MIP ramping up

Mobile Instant Pages (MIP) have reached several milestones in the first six months of 2017:

Moreover, MIP now has 215 components built for public use. The response time of the MIP cache has been optimized, with speed increases of 50 percent or more. And MIP has now enabled mip-install-serviceworker for offline caching.

In June, I spoke with Junjie Wang, the owner of Baidu MIP, at the Baidu VIP Conference in Shanghai. He explained that MIP, despite being derived from Google’s AMP, is optimized for internet users in China, who use different browsers and have different browsing behaviors from those in the West. Baidu and Google have collaborated for a faster web; in fact, Baidu helped Google set up its AMP CDN in China.

Baidu has indexed a considerable number of AMP pages, although these don’t display the lightning icon in Baidu’s search results the way MIP pages do (see screenshot below). For sites that serve only an audience in Mainland China, I would recommend you deploy MIP instead of AMP.

The lightning icon for MIP results on Baidu SERP

HTTPS

The other improvement Baidu is driving in China is the secure web. Baidu Webmaster Tools launched a new feature of HTTPS Site Authentication in May that allows HTTPS sites to have a better presence on Baidu SERPs.

Previously, when HTTPS pages weren’t well supported, Baidu didn’t know whether to index a non-secure page or a secure page. Sites had to build two versions with different protocols to have a better result in indexation. Now, once you have been through this authentication, only secure pages of your website will be indexed and presented on the SERPs.

Authenticate an HTTPS site in Baidu Webmaster Tools

PWA and Lavas

PWA (Progressive Web Apps) for Baidu have finally arrived! Just like Google’s PWA, the Baidu version of PWA can have features like Desktop Icon, Full-screen Browsing, Offline Caches and Push Messages.

A “Hello World” of Lavas PWA

In order to help developers build their PWA instance effectively, Baidu has launched a framework based on Vue as a solution and named it Lavas. With Lavas, you will have a set of templates that accelerate your development and deployment.

Algorithm: Hurricane

Content scraping is undoubtedly the greatest threat to content marketers on China’s internet. While Baidu is still testing its Original Content Protection feature with a few selected websites, it has released an algorithm update, code-named Hurricane, which targets websites made up mostly of scraped content.

You will probably also find the copyright tag in Baidu Image Search results. This tag is meant to encourage content marketers to generate more original images and graphics.

The Copyright tag on Baidu Image Search

Crawler

In order to better understand what a page will look like to users, Baidu started testing its new spider with page-rendering capabilities in March. The search engine now has two new spiders in operation.

For desktop version:

Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)

For mobile version:

Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 Safari/601.1 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)
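If you want to spot these two spiders in your access logs, a minimal Python sketch might look like the following. The function name and the rule "contains iPhone means mobile" are my own assumptions based only on the two strings quoted above. Keep in mind that a user agent can be spoofed, so this only labels the claimed identity.

```python
import re

# Token shared by both rendering-spider user agents quoted above.
RENDER_TOKEN = re.compile(r"Baiduspider-render/\d+(\.\d+)*", re.IGNORECASE)

DESKTOP_UA = (
    "Mozilla/5.0 (compatible; Baiduspider-render/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)
MOBILE_UA = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) "
    "AppleWebKit/601.1.46 (KHTML, like Gecko) Version/9.0 Mobile/13B143 "
    "Safari/601.1 (compatible; Baiduspider-render/2.0; "
    "+http://www.baidu.com/search/spider.html)"
)


def classify_baidu_render_ua(user_agent: str) -> str:
    """Return 'mobile', 'desktop' or 'other' for a User-Agent header value.

    Hypothetical helper: it only inspects the claimed user agent and does
    not prove that the request really came from Baidu.
    """
    if not RENDER_TOKEN.search(user_agent):
        return "other"
    # Assumption: the mobile string imitates an iPhone, the desktop one does not.
    return "mobile" if "iPhone" in user_agent else "desktop"


if __name__ == "__main__":
    for ua in (DESKTOP_UA, MOBILE_UA, "Mozilla/5.0 (Windows NT 10.0)"):
        print(classify_baidu_render_ua(ua))
```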

It is easy to check whether an IP belongs to a real Baidu bot. You can run host on Linux or nslookup on Windows. See below:

nslookup for verifying the Baidu Spider
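The same lookup can be scripted. The sketch below is a rough Python equivalent, not an official Baidu tool. It assumes that genuine spider hosts resolve to names under baidu.com or baidu.jp (check Baidu's current documentation before relying on this), and it adds a forward lookup as an extra safeguard that the article's host/nslookup check does not include. The function names are hypothetical, and the forward step handles IPv4 only.

```python
import socket

# Assumption: real Baidu spider hosts reverse-resolve to these domains.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_baidu_hostname(hostname: str) -> bool:
    """Check whether a reverse-DNS name sits under a trusted Baidu domain."""
    name = hostname.rstrip(".").lower()
    return name.endswith(TRUSTED_SUFFIXES)


def verify_baidu_ip(ip: str) -> bool:
    """Reverse-resolve the IP, then forward-resolve the name to confirm it.

    Returns False on any DNS failure, so a network outage reads as
    'not verified' rather than raising.
    """
    try:
        hostname, _aliases, _ips = socket.gethostbyaddr(ip)
        if not is_baidu_hostname(hostname):
            return False
        # Extra step (my addition): the name should map back to the same IP.
        _name, _aliases, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return False
    return ip in addresses
```

Keeping the suffix check in its own function lets you test the matching rule without touching the network; verify_baidu_ip is the only part that performs DNS queries.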

Baidu Mobile Search UX Whitepaper for Advertising 2.0

In mid-June, Baidu released its new UX Whitepaper for Mobile Search (v1.0 was released in March earlier this year). In it, Baidu published detailed mobile advertising guidelines. According to the whitepaper, the following types of ads will lead to a Baidu penalty:

  1. Pornographic, sexually suggestive, gambling and other ads prohibited by law
  2. Ads with scam and fraud messages
  3. Content with app-wall or auto-redirect to app stores
  4. Ads with massive size or large proportion of a page
  5. Ads covering content with layers
  6. Ads near the buttons on a page
  7. Auto-play video ads
  8. Ads between article heading and body text
  9. Ads between the body text and pagination

An example of an ad that would trigger the penalty from Baidu

SEO tips, straight from Baidu

In addition to the updates above, Baidu has also recently provided some SEO-specific guidance through its Webmaster Tools platform. I’ve summarized some of the most important advice below.

Page size/URL length

Baidu says your page size (the HTML) should not be larger than 128 KB. Pages that embed binary image data directly in the HTML can easily exceed 128 KB, which causes problems for the Baidu spider when it tries to parse the page. If a page is too big, the best practice (for Baidu SEO) is to implement pagination. Another tip is to avoid adding unnecessary code to your output so that the page does not overflow the limit.
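A quick way to audit this is to measure the encoded HTML of each page. The Python sketch below assumes that 128 KB means 128 × 1,024 bytes and that the limit applies to the raw HTML as served, without counting external images, scripts or stylesheets. Both are my reading of the guidance, and the function names are hypothetical.

```python
# Assumption: "128 KB" is interpreted as 128 * 1024 bytes of HTML.
MAX_HTML_BYTES = 128 * 1024


def html_byte_size(html: str, encoding: str = "utf-8") -> int:
    """Size of the HTML document in bytes once encoded for transfer."""
    return len(html.encode(encoding))


def within_baidu_limit(html: str, encoding: str = "utf-8") -> bool:
    """True when the encoded HTML is no larger than the assumed limit."""
    return html_byte_size(html, encoding) <= MAX_HTML_BYTES


if __name__ == "__main__":
    page = "<html><body><p>Hello, Baidu</p></body></html>"
    print(html_byte_size(page), within_baidu_limit(page))
```

Note that the size is counted in bytes, not characters: a Chinese character takes three bytes in UTF-8, so Chinese-language pages reach the limit sooner than their character count suggests.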

In addition to page size, URL length plays a critical role in getting pages indexed. At Merkle, we’ve observed that clean and short URLs are getting indexed more quickly and are ranking higher. The recommended URL length is 76 characters, excluding the protocol. Hence, when adopting a URL convention, you need to avoid using Chinese characters in your URLs, as percent-encoding makes those URLs much longer than they look in Chinese characters.
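To see how much encoding inflates a URL, you can compare the encoded lengths directly. The Python sketch below assumes UTF-8 percent-encoding and treats 76 characters as an upper bound. The helper names and the example.com URLs are illustrative only.

```python
from urllib.parse import quote, urlsplit

# Assumption: the 76-character recommendation is treated as a maximum.
RECOMMENDED_MAX = 76


def encoded_length_without_protocol(url: str) -> int:
    """Length of the URL after percent-encoding, excluding 'scheme://'."""
    scheme = urlsplit(url).scheme
    rest = url[len(scheme) + 3:] if scheme else url
    # Keep URL delimiters and existing escapes; encode everything non-ASCII.
    return len(quote(rest, safe="/:?=&%#"))


def fits_recommendation(url: str) -> bool:
    return encoded_length_without_protocol(url) <= RECOMMENDED_MAX


if __name__ == "__main__":
    for url in (
        "http://www.example.com/产品/手机",
        "http://www.example.com/chanpin/shouji",
    ):
        print(url, encoded_length_without_protocol(url))
```

Each Chinese character in the path becomes nine characters once encoded (three bytes, each written as a percent sign plus two hex digits), which is why a Pinyin or English slug is usually the shorter option.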

404 pages/deleting pages

In May, Baidu posted an article on how to manage 404 pages (Chinese language). Handling 404 pages is different (and more complicated) in Baidu than in Google or Bing. Here is the suggested course of action:

  1. If you have website pages that no longer exist or that you need to delete, the first thing you need to do is to confirm that those pages are indexed by Baidu. You can search for the URL on Baidu or check your web analytics tools.
  2. The next step is to set the status code to 404 for those URLs. Of course, those URLs should not be disallowed in your robots.txt.
  3. Now, compile these pages into an XML or TXT file and make sure every single URL in this file is set to 404.
  4. Submit it to Baidu Webmaster Tools. The de-indexation will take effect in two to three days. Once the pages are no longer in the index, delete the XML or TXT you submitted.

Submit 404 files in Baidu Webmaster Tools.
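Steps 2 and 3 are easy to get wrong by hand, so it may help to script them. The Python sketch below checks each URL's status and writes only the confirmed 404s to a TXT body, one URL per line. That one-per-line layout, the use of HEAD requests and all function names are my assumptions. Confirm the accepted file format in Baidu Webmaster Tools before submitting.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def fetch_status(url: str) -> int:
    """Return the HTTP status for url, or 0 if the request fails outright."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=10) as response:
            return response.status
    except HTTPError as error:  # 4xx and 5xx responses arrive as exceptions
        return error.code
    except OSError:  # DNS failure, timeout, refused connection
        return 0


def build_dead_link_file(
    urls: Iterable[str],
    get_status: Callable[[str], int] = fetch_status,
) -> Tuple[str, List[str]]:
    """Split URLs into a TXT body of confirmed 404s and a list of the rest."""
    confirmed: List[str] = []
    rejected: List[str] = []
    for url in urls:
        (confirmed if get_status(url) == 404 else rejected).append(url)
    body = "\n".join(confirmed) + ("\n" if confirmed else "")
    return body, rejected
```

Any URL that ends up in the rejected list still returns something other than 404 (or redirects elsewhere) and should be fixed before the file is submitted. Passing get_status as a parameter lets you test the logic with a stub instead of live requests.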

Alternatively, if you want to delete a folder or a set of URLs beginning with a string, you can submit the pattern to Baidu Webmaster Tools. This pattern must end with a slash (/) or a question mark (?) — e.g., http://www.example.com/404page? or http://www.example.com/404folder/.
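Before submitting a pattern, a small check like the Python sketch below can catch a missing trailing character. Only the rule about the final slash or question mark comes from Baidu's guidance. The requirement that the pattern starts with http:// or https:// is my own assumption, and the function name is hypothetical.

```python
def is_valid_removal_pattern(pattern: str) -> bool:
    """Check that a URL pattern ends with '/' or '?' as required."""
    # Assumption: patterns are absolute URLs, as in the examples above.
    has_protocol = pattern.startswith(("http://", "https://"))
    return has_protocol and pattern.endswith(("/", "?"))


if __name__ == "__main__":
    for candidate in (
        "http://www.example.com/404page?",
        "http://www.example.com/404folder/",
        "http://www.example.com/404folder",
    ):
        print(candidate, is_valid_removal_pattern(candidate))
```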

Avoid cheap domains

If you are running your business on a top-level domain (TLD) such as .top or .win, you need to be aware that your site may look spammy to Baidu.

Other spammy TLDs include, but are not limited to, .bid, .pw, .party and .science. Those domains are cheap. Therefore, they look fishy to Baidu.

TLD        Annual Fee (1st-Time Buy)    TLD        Annual Fee (1st-Time Buy)
.top       ¥8                           .tech      ¥12
.win       ¥8                           .site      ¥12
.bid       ¥8                           .market    ¥16
.pw        ¥8                           .pub       ¥16
.online    ¥9                           .video     ¥16
.website   ¥9                           .wang      ¥16
.club      ¥9                           .party     ¥18
.space     ¥9                           .trade     ¥18

Domains under $3 per year

According to Baidu (Chinese language), these cheap TLDs are low priority for indexation. If you insist on using such a domain, you must verify it with Baidu Webmaster Tools so that it can be regarded as a legitimate site.

Baidu cache

For the first time, Baidu explained how cached pages (known as “Baidu snapshots”) work (Chinese language). Cached pages are generated when Baidu crawls the page and adds it to the index (or updates the indexed version). How fresh your cached page is will depend on your site’s crawl frequency, which can vary from several minutes up to a month (depending on the site).

If you’ve blocked Baidu’s spider from your .js and .css resources, or if you use relative URLs in your HTML, the snapshot will look odd and unformatted. If you want to have the snapshot deleted, you can report an inappropriately cached page.

Report inappropriate cache for deletion.

Launching a new site

The last tip I’m sharing is how to give Baidu a stunning first impression when launching a new website.

You may only have a handful of pages at launch, or perhaps you have lots of pages that are low in quality (short/empty or with duplicate content). Unfortunately, Baidu treats this as a disaster. Having a robust, high-quality website at launch shows Baidu that you know how to organize your content and provide reliable information. If you fail to make a good “first impression,” Baidu allocates fewer resources to crawling your site in the future, and it is consequently difficult to win back its trust.

To solve this problem, Baidu suggests (Chinese language) disallowing the website during the UAT (User Acceptance Test) or Invite-only period.
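One common way to do this is through robots.txt, although Baidu's guidance as summarized here does not spell out the mechanism, so treat that as my assumption. The Python sketch below uses the standard library's robots.txt parser to confirm that a pre-launch file really blocks Baiduspider and that the launch file opens the site up again. The two robots.txt bodies, the staging hostname and the helper name are illustrative.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical pre-launch file: keep Baidu's crawler out of the whole site.
STAGING_ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /
"""

# Hypothetical launch file: an empty Disallow permits everything.
LAUNCH_ROBOTS_TXT = """\
User-agent: *
Disallow:
"""


def baidu_can_fetch(robots_txt: str, url: str, agent: str = "Baiduspider") -> bool:
    """Report whether the given robots.txt body lets the agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    home = "http://staging.example.com/"
    print(baidu_can_fetch(STAGING_ROBOTS_TXT, home))
    print(baidu_can_fetch(LAUNCH_ROBOTS_TXT, home))
```

Remember to swap the file at launch. A leftover staging robots.txt is an easy way to keep a finished site out of the index.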