How Microsoft Removes “Junk” From Bing Search Results

Dr. Richard Qian from Bing’s core search team wrote a post on the Bing Search blog titled “Bing Search Quality Insights: Reducing Junk.” It is part of Bing’s ongoing effort to share search quality insights into how Bing works.

In the post, Bing explains how they remove bad links from the Bing search results, and also how they handle junky or empty snippets.

Junk links include:

  • Dead Links
  • Soft 404
  • Parked Domains

Junk or Empty Snippets include:

  • Junky Snippets
  • Empty Snippets

Dead links are pages for which an HTTP request returns a 4xx or 5xx error code. Sometimes a dead link stays in Bing because Bing has not crawled the page since it last returned a proper result. But Bing’s crawler does crawl often and is able to detect dead links fairly quickly. When Bing does detect a dead link, depending on their algorithms they may “boost its re-crawl priority and frequency” to see whether the error was temporary and whether the page should return to the search results.
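To make the dead link logic concrete, here is a minimal Python sketch. It is not Bing’s code: the CrawlRecord type, the priority field, and the doubling boost are assumptions chosen only to illustrate treating 4xx and 5xx responses as dead and re-crawling them sooner.

```python
from dataclasses import dataclass


@dataclass
class CrawlRecord:
    """Hypothetical per-URL crawl state; not a real Bing structure."""
    url: str
    status: int             # HTTP status from the most recent fetch
    priority: float = 1.0   # assumed re-crawl priority; higher means sooner


def is_dead(status: int) -> bool:
    """Treat any 4xx or 5xx response as a dead link."""
    return 400 <= status <= 599


def update_after_fetch(record: CrawlRecord, new_status: int,
                       boost: float = 2.0) -> CrawlRecord:
    """Record the latest status and adjust the re-crawl priority."""
    record.status = new_status
    if is_dead(new_status):
        # Re-check sooner so a temporary error does not keep a good page out.
        record.priority *= boost
    else:
        # The page answered normally again; return to the default priority.
        record.priority = 1.0
    return record
```

A record that moves from 200 to 503 has its priority doubled, and it drops back to the default once the page responds normally again.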

A soft 404 is like a hard 404, but the server does not return a 404 status in the header. Bing said their “high precision classifiers in this area use page content such as key phrases in the page’s title, body and URL to determine if the page is a soft 404 and whether to remove it from the search results.”
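Bing does not publish its classifiers, so the following Python sketch only illustrates the general idea of scoring key phrases in the title, body and URL. The phrase lists, weights, and threshold are all made up for the example, and a simple rule like this is far cruder than a trained classifier.

```python
# Hypothetical key phrases; a real classifier would learn these from data.
TITLE_PHRASES = ("page not found", "not found", "404")
BODY_PHRASES = ("page not found", "no longer available", "could not be found")
URL_HINTS = ("404", "notfound", "not-found", "error")


def contains_any(text: str, phrases) -> bool:
    """Case-insensitive check for any phrase in the text."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in phrases)


def looks_like_soft_404(status: int, url: str, title: str, body: str,
                        threshold: int = 5) -> bool:
    """Flag pages that answer with a success status but read like an error page."""
    if not 200 <= status < 300:
        return False  # a real error status is a hard error, not a soft 404
    score = 0
    if contains_any(title, TITLE_PHRASES):
        score += 5  # assumed weight: the title is the strongest signal
    if contains_any(body, BODY_PHRASES):
        score += 3
    if contains_any(url, URL_HINTS):
        score += 2
    return score >= threshold
```

With these assumed weights, a matching title is enough on its own, while body and URL signals only count when they appear together.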

Bing doesn’t want parked domains to show up in the search results so they use signatures to identify parked domains and remove them.
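The post does not say what Bing’s signatures look like. As a rough sketch, assume a signature is a text pattern typical of parking pages; the patterns and function names below are illustrative, not Bing’s.

```python
import re

# Hypothetical signatures for parked-domain pages.
PARKED_SIGNATURES = [
    re.compile(r"this domain (name )?is for sale", re.IGNORECASE),
    re.compile(r"domain (is|has been) parked", re.IGNORECASE),
    re.compile(r"buy this domain", re.IGNORECASE),
]


def is_parked(page_text: str) -> bool:
    """Return True if the page text matches any parked-domain signature."""
    return any(sig.search(page_text) for sig in PARKED_SIGNATURES)


def filter_results(results):
    """Drop parked domains from a list of (url, page_text) pairs."""
    return [url for url, text in results if not is_parked(text)]
```

Calling filter_results on a list of (url, page_text) pairs returns only the URLs whose text matches none of the signatures.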

Bing also uses various techniques to improve their encoding classifier, document converter, garbage detector, and HTML parser, which have reduced the occurrence of junky snippets.

For snippets that are empty, Bing uses dynamic crawlers and document processors, plus a number of classifiers to determine the appropriate snippet for the search result.

For more details, see the Bing blog.

About the author

Barry Schwartz
Staff
Barry Schwartz is a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics. Barry can be followed on Twitter.