A Lesson From the Indexing of Google Translate: Blocking Search Results From Search Results

Vanessa Fox on January 26, 2011 at 5:37 pm

Last year, Google published an SEO Report Card of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google’s Matt Cutts presented the results at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers who don’t work in search don’t get preferential treatment from those who do and just like any site on the internet, sometimes things aren’t implemented correctly. Just because a site is owned by Google doesn’t mean it’s the best example of what to do in terms of SEO.

This morning Rishi Lakhani tweeted about Google Translate pages appearing in Google search results. As you can see in the example below, pages with individual translation requests have been indexed.

Google Translate Search Results

All of the URLs that include a parameter seem to be individual translations. For instance, http://translate.google.com/?q=ART# displays as follows:

Google Translate Example

The problem with these types of pages being indexed in search engines is twofold:

  • The Google webmaster guidelines say that Google doesn’t want to show search results in its search results and recommend that content owners block search results on their site from being indexed using robots.txt or a meta robots tag.
  • That same guideline recommends blocking autogenerated pages from being indexed, and a Google Webmaster Central blog post from a few months ago provided recommendations for handling machine-translated text so that it doesn’t appear in search results.

A site owner might also want to block these types of pages from being crawled and indexed to increase crawl efficiency and ensure the most valuable pages on the site are being crawled and indexed instead.

I asked Google about this and they confirmed that indeed it was simply a matter of the Google Translate team not being aware of the issue and said they would resolve it.

Blocking Autogenerated Search Pages From Being Indexed

In the case of Google Translate, the ideal scenario is that the main page and any secondary pages (such as this tools page) be indexed, but that any pages from translation requests not be indexed.

Using robots.txt

The best way to do this would be to add a disallow line in the robots.txt file for the site that blocks indexing based on a pattern match of the URL query parameter. For instance:

Disallow: /*q=

This pattern would prevent search engines from indexing any URLs containing q=. (The * before the q= means that the q= can appear anywhere in the URL.)
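
For illustration, a dedicated robots.txt for the translate subdomain might look like the sketch below. The User-agent: * line is an assumption added here so the rule applies to all crawlers; because no other paths are disallowed, the main page and secondary pages such as the tools page would remain crawlable.

User-agent: *
Disallow: /*q=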

In the case of translate.google.com (and all related TLDs), the robots.txt file that exists for the subdomains seems to be copied from www.google.com. Remember that search engines obey the robots.txt file for each subdomain separately. Using the same robots.txt file for a subdomain that’s used for the www variation of the domain could have unintended consequences because the subdomain likely has an entirely different folder and URL structure. (You can always check the behavior of your robots.txt file using Google Webmaster Tools.)

Adding the disallow pattern shown above to the www.google.com/robots.txt file would not work, as search engines wouldn’t check that file when crawling the translate subdomain; it would instead cause search engines not to index URLs on www.google.com that match the pattern.

translate.google.com (and all google.com subdomains) should have their own robots.txt file that’s customized for that subdomain.
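
To make the subdomain point concrete, the short Python sketch below approximates how a crawler might apply a wildcard Disallow rule to URL paths. It is only an illustration (not Google’s actual matcher), using the rule and example URLs from above; the rules it tests are the ones that would live in translate.google.com/robots.txt, not in www.google.com/robots.txt.

import re

def is_blocked(url_path, disallow_patterns):
    """Rough approximation of wildcard Disallow matching: '*' matches any
    characters and each rule is anchored at the start of the URL path."""
    for pattern in disallow_patterns:
        regex = "^" + re.escape(pattern).replace(r"\*", ".*")
        if re.match(regex, url_path):
            return True
    return False

# Rules that would belong in the translate subdomain's own robots.txt file.
translate_rules = ["/*q="]

print(is_blocked("/?q=ART", translate_rules))  # True: an individual translation request
print(is_blocked("/", translate_rules))        # False: the main page stays crawlable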

Using the meta robots tag

If Google isn’t able to create a separate robots.txt file for the translate subdomain, they should first remove the file that’s there (and from other subdomains as well, as it could be causing unexpected indexing results for those subdomains). Then, they should use the meta robots tag on the individual pages they want blocked. Since the pages in question are dynamically generated, the way to do this would be to add logic to the code that generates these pages so that it writes the robots meta tag as each page is created. This tag belongs in the <head> section of the page and looks as follows:

<meta name="robots" content="noindex">
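
As a purely illustrative sketch of that logic (Python, with a hypothetical build_head function; this is not Google’s implementation), the page-generation code could write the tag only when the URL carries a q= translation parameter:

# Hypothetical sketch: write the robots meta tag only for translation-request pages.
def build_head(query_params):
    """Return <head> markup, adding noindex when the page is an
    auto-generated translation result (the URL contains a q= parameter)."""
    lines = ["<title>Google Translate</title>"]
    if query_params.get("q"):  # an individual translation was requested
        lines.append('<meta name="robots" content="noindex">')
    return "<head>" + "".join(lines) + "</head>"

print(build_head({}))            # main page: no noindex tag
print(build_head({"q": "ART"}))  # translation result: noindex tag is written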

Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land.



About The Author

Vanessa Fox
Vanessa Fox is a Contributing Editor at Search Engine Land. She built Google Webmaster Central and went on to found software and consulting company Nine By Blue and create Blueprint Search Analytics, which she later sold. Her book, Marketing in the Age of Google (updated edition, May 2012), provides a foundation for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.

