A Lesson From the Indexing of Google Translate: Blocking Search Results From Search Results

Last year, Google published an SEO Report Card of 100 Google properties. In it, they rated themselves on how well the sites were optimized for search. Google’s Matt Cutts presented the results at SMX West 2010 in Ignite format. He noted that not every Googler is an expert in search and search engine optimization. Googlers who don’t work in search don’t get preferential treatment from those who do and just like any site on the internet, sometimes things aren’t implemented correctly. Just because a site is owned by Google doesn’t mean it’s the best example of what to do in terms of SEO.

This morning Rishi Lakhani tweeted about Google Translate pages appearing in Google search results. As you can see in the example below, pages with individual translation requests have been indexed.

Google Translate Search Results

All of the URLs that include a parameter seem to be individual translations. For instance, http://translate.google.com/?q=ART# displays as follows:

Google Translate Example

The problems with these types of pages being indexed in search engines is twofold:

A site owner might also want to block these types of pages from being crawled and indexed to increase crawl efficiency and ensure the most valuable pages on the site are being crawled and indexed instead.

I asked Google about this and they confirmed that indeed it was simply a matter of the Google Translate team not being aware of the issue and said they would resolve it.

Blocking Autogenerated Search Pages From Being Indexed

In the case of Google Translate, the ideal scenario is that the main page and any secondary pages (such as this tools page) be indexed, but that any pages from translation requests not be indexed.

Using robots.txt

The best way to do this would be to add a disallow line in the robots.txt file for the site that blocks indexing based on a pattern match of the URL query parameter. For instance:

Disallow: /*q=

This pattern would prevent search engines from indexing any URLs containing q=. (The * before the q= means that the q= can appear anywhere in the URL.)

In the case of translate.google.com (and all related TLDs), the robots.txt file that exists for the subdomains seems to be copied from www.google.com. Remember that search engines obey the robots.txt file for each subomain separately. Using the same robots.txt file for a subdomain that’s used for the www variation of the domain could have unintended consequences because the subomain likely has an entirely different folder and URL structure. (You can always check the behavior of your robots.txt file using Google Webmaster Tools.)

Adding the disallow pattern shown above to the www.google.com/robots.txt file would not work as search engines wouldn’t check that file when crawling the translate subdomain and in would instead cause search engines not to index URLs that match the pattern on www.google.com.

translate.google.com (and all google.com subdomains should have their own robots.txt file that’s customized for that subdomain.

Using the meta robots tag

If Google isn’t able to create a separate robots.txt file for the translate subdomain, they should first remove the file that’s there (and from other subdomains as well, as it could be causing unexpected indexing results for those subdomains). Then, they should use the meta robots tag on the individual pages they want blocked. Since the pages in question are dynamically generated, the way to do this would be to add logic to the code that generates these pages that writes the robots meta tag to the page as its created. This tag belongs in the <head> section of the page and looks as follows:

<meta="robots" content="noindex">

Related Topics: Channel: SEO | Features: Analysis | Google: SEO | How To: SEO | SEO: Blocking Spiders


About The Author: is a Contributing Editor at Search Engine Land. She founded ninebyblue.com and Blueprint Search Analytics. Her book, Marketing in the Age of Google, (updated edition, May 2012) provides a blueprint for incorporating search strategy into organizations of all levels. Follow her on Twitter at @vanessafox.

Connect with the author via: Email | Google+


SMX - Search Marketing Expo

SearchCap:

Get all the top search stories emailed daily!  

Like This Story? Please Share!

Other ways to share:

Like Our Site? Follow Us!

Subscribe to Our Feed! Join our LinkedIn Group Check out our Tumblr! See us on Pinterest Get Search Engine Land on your mobile device!
 

Read before commenting! We welcome constructive comments and allow any that meet our common sense criteria. This means being respectful and polite to others. It means providing helpful information that contributes to a story or discussion. It means leaving links only that substantially add further to a discussion. Comments using foul language, being disrespectful to others or otherwise violating what we believe are common sense standards of discussion will be deleted. Comments may also be removed if they are posted from anonymous accounts. You can read more about our comments policy here.

Comments are closed.

Get Our News, Everywhere!

 
  • Advertise With Us
 

Click to watch SMX conference video

Join us at an upcoming SMX event:

North America

EMEA

APAC

Search Engine Land produces SMX, the Search Marketing Expo conference series. SMX events deliver the most comprehensive educational and networking experiences - whether you're just starting in search marketing or you're a seasoned expert.

SMX Site » | SMX Difference » | SMX News »




 

Search Engine Land Periodic Table of SEO Success Factors

Get Your Copy
Read The Full SEO Guide