Twitter Opens Up To More Crawling, But Do Search Engines Want Its Search Results In Theirs?

Twitter recently updated its robots.txt file and, though the change opens up millions of pages to being crawled, there’s no guarantee that the main search engines want what Twitter is offering.

The Sociable seems to have been the first to notice Twitter’s robots.txt changes, which now specifically allow Google, Bing, Yahoo, Yandex and other bots to crawl through some of Twitter’s search results pages.

Twitter: Change Made To Help With Discovery

Twitter confirmed the change to us, saying:

This change will help people find popular and helpful Twitter pages, such as the #olympics hashtag page (https://twitter.com/search/?q=%23olympics)

Twitter is opening up its content search results (i.e., tweets and hashtags) while still preventing bots from crawling its search results for users, photos and videos. Tweet and hashtag search results appear under the /search path; searches for Twitter users appear under /search/users; and searches for photos and videos appear under /search/(keyword)/grid.

As you can see below, Twitter is allowing Google to crawl its tweet/hashtag search results, but still blocking the other two types; Twitter’s instructions to Bing and other search engines are the same.

[Screenshot: Twitter’s robots.txt directives for Google’s crawler]
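How Allow and Disallow lines interact can be sketched in a few lines of Python. The rule set below is hypothetical: it mirrors the three search paths described above and is not a copy of Twitter’s actual robots.txt. The matcher assumes Google-style precedence, which RFC 9309 later standardized: the longest matching rule wins, Allow wins a tie, "*" matches any run of characters, a trailing "$" anchors the end of the URL, and a URL that matches no rule is crawlable. The names RULES and is_allowed are illustrative only.

```python
import re

# Hypothetical rules modeled on the three Twitter search paths described above;
# this is NOT Twitter's actual robots.txt.
RULES = [
    ("allow", "/search"),            # tweet and hashtag search results
    ("disallow", "/search/users"),   # user search results
    ("disallow", "/search/*/grid"),  # photo and video search results
]


def _to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a regex matched from the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is literal.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Longest matching rule wins; on a tie, Allow wins; no match means allowed."""
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty rule path matches nothing, so ignore it
        if _to_regex(pattern).match(path):
            if len(pattern) > best_len or (
                len(pattern) == best_len and directive == "allow"
            ):
                best_len = len(pattern)
                allowed = directive == "allow"
    return allowed


if __name__ == "__main__":
    for p in ["/search/?q=%23olympics", "/search/users?q=nbc", "/search/olympics/grid"]:
        print(p, "->", "crawlable" if is_allowed(RULES, p) else "blocked")
```

Run as a script, it reports the #olympics URL from Twitter’s statement as crawlable and the user and grid searches as blocked. Python’s standard urllib.robotparser isn’t used here because it applies the first matching rule in file order and treats rule paths as plain prefixes, so it would not resolve the grid wildcard the same way.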

Twitter’s tweet/hashtag search results sometimes contain or lead to interesting content: think, for example, of a search for #emmys during last weekend’s Emmy Awards, or a search for #olympics during the London Summer Games. The latter search led to Twitter’s Olympics hashtag page, part of a formal partnership with NBC that featured a mix of human- and algorithmically curated content about the Olympic Games.

Do Search Engines Want Twitter Search Results?

Clearly, there’s value to Twitter in having an Olympics hashtag page and other search results pages showing up in Google’s (or Bing’s) index — especially during big events. From an SEO perspective, Twitter has enough domain authority that it should be able to rank well for certain queries.

But do the search engines want to index Twitter’s search results?

Google has been adamant over the years that it doesn’t want to show search results pages in its own search results. There’s a specific webmaster guideline that tells webmasters not to do what Twitter is doing:

Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.

At the moment, I’m seeing plenty of Twitter search result pages in Google’s results for a site:twitter.com/search query.

[Screenshot: Twitter search result pages in Google’s results for a site:twitter.com/search query]

In its article, The Sociable notes that Twitter was blocking those search result URLs from being crawled as recently as September 11th. Twitter and Google have been sparring over crawling and indexing for more than a year, going back to July 2011, when an agreement that had given Google a special feed of Twitter content ended; the end of that deal also killed Google’s Realtime Search product.

I wasn’t able to find a comparable guideline in Bing’s webmaster documentation, but we’ll try to find out Bing’s approach to indexing another site’s search results.



About The Author: Matt McGee is Editor-In-Chief of Search Engine Land. His news career includes time spent in TV, radio, and print journalism. His web career continues to include a small number of SEO and social media consulting clients, as well as regular speaking engagements at marketing events around the U.S. He recently launched a site dedicated to Google Glass called Glass Almanac and also blogs at Small Business Search Marketing. Matt can be found on Twitter at @MattMcGee and/or on Google Plus. You can read Matt's disclosures on his personal blog.




  • http://blog.clayburngriffin.com/ Clayburn Griffin

    They need to implement canonical tags or something. I don’t like having the Spanish version of my profile indexed above the English one.

  • http://twitter.com/Greekgeek Greekgeek

    Aha! Okay, I’m not crazy. Last Friday, I noticed a Tweet in Google SERPs and thought, “That’s funny, wasn’t Twitter supposed to be blocking Google these days?”

    It was a Tweet with #spottheshuttle in it, which was trending. So perhaps Google is indexing trending hashtags to help it with the fresh content side of things.

  • http://www.winsonyeung.com/ Winson Yeung

    Awesome! They should have opened up to the index earlier.

  • Matt McGee

    That could be, but Google has always been able to index individual tweets. What this change is about is that Twitter is now letting it index search result pages, too.

  • RizzoMB

    Search engines have no choice; Twitter is the best source for live news in the world, period.

  • http://twitter.com/Greekgeek Greekgeek

    Apparently I failed my pre-breakfast reading comprehension test; thanks for the clarification.

  • http://www.daddysbroke.com/ Craig

    It’s about time! My Twitter profile jumped to #2 on Google for my name.

  • Miguel Silva Rodrigues

    Regarding Google’s guidelines, the key factor is value. Hence the question is: do these pages add value? Most likely they do carry content that isn’t indexed, but it will also become outdated very quickly. It’s probably a win for search engines and Internet users, but not as big as the previous agreement between Twitter and Google could have been.

  • http://www.eyewebmaster.com Rosendo A. Cuyasen

    This is really a good innovation for Twitter. As we all know, if we search for a brand in the search engines, only Google, Facebook and LinkedIn give relevant results.

  • http://www.facebook.com/ellis.lp.1 Ellis LP

    It’s one of the best things Twitter can do. The events business needed a real-time feed of the opinions of the public/nation, because they are the ones sharing their experiences, and opening its robots.txt file to search pages makes them easier to find. I think Google would be plain stubborn if it were to stop it.

  • http://twitter.com/sharithurow sharithurow

    I have to agree with Google on this one. Search results in search results? Extremely annoying. Our search usability tests keep confirming this year after year. Your context is not likely your target users’ context.

    My 2 cents.

  • http://www.jesshill.me/ Jess Hill

    Just one technical clarification: there is no “Allow” field in robots.txt, as stated here: http://www.robotstxt.org/robotstxt.html (scroll to the very bottom). Even though Twitter has included that in its robots.txt file, the real command is the Disallow on users and grid.

  • Eric Wu
