Twitter recently updated its robots.txt file and, though the change opens millions of pages to crawling, there’s no guarantee that the major search engines want what Twitter is offering.
The Sociable seems to have been the first to notice Twitter’s robots.txt changes, which now specifically allow Google, Bing, Yahoo, Yandex and other bots to crawl through some of Twitter’s search results pages.
Twitter: Change Made To Help With Discovery
Twitter confirmed the change to us, saying:
This change will help people find popular and helpful Twitter pages, such as the #olympics hashtag page (https://twitter.com/search/?q=%23olympics)
Twitter is opening up its content search results — i.e., tweets and hashtags — while still preventing bots from crawling its search results for users, videos and images. Tweet and hashtag search results show via the /search path; searches for Twitter users show via the /search/users path; and searches for photos and videos show via /search/(keyword)/grid.
Twitter’s updated robots.txt allows Google to crawl its tweet/hashtag search results but still blocks the other two types; its instructions to Bing and other search engines are the same.
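The allow/block split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule list is hypothetical, modeled on the three paths named in this article rather than copied from Twitter's actual robots.txt, and the helper names (RULES, rule_matches, is_allowed) are invented for the example. It also assumes the crawler resolves conflicts by longest match, with "*" treated as a wildcard, which is how the major crawlers document their behavior.

```python
import re

# Hypothetical rules for one crawler, modeled on the article's description.
# They are NOT copied from Twitter's real robots.txt.
RULES = [
    ("allow", "/search"),            # tweet and hashtag search results
    ("disallow", "/search/users"),   # user search results
    ("disallow", "/search/*/grid"),  # photo and video search results
]


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches the start of path."""
    # Escape the pattern, then restore "*" as a match-anything wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path) is not None


def is_allowed(path: str) -> bool:
    """Apply the longest matching rule; paths matching no rule are allowed."""
    best_length = -1
    allowed = True
    for verdict, pattern in RULES:
        if not rule_matches(pattern, path):
            continue
        # A longer pattern is more specific; on a tie, prefer "allow".
        longer = len(pattern) > best_length
        tie_for_allow = len(pattern) == best_length and verdict == "allow"
        if longer or tie_for_allow:
            best_length = len(pattern)
            allowed = verdict == "allow"
    return allowed


for path in ("/search/?q=%23olympics", "/search/users?q=nasa", "/search/olympics/grid"):
    print(path, "->", "crawlable" if is_allowed(path) else "blocked")
```

Under these assumed rules, the hashtag search stays crawlable while the user and grid searches are blocked, because the more specific Disallow patterns outrank the shorter Allow on /search.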
Twitter’s tweet/hashtag search results sometimes contain or lead to interesting content: think, for example, of a search for #emmys during last weekend’s Emmy Awards or a search for #olympics during the London Summer Games. The latter search led to Twitter’s Olympics hashtag page, the product of a formal partnership with NBC that mixed human- and algorithmically curated content about the Olympic Games.
Do Search Engines Want Twitter Search Results?
Clearly, there’s value to Twitter in having an Olympics hashtag page and other search results pages showing up in Google’s (or Bing’s) index — especially during big events. From an SEO perspective, Twitter has enough domain authority that it should be able to rank well for certain queries.
But do the search engines want to index Twitter’s search results?
Google has been adamant over the years that it doesn’t want to show search results pages in its own search results. There’s a specific webmaster guideline that tells webmasters not to do what Twitter is doing:
Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.
At the moment, I’m seeing plenty of Twitter search result pages showing up on a site:twitter.com/search query.
In its article, The Sociable notes that Twitter was blocking those search result URLs from being crawled as recently as September 11th. Twitter and Google have been sparring over crawling and indexing for more than a year, going back to July 2011, when the agreement that had given Google a special feed of Twitter content ended and killed Google’s Realtime Search product in the process.
I wasn’t able to find a comparable guideline in Bing’s webmaster documentation, but we’ll try to find out its approach to indexing another site’s search results.