Twitter is working on a project that will give users access to an archive of all of their own tweets.
Twitter CEO Dick Costolo told the New York Times this week that the company is building a tool that would let Twitter users download their own tweets. He didn’t say when the tool would be ready.
Twitter search is notoriously shallow. Although the company has recently upgraded its search capabilities, the site’s search box only shows results from the past seven days.
With users tweeting more than 400 million times per day, a full-history, all-Twitter search/download tool may not be feasible from an engineering standpoint. Costolo essentially said as much to the Times:
“It’s two different search problems. It’s a different way of architecting search, going through all tweets of all time. You can’t just put three engineers on it.”
So, don’t hold your breath waiting for a way to search or download a full history of tweets from anyone except yourself.
The Sad State of Searching Old Tweets
A tool that gives us access to all of our own tweets would be a step in the right direction, but it highlights how poor the state of historical Twitter search is today.
Google & Bing (Yandex, Too)
About six months later, Google announced a Twitter archive search tool that started with about two months of tweets, but Google said it planned to offer tweets all the way back to Twitter’s beginnings in March 2006. This eventually became part of Google Realtime Search, which went offline last summer when Twitter and Google failed to renew their agreement.
Bing continues to have access to Twitter’s “firehose” of tweets, but Bing Social Search appears to only surface tweets that are less than a week old.
The Russian search engine Yandex recently secured access to the Twitter firehose, too, and last night I was able to page through about 800 tweets that mentioned my @mattmcgee handle going back to 2009. So, it’s far from complete, but it does seem to be the best option from a traditional search engine.
Topsy is the last of the so-called real-time search engines that’s still standing, and it offers some pretty powerful options on its advanced search page — including the ability to search tweets from specific users and within certain dates. Topsy’s index goes back to the spring of 2008.
But like the other options, it’s far from comprehensive. On a search for my own tweets from July 4th and 5th — just three weeks ago — Topsy’s search results only showed four tweets out of the 20 that I posted over those two days.
Library of Congress
Little has been said since then about the tweet archiving project. But just last week, a Library spokesperson told the Nieman Journalism Lab that it’s still alive and “getting a lot closer.”
Alas, the Library of Congress has no plans to make that archive available online; it’ll be accessible only at the Library itself.