Topsy: Now Searching Tweets Back To May 2008

Looking for old tweets? Look to Topsy. The service has just expanded to have what it claims to be the largest searchable collection of past tweets, over 5 billion of them, stretching back to at least May 2008. That makes it more comprehensive than Google’s Twitter search or even Twitter’s own Twitter Search.

Topsy will be sharing the news itself later today, on its blog. Beyond being comprehensive, another nice thing about Topsy is the ability to restrict a search using special “operators” or commands — such as “from” — to find tweets from a particular user or the ability to see tweets within a particular date range. Topsy has an advanced search page that makes it easy, as well as a list of commands.

Google lacks this type of filtering; Twitter has it, but only for tweets from roughly the past week. Of course, Topsy’s searches don’t always work as advertised. More on this, and how Topsy measures up against Google and Twitter, below.

Show Me The First Tweet By…

What was the first tweet from Ashton Kutcher? Heck, what was my first tweet? That’s a good test of comprehensiveness, if you can find the first tweet from well established Twitter accounts.

Using Twitter’s advanced search page, I can search for all tweets by Kutcher (from:aplusk), but the results only take me back 5 days.
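
For readers who would rather build such a search link than fill in the advanced search form, here is a minimal sketch. The base URL follows the search.twitter.com link quoted in the comments on this article; the helper names are hypothetical, and whether the service still honors this URL form is an assumption.

```python
from urllib.parse import urlencode

# Base URL as it appears in a search link quoted in the comments (assumed still valid).
TWITTER_SEARCH_URL = "http://search.twitter.com/search"


def from_user_query(username: str, terms: str = "") -> str:
    """Build a query that restricts results to one account via the "from" operator."""
    handle = username.lstrip("@")  # accept "aplusk" or "@aplusk"
    return f"from:{handle} {terms}".strip()


def twitter_search_url(query: str) -> str:
    """Percent-encode the query and attach it as the q parameter."""
    return f"{TWITTER_SEARCH_URL}?{urlencode({'q': query})}"


print(twitter_search_url(from_user_query("aplusk")))
```

The colon in the operator is percent-encoded as %3A, which is the standard form for a query string.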

How about Google? When Google’s Twitter archive search launched, it touted having tweets stretching back through February 11, 2010. That’s further back than Twitter search goes, but it won’t get me to Kutcher’s first tweet, not by a long shot. (A regular Google search for ashton kutcher first tweet, however, takes me right to his first one on Jan. 15, 2009).

Worse, there’s no “from” command at Google that lets me find tweets just from Kutcher. Instead, at best, I have to search for @aplusk, which brings back tweets from him plus anyone mentioning him. In addition, there may be non-Twitter updates mixed in with Google’s archive search, since other sources such as Facebook or MySpace also feed into it.

At Bing Social Search, the “from” command does work, so I can see all the tweets by Kutcher it has indexed, and just tweets, with nothing else mixed in. However, those only go back six days.

At Topsy, I can get a list of the nearly 4,000 “All Time” tweets posted by Kutcher.

That sounds great, but getting to the oldest tweet is difficult. If you sort those tweets by “timeline,” so that the oldest tweet comes last, you’ll find that you can’t actually “page” your way back to it. Only pages 1 through 10 of search results are shown, currently getting you back to May 2010.

A trick is to search by specific date range. For example, here’s a search for all of January 2009, narrowed to those from Kutcher. The problem is that his first tweet, which happened in this period, doesn’t actually appear. Switching the two pages of results from “relevancy” to “timeline” view makes things worse, listing only links that may or may not have been from Kutcher (it’s hard to tell).

The only way I could find his first tweet, in the end, was to search for the text “dropping my first tweet,” which listed his first tweet in the top results at Topsy. However, it was listed without the time stamp that normally doubles as a link to the actual tweet, making me suspect that Topsy has some database issues.

Behind The Scenes

Despite this, Topsy clearly has a lot of tweets that go back in time. I suspect that when the bugs get worked out, doing a search to find someone’s first tweet, or tweets made within a particular date range, will be really useful.

Topsy knows things need to improve and is working on it. In the meantime, it emphasizes the fact that the date range feature can be used to view “highlights” for a particular period, telling me:

Reverse chronology is not well supported in the current user interface, which focuses on relevance, but we plan to introduce option for this in an upcoming release.

When you choose timeline sort on Topsy, the results are sorted by newest first but filtered by quality — it’s the top 100 results in a given time period, by newest first and a good way to track new, high quality results on any query. Think of it as the highlights for a given time period.
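
To make that description concrete, here is a minimal sketch of a “highlights” sort: keep the best-scoring results in a date range, then show them newest first. The Tweet class, the score field and the sample scores are all invented for illustration; Topsy has not said how it measures quality.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tweet:
    text: str
    posted: datetime
    score: float  # hypothetical quality score; the real signal is not public


def timeline_highlights(tweets, start, end, limit=100):
    """Keep the `limit` best-scoring tweets posted in [start, end), newest first."""
    in_range = [t for t in tweets if start <= t.posted < end]
    best = sorted(in_range, key=lambda t: t.score, reverse=True)[:limit]
    return sorted(best, key=lambda t: t.posted, reverse=True)


# Made-up sample data; only the Jan. 15, 2009 date comes from the article.
tweets = [
    Tweet("dropping my first tweet", datetime(2009, 1, 15), 0.1),
    Tweet("popular link", datetime(2009, 1, 20), 0.9),
    Tweet("another hit", datetime(2009, 1, 25), 0.8),
    Tweet("outside range", datetime(2009, 2, 3), 1.0),
]
top = timeline_highlights(tweets, datetime(2009, 1, 1), datetime(2009, 2, 1), limit=2)
print([t.text for t in top])
```

Under this model, a low-scoring tweet can fall out of the results even though it sits inside the date range. That would be consistent with a first tweet going missing from a date-restricted search, though nothing Topsy told me confirms that this is the cause.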

As for how far back the archives go and how the data was gathered, Topsy told me:

We started collecting tweets in May 2008 by polling search.twitter.com for all tweets with links. Our first index was built this way.

Topsy became the first search engine to start indexing native retweets via Twitter’s retweet streaming API in December 2009. The index contains every native retweet since. We’ve recently signed a contract with Twitter to index the entire firehose [firehose is jargon for the ability to tap Twitter's full stream of tweets].

The firehose does not contain all historical tweets (not for Topsy or Google). We do plan to work with Twitter to complete our index some day. Since the number of tweets per day has grown dramatically, the historical tweets will actually represent a pretty small part of the index.
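
The link-only polling Topsy describes for its first index can be sketched roughly as follows. This is an illustration under stated assumptions, not Topsy’s code: fetch_page is a hypothetical callable standing in for whatever request was made to search.twitter.com, and the dictionary shape of each tweet is invented.

```python
import re

# Simple pattern for spotting a link in tweet text (an assumption, not Topsy's rule).
LINK_PATTERN = re.compile(r"https?://\S+")


def collect_link_tweets(fetch_page, seen_ids, store):
    """Run one polling pass and return how many new link tweets were stored."""
    added = 0
    for tweet in fetch_page():
        if tweet["id"] in seen_ids:
            continue  # already handled in an earlier pass
        seen_ids.add(tweet["id"])
        if LINK_PATTERN.search(tweet["text"]):
            store.append(tweet)
            added += 1
    return added


# Usage with a stand-in fetcher instead of a live request.
sample = [
    {"id": 1, "text": "worth reading http://example.com/story"},
    {"id": 2, "text": "no link here"},
]
seen, index = set(), []
print(collect_link_tweets(lambda: sample, seen, index))
```

Calling the function repeatedly with the same seen_ids set is what makes it a polling loop: each pass stores only tweets it has not met before. An index built this way would hold only tweets with links, which fits what I found in the oldest results.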

By the way, while Topsy says you can go back through at least May 2008, I found some tweets that were older than that. I could even find data stretching back to Dec. 2006 (by doing a date-restricted search for the word “the”). However, the further back you go, the more likely you’re getting only tweets associated with a link, and tweets that might not let you click from the date stamp to the actual tweet.

How They Stack Up

How do the major Twitter archive search services stack up? It’s really only Topsy versus Google, in this department. Twitter itself isn’t currently focused on trying to create a huge, searchable archive of tweets.

Make no mistake. Twitter has all the tweets people have posted over time. They haven’t been lost. But when I spoke in June to Mike Abbott, Twitter’s vice president of engineering who oversees search, he explained to me that Twitter is focusing on building search products that others aren’t building. With Google then, and Topsy now, focusing on comprehensive searching, Twitter is looking in other directions.

“Google doing it [archive search] takes some of the pressure off. Where do we want to innovate in this world and drive unique set of experiences?,” Abbott told me. He said such items would be finding ways to better connect Twitter users together with others of similar interest, or to do a search on Twitter that just shows tweets from your friends and followers.

Indeed, since I spoke with Abbott, Twitter has released new ways to find people to follow when searching or when browsing your Twitter home page. I’ve found the “Suggestions For You” feature to be incredibly useful. Our past articles below have more about these features:

So when I do the stack-up chart below, keep in mind that while I’m listing Twitter, it’s only to provide a benchmark for how far Google and Topsy go beyond standard Twitter Search in terms of search comprehensiveness.

Feature | Twitter | Google | Topsy
Farthest Back You Can Search | 4 to 7 days | Feb. 2010 | May 2008 (at least)
Search By Username | Yes | No | Yes
Date Range Search | Yes | Only by clicking in timelines | Yes (though buggy)
Sort Options | By Date | By Relevancy (Any time) & By Date (Latest) | By Relevancy (Relevance) & By Date (Timeline / All Time)
Show Only Photos? | No | Yes | Yes

Note the last row — the ability to search for tweets containing photos. Topsy makes it especially easy to find images that have been tweeted and says it has over 300 million images indexed. It even has a special page just for photo searching, Topsy Photos. For other services that let you find photos shared via Twitter, see our Google Adds Images To Real-Time Results post. Topsy also says it has indexed 2.5 billion links that have been shared on Twitter.

In the future, I’ll expand the table above to include some other services. In the meantime, here are some past articles that cover Twitter-related searching in various aspects:


About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook and Google+, and microblogs on Twitter as @dannysullivan.

  • http://www.searchoftheday.com yehaskel

    It’s great that you can search that far back on Topsy, but after a few quick searches it doesn’t seem like Topsy is all that comprehensive as a Twitter search client.

    Example: “hiccups”

    Twitter search shows 22 in the last hour: http://search.twitter.com/search?q=hiccups&result_type=recent

    Icerocket’s Twitter search shows the tweets and an average of 28.5 an hour.

    Topsy says 0 in the last hour and 16 in the last day: http://topsy.com/s?type=tweet&q=hiccups

    (these numbers will likely change throughout the day, and I can’t attach screen shots)

    If I search for something more common I get similar results. A search for “lindsay lohan” in quotes yields several per minute on Twitter search, 3.2 posts per minute on Icerocket, but only 10 in the last hour on Topsy.

    Just an observation… I wasn’t that impressed with the coverage.

  • http://topsy.com rishabghosh

    Hi Danny!

    Great analysis, as always. We just published a post on the Topsy blog with some technical details of what’s going on under the hood in our v2 search platform:

    http://labs.topsy.com/2010/08/24/topsy-deploys-v2-platform-to-index-100-billion-status-updates/

    Cheers,
    -Rishab

  • http://www.WebshareDesign.com coreykoberg

    For the most part I still find it easier just to use the standard Google operators. For example, your goal of finding Ashton’s first tweet via Google is fairly easy:
    http://www.google.com/search?q=site:twitter.com/aplusk&hl=en&client=opera&rls=en&sa=X&ei=uVF0TOSTI4P2tgOYspSbCA&ved=0CAwQpwU&source=lnt&tbs=cdr:1,cd_min:1/1/2009,cd_max:2/1/2009

    I could see the value of this if you wanted to search specifically from one user to another, or other very twitter specific operators, but 99% of the time the site:twitter.com/{username} search works quite well.
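
The Google link in the comment above can also be generated. This sketch rebuilds only the parts of that URL that matter, the site: query and the tbs date range, and drops the browser-specific parameters. The function name is hypothetical, and reading cd_min and cd_max as month/day/year is an assumption based on the dates in that link.

```python
from datetime import date
from urllib.parse import urlencode


def google_site_search_url(username: str, start: date, end: date) -> str:
    """Build a Google search limited to one Twitter profile and a date range."""
    query = f"site:twitter.com/{username}"
    # cdr:1 switches on a custom range; dates are assumed to be month/day/year.
    tbs = (
        f"cdr:1,cd_min:{start.month}/{start.day}/{start.year},"
        f"cd_max:{end.month}/{end.day}/{end.year}"
    )
    # Keep ":", "/" and "," unescaped so the result mirrors the link in the comment.
    return "http://www.google.com/search?" + urlencode(
        {"q": query, "tbs": tbs}, safe=":/,"
    )


print(google_site_search_url("aplusk", date(2009, 1, 1), date(2009, 2, 1)))
```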

  • http://almightylink.ksablan.com/ Kevin Sablan

    Thanks again for another useful post, Danny. For what it’s worth, Snap Bird – http://snapbird.org – also lets you search for tweets past the week or so that Twitter Search provides. Unfortunately, it does not have robust advanced search options.
