Library Of Congress Struggling To Make A Searchable Twitter Archive

The Library of Congress is still working on plans to create a searchable archive of nearly every public tweet ever sent, but the challenges inherent in that task are making it a slow process.

That is understandable given how sharply tweet volume has grown in recent years; the LoC is essentially trying to tame a very fast-growing dataset.

If it ever happens, a searchable archive of tweets could prove valuable to researchers, analysts, marketers and others. You can imagine brands wanting to search for Twitter trends surrounding major product/service announcements, or researchers looking for Twitter activity surrounding major world events.

On Friday, Gayle Osterberg, the Library’s Director of Communications, announced that the LoC is now getting about 500 million tweets per day, up from about 140 million when the project began in February 2011. She spelled out some of the challenges that the project poses.

Currently, executing a single search of just the fixed 2006-2010 archive on the Library’s systems could take 24 hours. This is an inadequate situation in which to begin offering access to researchers, as it so severely limits the number of possible searches.

The Library has assessed existing software and hardware solutions that divide and simultaneously search large data sets to reduce search time, so-called “distributed and parallel computing”. To achieve a significant reduction of search time, however, would require an extensive infrastructure of hundreds if not thousands of servers. This is cost-prohibitive and impractical for a public institution.
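To make the "divide and simultaneously search" idea concrete, here is a minimal Python sketch. It is only an illustration and not the Library's system: the function names (`make_shards`, `search_shard`, `parallel_search`), the in-memory list of tweets and the use of threads are all assumptions chosen for brevity. In a real deployment each shard would sit on its own server, which is where the hardware cost described above comes from.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(tweets, shard_count):
    """Split the tweet list into shard_count roughly equal slices."""
    return [tweets[i::shard_count] for i in range(shard_count)]


def search_shard(shard, term):
    """Scan one shard and return the tweets that contain the term."""
    needle = term.lower()
    return [tweet for tweet in shard if needle in tweet.lower()]


def parallel_search(tweets, term, shard_count=4):
    """Search all shards concurrently, then merge the partial results.

    Threads stand in for separate servers here (an assumption made to
    keep the sketch self-contained).
    """
    shards = make_shards(tweets, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        partial_results = list(
            pool.map(search_shard, shards, [term] * shard_count)
        )
    # Merge step: flatten the per-shard matches into one sorted list.
    return sorted(match for partial in partial_results for match in partial)
```

Calling `parallel_search(["Library of Congress archives tweets", "Lunch was great", "New LIBRARY hours posted"], "library", shard_count=2)` returns the two tweets that mention the term. The time per search falls roughly in proportion to the number of shards scanned at once, which is why a large speedup over a 24-hour search implies hundreds or thousands of machines.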

In a Washington Post article, Deputy Librarian of Congress Robert Dizard Jr. says the collection will eventually be made available only within the Library itself. That restriction is part of the agreement with Twitter, and it is meant to keep the archive from competing with commercial services that offer Twitter archives.

But, as Gary Price said on INFOdocket, it doesn’t sound like any of that will happen anytime soon.

Twitter itself recently began letting users download their own tweet history, but the company doesn’t appear to have any plans to offer a historical search engine of its own.

About The Author: Matt McGee is Editor-In-Chief of Search Engine Land. His news career includes time spent in TV, radio, and print journalism. His web career continues to include a small number of SEO and social media consulting clients, as well as regular speaking engagements at marketing events around the U.S. He recently launched a site dedicated to Google Glass called Glass Almanac and also blogs at Small Business Search Marketing. Matt can be found on Twitter at @MattMcGee and on Google Plus. You can read Matt's disclosures on his personal blog.

  • RankWatch

    That’s really a sea of information that I and many researchers, marketers and analysts would like to get our hands on. I do think that if Twitter or the LoC comes up with a search engine over the tweet archive so the public can mine it for interesting data, it might just give Facebook and other social networks a run for their money. Many would surely bet their money on Twitter if it ever happens.

  • Sparklo Dexter

    Ahh, this will be great! Organizations can just go to the Library of Congress to look for any drunken tweets that may have been deleted or made private by a potential employee/customer/enemy. Unfortunately, it looks like it will only be available to the type of organization that maintains a presence in/near Washington, DC.
