Search Engine Land


Library Of Congress Struggling To Make A Searchable Twitter Archive

Matt McGee on January 7, 2013 at 8:02 am

The Library of Congress is still working on plans to create a searchable archive of nearly every public tweet ever sent, but the challenges inherent in that task are making it a slow process.

Understandably so, considering the substantial growth in tweets in recent years; the LoC is essentially trying to tame a very rapidly moving dataset.

If it ever happens, a searchable archive of tweets could prove valuable to researchers, analysts, marketers and others. You can imagine brands wanting to search for Twitter trends surrounding major product/service announcements, or researchers looking for Twitter activity surrounding major world events.

On Friday, Gayle Osterberg, the Library’s Director of Communications, announced that the LoC is now getting about 500 million tweets per day, up from about 140 million when the project began in February 2011. She spelled out some of the challenges that the project poses.

“Currently, executing a single search of just the fixed 2006-2010 archive on the Library’s systems could take 24 hours. This is an inadequate situation in which to begin offering access to researchers, as it so severely limits the number of possible searches.

“The Library has assessed existing software and hardware solutions that divide and simultaneously search large data sets to reduce search time, so-called ‘distributed and parallel computing’. To achieve a significant reduction of search time, however, would require an extensive infrastructure of hundreds if not thousands of servers. This is cost-prohibitive and impractical for a public institution.”
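
To make the "divide and simultaneously search" idea concrete, here is a minimal Python sketch. It is not the Library's system, and every name in it (`split_into_shards`, `search_shard`, `parallel_search`) is hypothetical. It assumes the archive is simply a list of tweet texts, and it uses worker threads on one machine as a stand-in for the separate servers a real deployment would need.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_shards(tweets, shard_count):
    """Divide the data set into roughly equal slices, one per worker."""
    return [tweets[i::shard_count] for i in range(shard_count)]


def search_shard(shard, term):
    """Scan one shard linearly for tweets containing the term (case-insensitive)."""
    needle = term.lower()
    return [tweet for tweet in shard if needle in tweet.lower()]


def parallel_search(tweets, term, shard_count=4):
    """Search all shards at the same time, then merge the partial results."""
    shards = split_into_shards(tweets, shard_count)
    with ThreadPoolExecutor(max_workers=shard_count) as pool:
        # Threads stand in for servers here; each one scans its own shard.
        partials = list(pool.map(search_shard, shards, [term] * shard_count))
    # Merge step: flatten the per-shard hits into one sorted result list.
    return sorted(hit for part in partials for hit in part)


if __name__ == "__main__":
    sample = [
        "Library of Congress archive",
        "hello world",
        "the LIBRARY is open",
        "nothing here",
    ]
    print(parallel_search(sample, "library", shard_count=2))
```

Each worker scans only its own slice, so in principle the search time drops as shards are added. That is also why a dataset growing by hundreds of millions of tweets a day pushes the server count, and the cost, so high.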

In a Washington Post article, Deputy Librarian of Congress Robert Dizard Jr. says the collection will eventually be made available only within the Library itself, so that the archive doesn’t compete with commercial services that offer Twitter archives. That restriction is part of the agreement with Twitter.

But, as Gary Price said on INFOdocket, it doesn’t sound like any of that will happen anytime soon.

Twitter itself recently began letting users download their own tweet history, but the company doesn’t appear to have any plans to offer a historical search engine of its own.


About The Author

Matt McGee
Matt McGee joined Third Door Media as a writer/reporter/editor in September 2008. He served as Editor-In-Chief from January 2013 until his departure in July 2017. He can be found on Twitter at @MattMcGee.
