Wayback Machine Now Has 240 Billion URLs

The Wayback Machine from the Internet Archive, one of the most useful and important Internet research tools, recently reached a major milestone.

In a blog post, archive founder Brewster Kahle announced that The Wayback Machine now provides access to an index containing more than 240 billion URLs (about five petabytes of data), with archived pages dating back to 1996.

The amount of newly accessible archived material is huge. Prior to this update, The Wayback Machine provided access to about 150 billion URLs, so the new index adds roughly 90 billion.

Researchers should note that a small portion of the material in the prior index is temporarily unavailable in the new, larger index. For that reason, the older index remains available through a separate interface.

Along with announcing the new release, Kahle said the database may now include archived pages that were online as recently as early December 2012.

This is also welcome news for researchers, since in the past the lag between when a page was crawled and indexed and when it became accessible via The Wayback Machine was often six months or longer.

Kahle also mentions that Wayback is now handling more than 1,000 queries per second from more than 500,000 people a day.

All of this news follows a New Year’s Eve blog post in which Brewster Kahle announced that the Internet Archive had finished raising one million dollars, enough to allow the organization to purchase four more petabytes of storage. Fundraising continues because the archive estimates it will need more than ten petabytes of storage during 2013.

Archive-It

Although web pages indexed by The Wayback Machine are NOT keyword searchable, more than a thousand collections of archived web pages focusing on a wide variety of topics ARE keyword searchable.

These archives are made available by Archive-It, a fee-based service that’s part of the Internet Archive. Each collection is built by targeting specific URLs to crawl, index, and archive.

Archive-It works with the education community (K-12 and higher ed), libraries, government agencies, non-profits, and others. Many of these groups make their collections accessible to all users.

Here are a Few Examples of Archive-It Collections:

Here’s a directory with information and links to more than 1,800 Archive-It collections that are keyword searchable.

About The Author: Gary Price is a librarian, author, and online information analyst based in suburban Washington, DC. He is the co-founder and co-editor of INFOdocket and FullTextReports.com, and prior to that he was founder/editor of ResourceShelf and DocuTicker for 10 years. He has worked for Blekko, Ask.com, and Search Engine Watch, where he was news editor. In 2001, Price co-authored (with Chris Sherman) the best-selling book The Invisible Web.
