
Wayback Machine Now Has 240 Billion URLs

Gary Price on January 14, 2013 at 7:01 am

The Wayback Machine from the Internet Archive, one of the most useful and important Internet research tools, recently reached a major milestone.

In a blog post, archive founder Brewster Kahle announced that The Wayback Machine now provides access to an index containing more than 240 billion URLs (about five petabytes of data), with archived pages dating back to 1996.

The amount of newly accessible archived material is huge. Prior to this update, The Wayback Machine provided access to about 150 billion URLs.

Researchers should note that a small portion of the index available in the prior release is temporarily unavailable via the new, larger index. For that reason, the older index remains available through a different interface.

Along with announcing the new release, Kahle said the database can potentially access archived pages that were online as recently as early December 2012.

This is also exciting news for researchers, since the lag between when a page was crawled and indexed and when it became accessible via The Wayback Machine could often be six months or longer.

Kahle also mentions that Wayback is now handling more than 1,000 queries per second from more than 500,000 people a day.

All of this news follows a New Year’s Eve blog post by Brewster Kahle announcing that the Internet Archive had just finished raising one million dollars, which will allow the organization to purchase four more petabytes of storage. The fundraising continues because the archive estimates it will need more than ten petabytes of storage during 2013.

Archive-It

Although web pages indexed by The Wayback Machine are NOT keyword searchable, more than a thousand collections of archived web pages focusing on a wide variety of topics ARE keyword searchable.

These archives are made available by Archive-It, a fee-based service that’s part of The Internet Archive. These collections are built by targeting specific URLs to crawl, index, and archive.

Archive-It works with the education community (K-12 and higher ed), libraries, government agencies, non-profits, and others. Many of these groups make their collections accessible to all users.

Here are a Few Examples of Archive-It Collections:

Here’s a directory with information and links to more than 1800 Archive-It collections that are keyword searchable.

About the author

Contributor

Gary Price

Gary Price is a librarian, author, and an online information analyst based in suburban Washington, DC. He is the co-founder and co-editor of INFOdocket and FullTextReports.com and prior to that was founder/editor of ResourceShelf and DocuTicker for 10 years. He has worked for Blekko, Ask.com, and at Search Engine Watch where he was news editor. In 2001, Price was the co-author (with Chris Sherman) of the best-selling book The Invisible Web.