Wayback Machine Adds 160 Billion Indexed Pages In A Year, Surpasses 400 Billion Indexed Pages



The Internet Archive announced that the Wayback Machine, a huge internet archive of web pages dating back to 1996, has surpassed 400 billion pages indexed.

In January 2013, a little over a year ago, the Wayback Machine reported 240 billion indexed URLs. Since then, it has added another 160 billion, bringing its total indexed page count to over 400 billion URLs.
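As a quick check of the arithmetic, here is a minimal Python sketch that uses only the two figures from the announcement. The variable names are illustrative, and the numbers are the rounded values quoted above, not exact counts.

```python
# Rounded figures quoted in the article, in billions of indexed URLs.
indexed_jan_2013 = 240
added_since = 160

# Total after the additions, and growth relative to the January 2013 figure.
total_now = indexed_jan_2013 + added_since
growth_pct = added_since / indexed_jan_2013 * 100

print(f"Total indexed: {total_now} billion")
print(f"Growth since January 2013: {growth_pct:.1f}%")
```

On these rounded figures, the index grew by roughly two thirds in a little over a year.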

On Friday, the Internet Archive announced the milestone on its blog, saying the indexed pages range from late 1996 up to a few hours ago. It then shared some of its history:

  • 2001 – The Wayback Machine is launched. Woo hoo.
  • 2006 – Archive-It is launched, allowing libraries that subscribe to the service to create curated collections of valuable web content.
  • March 25, 2009 – The Internet Archive and Sun Microsystems launch a new datacenter that stores the whole web archive and serves the Wayback Machine. This 3 Petabyte data center handled 500 requests per second from its home in a shipping container.
  • June 15th, 2011 – The HTTP Archive becomes part of the Internet Archive, adding data about the performance of websites to our collection of web site content.
  • May 28, 2012 – The Wayback Machine is available in China again, after being blocked for a few years without notice.
  • October 26, 2012 – Internet Archive makes 80 terabytes of archived web crawl data from 2011 available for researchers, to explore how others might be able to interact with or learn from this content.
  • October 2013 – New features for the Wayback Machine are launched, including the ability to see newly crawled content an hour after we get it, a “Save Page” feature so that anyone can archive a page on demand, and an effort to fix broken links on the web starting with WordPress.com and Wikipedia.org.
  • Also in October 2013 – The Wayback Machine provides access to important Federal Government sites that go dark during the Federal Government Shutdown.

About the author

Barry Schwartz
Staff
Barry Schwartz is a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY-based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics. Barry can be followed on Twitter here.
