The Wayback Machine Gets Grant To Rebuild & Add Keyword Search

The Internet Archive will rebuild its Wayback Machine service from the ground up.

The Internet Archive announced that it has received a substantial grant from the Laura and John Arnold Foundation to completely rebuild the code base for the Wayback Machine. As part of the rebuild, the Wayback Machine will add basic keyword search, so you can find a relevant site by entering keywords instead of its URL.

The Wayback Machine lets you go back in time and see how sites have changed. It launched on January 24, 1996, about 19 years ago, and needs an update. The Internet Archive says it currently holds more than 439 billion Web captures from those 19 years.

The work includes “rewriting the Wayback Machine code,” says Wendy Hanamura, Director of Partnerships at The Internet Archive, adding, “this will enable us to improve reliability and functionality.”

They will also, for the first time, let users search by keyword to find sites. Hanamura wrote, “while indexing all of the pages in the Wayback Machine is beyond what we can do, we will index home pages of websites so that patrons won’t have to enter specific URLs to dive into the Wayback Machine.”
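To make the idea of a home-page-only index concrete, here is a minimal sketch of a keyword index in Python. It is not the Internet Archive’s implementation, which the announcement does not describe; the function names (`tokenize`, `build_index`, `search`) and the sample sites and text are hypothetical and exist only for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(home_pages):
    """Map each word to the set of sites whose home page contains it.

    home_pages: dict of site URL -> home page text (hypothetical input).
    """
    index = defaultdict(set)
    for url, text in home_pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the sites whose home page contains every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Made-up sample data, assumed for this sketch only.
home_pages = {
    "example.com": "Example Domain for illustrative examples",
    "example.org": "Open source news and examples",
    "example.net": "Network tools and news",
}
index = build_index(home_pages)
```

With this sample data, `search(index, "news")` returns the two sites whose home pages mention “news”, and a patron could then open any of them in the archive without typing a URL. Indexing only one page per site keeps the index small relative to the full set of captures, which is the trade-off Hanamura describes.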

They will also optimize how deeply they crawl and the quality of the pages they crawl. Currently, they capture about one billion pages per week, but they should be able to capture far more than that with the new code base. They will also improve how they capture, index and play back media-rich and interactive websites, while continuing to support the old formats.

This doesn’t come with a facelift aimed at making the user interface easier to use.

For more details, see the announcement on the Internet Archive blog.


About the author

Barry Schwartz
Staff
Barry Schwartz is a Contributing Editor to Search Engine Land and a member of the programming team for SMX events. He owns RustyBrick, a NY based web consulting firm. He also runs Search Engine Roundtable, a popular search blog on very advanced SEM topics. Barry can be followed on Twitter here.
