Search Engine Land

Search Wikia Takes Steps To Crawl; Acquires Grub

Chris Sherman on July 27, 2007 at 12:30 pm

Wikia, Inc., the for-profit company developing the open source search engine Search Wikia, has acquired Grub, a distributed crawler platform, from LookSmart.

Distributed crawler? Crawlers are software programs used by search engines to roam the web to discover pages that are then downloaded and indexed for searching. The crawlers operated by the major search engines are highly centralized, operating out of massive data centers, and are capable of finding and downloading millions of pages per minute.
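For readers who want to see the discover-then-download loop in concrete terms, here is a minimal sketch in Python. It is not how any particular search engine or Grub itself works: the `TOY_WEB` dictionary, the `crawl` function and the `toy_fetch_links` helper are all hypothetical names invented for illustration, and the "web" is a small in-memory table rather than real network fetches.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the links found on that page.
TOY_WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["c.example/"],
    "c.example/": [],
}


def toy_fetch_links(url):
    """Stand-in for downloading a page and extracting its links."""
    return TOY_WEB.get(url, [])


def crawl(seed, fetch_links, max_pages=100):
    """Breadth-first discovery; returns URLs in the order they were visited."""
    frontier = deque([seed])  # pages discovered but not yet visited
    seen = {seed}             # everything discovered so far, to avoid repeats
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)   # a real crawler would download and index here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


order = crawl("a.example/", toy_fetch_links)
print(order)
```

Starting from one seed page, the loop follows links outward until it runs out of new pages or hits the `max_pages` limit.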

Grub, by contrast, taps into the spare power of thousands of personal computers connected to the internet. Volunteers download the Grub client, and then allow it to operate as a background process on their computers—even as a screensaver. While each individual Grub client is a mite compared to a search engine crawler, the collective power of thousands of Grub clients working in tandem can be impressive. “We’re hoping to get lots of people involved to help us crawl the web,” said Jimmy Wales, co-founder and chairman, Wikia, Inc.


Wikia acquired Grub as part of its plan to build a “transparent and open platform for search,” according to Wales. Wikia has transformed Grub into an open source project, allowing developers to add to or extend the software’s functionality. Wales called it “the next step” in Wikia’s efforts to build a better search engine. Danny did an interesting Q&A With Jimmy Wales On Search Wikia, in which they discussed the details of that project.

I played around with Grub for a few weeks back in 2003, right after LookSmart acquired the technology. It’s fascinating to watch a crawler in action, fetching page after page from all over the web. It’s also an eye-opener, revealing the amazing variety of content on the web, from great sites you’ve never heard of to obviously spammy garbage.

I asked Wales about the spam problem, and he said that it wasn’t a concern at this point. To maximize efficiency and eliminate redundancy, there’s a “master crawl” list that’s broken up into small chunks that are sent to each client. Once a client has crawled a group of URLs, it gets another list, and so on. Wales said that at least initially, that master list would be made up of well-known, “whitelisted” sites.
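The chunked hand-out Wales describes can be sketched roughly as follows. This is an illustration under stated assumptions, not Grub's actual protocol: the `CrawlCoordinator` class, the `split_into_chunks` helper, the chunk size and the example URLs are all hypothetical, and a real system would also need to handle clients that disappear without finishing their chunk.

```python
from collections import deque


def split_into_chunks(urls, chunk_size):
    """Break a list of URLs into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [urls[i:i + chunk_size] for i in range(0, len(urls), chunk_size)]


class CrawlCoordinator:
    """Hypothetical server side: hands each client one chunk at a time."""

    def __init__(self, master_list, chunk_size):
        # Drop duplicate URLs (keeping first-seen order) to avoid redundant work.
        unique = list(dict.fromkeys(master_list))
        self._pending = deque(split_into_chunks(unique, chunk_size))
        self._assigned = {}  # client id -> chunk currently being crawled

    def next_chunk(self, client_id):
        """Mark the client's previous chunk done and give it another one."""
        self._assigned.pop(client_id, None)
        if not self._pending:
            return None  # master list exhausted
        chunk = self._pending.popleft()
        self._assigned[client_id] = chunk
        return chunk


# Assumed whitelist for illustration only; note the deliberate duplicate.
whitelist = [
    "site-a.example", "site-b.example", "site-c.example",
    "site-a.example", "site-d.example", "site-e.example",
]
coordinator = CrawlCoordinator(whitelist, chunk_size=2)
first = coordinator.next_chunk("client-1")
second = coordinator.next_chunk("client-2")
third = coordinator.next_chunk("client-1")   # client-1 finished, asks again
fourth = coordinator.next_chunk("client-2")  # nothing left to hand out
```

Because every URL lives in exactly one chunk and each chunk is handed out once, no two clients crawl the same page.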

Want to help Wikia crawl the web? Download the Grub client. You can run it either as your default screensaver or as a background process. It is currently available for Windows only. From the download page: “Ozra is pretty sure the Linux client isn’t going to work at all. We need to get it ported to Linux as soon as possible…”

Postscript: I’m not sure the Windows client is working either. It took me three tries to download—the first two attempts only downloaded fragments of the program that Windows couldn’t figure out how to open. Then, the client kept attempting to contact the host server to get an initial list of URLs to crawl, only to fail repeatedly. Let’s hope the open source community is motivated to start work asap!

[Image: grub-error.jpg]


About The Author

Chris Sherman
Chris Sherman (@CJSherman) is a founding editor of Search Engine Land and is now retired.
