This year, we’ve seen several ambitious new efforts to mine and make searchable the vast part of the web that remains largely hidden from search engines, often referred to as the deep, hidden or invisible web. The latest foray into the reaches of the hidden web is actually a hybrid, of sorts: Factual, founded by Gil Elbaz, who previously built the foundational technology that enabled Google’s leap to world domination in search (Elbaz sold his company Applied Semantics to Google in 2003; it was the genesis of what we know as AdSense today).
Factual is a self-described “open data repository.” Like Wolfram Alpha, a “computational knowledge engine” that launched earlier this year, Factual seeks to create order from chaos by allowing anyone to share and mash open data on any subject, structuring information in database-like tables.
Unlike Wolfram Alpha, which is a closed system with data “curated” by employees, Factual has adopted a Wikipedia-like model which allows anyone to create, structure or even edit data in Factual tables. Elbaz hopes this open model will encourage community participation, enabling Factual to grow rapidly and enjoy widespread adoption.
“Our aspiration is to build the largest structured data repository,” says Elbaz. “We already have 100,000 tables created but we hope to capture the community’s interest.”
The trouble with the web
Why do we even need these new “non-search engines” like Factual and Wolfram Alpha? What do they offer that Google doesn’t?
Google is very good at two things: helping us navigate to a web site that we can then explore to satisfy a need, and surfacing basic, relatively simple facts. What Google (and every other traditional search engine, for that matter) has trouble with is finding or manipulating larger collections of data, or responding intelligently to more complex queries.
Why? Largely because the web isn’t like a database—it’s highly unstructured and there are few standards for how information should be organized for optimum efficiency. Google’s huge challenge lies in making sense out of this chaotic disorder. It does this by literally downloading as much of the web as it can, creating a massive index and then trying to find needles in its own haystack.
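To make the "download, index, then search the haystack" idea concrete, here is a toy sketch of an inverted index, the general data structure behind this kind of keyword lookup. The page URLs and text are invented for illustration, and nothing here reflects how Google's actual index is built.

```python
from collections import defaultdict

# Hypothetical stand-in for downloaded pages: URL -> page text.
pages = {
    "example.com/a": "nobel peace prize laureates list",
    "example.com/b": "restaurants in los angeles",
    "example.com/c": "peace talks in los angeles",
}


def build_index(docs):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


index = build_index(pages)
print(search(index, "los angeles"))
print(search(index, "peace prize"))
```

A lookup like this only matches words against pages. It has no notion of rows, columns or values, which is one way to see why larger collections of data are hard to manipulate through it.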
While this process works well, it also has limitations. For example, there’s a vast amount of information that’s web-accessible, but not directly a part of the web. Much of this information is contained in databases, and databases only divulge their information when queried—when people type words into their forms, tick check boxes or use drop-down menus to limit or filter results. As good as Google is at understanding open web content, it struggles to access this “deep” or “invisible” web content, largely because Google can’t easily interact with the user interfaces of databases. (Google is experimenting with structured data, most notably with Google Squared and Fusion Tables, but these are Google Labs projects and don’t appear to be major initiatives for mainstream search.)
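A small sketch can show why database content stays out of a crawler's reach. Everything here is hypothetical: the table, the form markup and the `submit_form` helper are illustrative stand-ins, with an in-memory SQLite database playing the part of a site's back end.

```python
import sqlite3

# Hypothetical database sitting behind a web search form.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE laureates (name TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO laureates VALUES (?, ?)",
    [("Martin Luther King Jr.", 1964), ("Mother Teresa", 1979)],
)

# What a link-following crawler downloads: the form page, with no rows in it.
FORM_PAGE = '<form action="/search"><input name="year"><button>Go</button></form>'


def submit_form(year):
    """Stand-in for a person filling in the form: rows appear only per query."""
    rows = conn.execute(
        "SELECT name FROM laureates WHERE year = ?", (year,)
    ).fetchall()
    return [name for (name,) in rows]


print("King" in FORM_PAGE)  # the static page contains none of the data
print(submit_form(1964))    # the data surfaces only when a query is made
```

The crawler can index `FORM_PAGE`, but the names live only in the database and show up only in response to a specific query.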
Factual takes a different approach. Like Wolfram Alpha, Factual’s staff works to upload facts from publicly available information sources. Like Google, Factual also mines the web for data, but selectively, rather than comprehensively. Factual also accepts data submissions from users and developers.
And like Wikipedia, Factual allows anyone to edit its data.
Wait! How can “facts” be “editable”?
Wikipedia has evolved, over time, into an authoritative reference that’s on a par with some of the most trustworthy paid sources of information. But since anyone can edit Wikipedia articles, the site has drawn well-deserved criticism when some users altered or distorted facts, used articles to spin reputations or otherwise corrupted the integrity of Wikipedia content.
Factual is allowing registered users to edit data, but rather than adopting Wikipedia’s “edit and replace” model, Factual lets people add information to a table without overwriting or deleting existing data. It then uses a consensus-based model to settle on the most authoritative facts to display. Elbaz says this makes it difficult to impossible for a charlatan to corrupt data with incorrect entries. Elbaz wrote in a blog post today:
“Factual wants to bring true accountability to data. Accountability means anyone can easily contribute their opinion, substantiate the data or disagree with the data. Accountability means full transparency and history in regards to how this data was originated (e.g. citations or explanations).”
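The add-without-overwriting approach can be sketched in a few lines. Note the assumptions: the `FactCell` class, its method names and the one-vote-per-user majority rule are all invented for illustration. Factual has not said how its consensus model actually weighs contributions, so treat this as one plausible reading rather than a description of the real system.

```python
from collections import Counter


class FactCell:
    """One table cell that keeps every submission instead of overwriting."""

    def __init__(self):
        self.submissions = []  # (user, value, citation) tuples, never deleted

    def add(self, user, value, citation=None):
        """Record a contribution; earlier entries stay in the history."""
        self.submissions.append((user, value, citation))

    def consensus(self):
        """Return the value backed by the most users, or None if empty.

        ASSUMPTION: plain majority voting with one vote per user. The
        article does not describe Factual's actual consensus rule.
        """
        latest = {}
        for user, value, _citation in self.submissions:
            latest[user] = value  # a user's newest entry is their vote
        if not latest:
            return None
        return Counter(latest.values()).most_common(1)[0][0]


cell = FactCell()
cell.add("alice", 1964, citation="source A")
cell.add("bob", 1964)
cell.add("mallory", 1946)
cell.add("mallory", 1946)  # repeating an entry does not add a second vote

print(cell.consensus())
print(len(cell.submissions))
```

Because nothing is ever deleted, the full history (including citations) stays available for inspection, and a single user repeating a wrong value cannot outvote several users who agree.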
Although Elbaz told me “we’re not a search engine—the goal isn’t to be a search engine,” searching is still one of the best ways to get started with Factual. You’ll find a search box at the upper right of every Factual page, and you can use it just like you would Google.
Search results, however, are very different. Because Factual is searching data, you’re presented with the names of data tables, along with the fields in each table. You can sort results by relevance, table name, last updated, author, views, rows or user rating. Once you click through on a result, the table is displayed very much like an Excel spreadsheet.
For example, look at this List of Nobel Peace Prize Laureates. Compare it to the Wikipedia article from which the data was extracted to see how Factual has constructed the table and allows you to manipulate it in various ways.
Want to use this table, or a table of your own, on your own website? No problem: Factual makes it easy to embed a table into your own site or blog (here’s a three-minute video that shows you how).
And if you have the technical chops, you’ll love Factual’s developer tools that let you create mashups from existing data sources on the web, combining different facts into interesting searchable sites, like this search app that helps you sort through restaurants in Los Angeles based on multiple attributes.
Factual is one of a new breed of search tools that are partially solving the invisible web problem. It isn’t a replacement for Google, but rather a really interesting, useful place to turn when Google just isn’t doing it for you. And I expect as Factual grows and matures, it’s going to become ever more useful.
At Applied Semantics, Elbaz and team built the world’s largest database of words and meanings, which was subsequently folded into Google. With Factual, the goals are much bigger and more ambitious. Elbaz: “Why should I limit myself to words and meanings? Why not go for all human knowledge?”