A few days ago, I was looking for a very specific piece of data: the average age of professors in the US. I did a Google search for “average age of us professors” and, after the obligatory and unhelpful Wikipedia result at the top, found some data that was good enough further down the search results.
Google (and other search engines) does pretty well with some data-based searches, especially since many of us have trained ourselves how to phrase a query to get the info we want.
But what about when the data we want isn’t found in text, but is likely to be found in graphs, charts and tables filled with numbers?
Enter Zanran, a search engine built to uncover “semi-structured” data on the web:
This is the numerical data that people have presented as graphs and tables and charts. For example, the data could be a graph in a PDF report, or a table in an Excel spreadsheet, or a barchart shown as an image in an HTML page. This huge amount of information can be difficult to find using conventional search engines, which are focused primarily on finding text rather than graphs, tables and bar charts.
Or, as Zanran’s About Us page declares, “Zanran is Google for data.”
That’s a pretty tall claim, especially for what co-founders Jon Goldhill and Yves Dassas describe as an “early beta” product. It raises an obvious question:
Is Zanran Any Good?
Short answer: For some queries, yes, Zanran is quite good. Almost scarily so, actually. But for other queries, it doesn’t yet measure up.
“As a general rule,” Goldhill says, “results are good when the data is likely to be found in graphs and tables rather than in free text, is the subject of a full analysis [and] serves information professionals (analysts, consultants, librarians, etc.).”
One great example of Zanran’s capabilities is a search for average commute time.
The results are filled with PDFs that have matching data, and the first result on the page is data from the US Labor Department comparing commute times in 2000 versus 2007. My query didn’t include any geographical preference, and this top result focuses on the state of South Carolina, but it also has a graphic that compares South Carolina commute times to the US as a whole.
And Zanran has a nice feature that makes accessing the data faster: hovering your mouse over the icon at left shows a screenshot of what Zanran found to match your query. Here’s that US Labor Dept. page:
As I said, it’s mostly about South Carolina, but notice that the table at the bottom has US data, too.
Another strong result comes from a query like Facebook growth europe, which is obviously something that’s quantifiable via charts and graphs. Zanran’s results include a variety of PDFs and web pages with the data I’m looking for:
One of the queries that doesn’t currently do well is the one I used at the beginning of this article. Searching Zanran for average age of us professors returns a lot of graphs and charts related to “professors” and “average,” but only a couple specifically offer age-related data.
Zanran offers a page showing examples of queries that tend to produce better results.
I should also mention that, in my initial testing, I didn’t use any of Zanran’s advanced search options. Those include the ability to search a specific site, for specific file types, over certain time periods (last 6 months, last 12 months, etc.) and for data from specific countries. The country filter is available only for certain English-speaking countries.
Goldhill says that Zanran has already received good feedback “from information professionals who spend much of their day looking for serious information,” but he recognizes there’s room for improvement.
Right now, Zanran consists of Goldhill, Dassas, and a team of freelance programmers. They have what Goldhill calls a “friends and family” round of funding that will keep them going for at least another year. And there are plans to eventually monetize the search engine: Goldhill says they’ll place ads for relevant industry reports (think Gartner, Forrester, etc.) next to search results so that searchers can see what data is available for free and what’s available at a cost.
The jury’s still out on whether Zanran actually becomes a “Google for data,” but I’d say it’s off to a good start for now and there’s clearly an opportunity to develop a unique and useful niche search engine here.