DeepDyve Explores The Invisible Web


As web search engines have improved over the years, there’s been less attention paid to an “inconvenient truth” about the indexes of our favorite information-finding tools: namely, that search engines still miss the lion’s share of information available on the web. This so-called “deep web” remains largely impenetrable to search engines for a variety of reasons, and for many types of queries that’s just fine. But if you’re a serious searcher, looking for the best information possible, you can’t afford to overlook this vast “hidden” store of information.

And that’s a challenge, because search tools that probe the deep web are for the most part either obscure or fee-based. That’s changing, thanks to a company formerly known as Infovell and now called DeepDyve. The eponymous DeepDyve.com rolls out today with an innovative approach to finding invisible web content that, despite limited coverage at the outset, impressed me with both what it finds and the tools it offers to make the searching experience even richer.

DeepDyve’s approach is like no other I’ve seen. Its chief scientists come from a background in genomics research, rather than computer science or linguistics. Genomics researchers strive to decode the information contained in DNA to understand the very building blocks of life. Unlike search engineers who focus on text and keywords, genomics researchers look at a billion three-letter “words” spelled out in the four-letter alphabet of DNA. These words are combined in “sequences” that determine everything from hair color to whether we’re predisposed to a particular disease. Cracking these codes requires massive amounts of data and the ability to see, and understand, hidden patterns of immense complexity.

DeepDyve takes a similar approach to understanding information on the web. Going far beyond basic keyword-based search, DeepDyve not only indexes every word in a document but also computes the combinations of words and phrases in the document, then uses some industrial-strength statistical techniques to assess the “informational impact” of these combinations. In essence, this approach looks at the meaning of an entire document and uses that to compute relevance, rather than relying on factors like snippets of text or the anchor text in links pointing to documents.
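DeepDyve hasn’t published its algorithm, so what follows is only a toy Python sketch of the general idea, under loud assumptions: it treats every unordered pair of distinct words in a document as a “combination,” and it stands in for the unspecified statistical techniques with a simple inverse-document-frequency weight, so pairs that show up in few documents count for more. The function names (`word_pairs`, `pair_weights`, `relevance`) and the sample corpus are hypothetical, not DeepDyve’s.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace and trim common punctuation.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def word_pairs(text: str) -> set[tuple[str, str]]:
    # Every unordered pair of distinct words in the document (assumed "combinations").
    words = sorted(set(tokenize(text)))
    return set(combinations(words, 2))


def pair_weights(corpus: list[str]) -> dict[tuple[str, str], float]:
    # Hypothetical "informational impact": log(N / df), so pairs found in
    # fewer documents get a higher weight (an IDF-style stand-in).
    df = Counter()
    for doc in corpus:
        df.update(word_pairs(doc))
    n = len(corpus)
    return {pair: math.log(n / count) for pair, count in df.items()}


def relevance(query_doc: str, candidate: str, weights: dict[tuple[str, str], float]) -> float:
    # Score a candidate by the summed weight of word pairs it shares with the query document.
    shared = word_pairs(query_doc) & word_pairs(candidate)
    return sum(weights.get(p, 0.0) for p in shared)


corpus = [
    "gene expression in breast cancer cells",
    "breast cancer risk and gene mutations",
    "solar panel efficiency and energy storage",
]
weights = pair_weights(corpus)
query = "gene mutations linked to breast cancer"
ranked = sorted(corpus, key=lambda d: relevance(query, d, weights), reverse=True)
for doc in ranked:
    print(doc)
```

Because every pair is enumerated, the number of combinations grows quadratically with a document’s vocabulary, which hints at why this style of indexing needs serious computing muscle.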

It’s an interesting approach, and one that makes it quick and easy to refine searches in powerful ways. “We think that search is going away from keywords toward where content is your query,” said William Park, DeepDyve’s CEO.

Today’s launch is relatively modest, with DeepDyve currently allowing searches in the areas of life sciences, patents and Wikipedia—about 500 million pages of deep web content (and arguably, Wikipedia isn’t really part of the deep web given its prominence in many Google, Microsoft and Yahoo search results, but that’s a minor quibble). Park says that the company is working hard to expand its coverage, adding physical sciences content in the areas of information technology, clean technology and energy, doubling DeepDyve’s index by year end.

The company also offers a premium version for $45 per month, with some nifty features such as a “more like this” button that uses the full text of a document as a query and returns some pretty impressive results.
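DeepDyve doesn’t say how its “more like this” button works. As an illustration only, here is a minimal Python sketch of the classic textbook way to use a whole document as a query: build TF-IDF term vectors and rank the other documents by cosine similarity to the seed document. This is a swapped-in standard technique, not DeepDyve’s method, and `more_like_this` and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace and trim common punctuation.
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(corpus: list[str]) -> list[dict[str, float]]:
    # Term frequency per document, weighted by a smoothed inverse document frequency.
    docs = [Counter(tokenize(d)) for d in corpus]
    n = len(docs)
    df = Counter()
    for tf in docs:
        df.update(tf.keys())
    # Add 1 so terms found in every document still carry some weight.
    idf = {t: math.log(n / c) + 1.0 for t, c in df.items()}
    return [{t: f * idf[t] for t, f in tf.items()} for tf in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def more_like_this(seed_index: int, corpus: list[str], k: int = 2) -> list[int]:
    # The full text of the seed document is the query; return indexes of the k most similar others.
    vecs = tfidf_vectors(corpus)
    seed = vecs[seed_index]
    scored = [(cosine(seed, v), i) for i, v in enumerate(vecs) if i != seed_index]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


corpus = [
    "protein folding simulation with molecular dynamics",
    "molecular dynamics simulation of protein folding pathways",
    "patent for a solar panel mounting bracket",
    "wind turbine blade patent",
]
print(more_like_this(0, corpus, k=1))  # prints [1]
```

Using the whole document rather than a handful of keywords means every term contributes to the match, which is the intuition behind “content is your query.”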

DeepDyve isn’t a threat to Google now, and likely won’t be in the future. Instead, it’s a great tool for serious searchers who want to do comprehensive research in the content areas that DeepDyve covers (much like Powerset, it’s also a vastly more powerful way to search Wikipedia). DeepDyve also offers a genuinely different “second opinion” of the web if you want to look beyond the top results returned by Google and the other major search engines.

With its limited initial offering, DeepDyve has just scratched the surface of what’s available on the invisible web, albeit in a very useful way. However, truly cracking the invisible web problem still seems like a distant dream.



Chris Sherman is Executive Editor of SearchEngineLand.com and President of Searchwise LLC, a Boulder, Colorado-based Web consulting firm. He also programs and co-chairs the Search Marketing Expo (SMX) conference series.
