

Jul. 24, 2008 at 8:05am Eastern by Tom Wilde

A Visual Dictionary For The Web

Video Search - A Column From Search Engine Land

One of the most popular vertical search features on the web is image search. What’s really remarkable, however, is how little the core technology approach to indexing multimedia has changed over the last decade. When I was the head of product at FAST back in 1999, we launched the web’s biggest image search on Lycos with over 50 million images (which seemed like a lot at the time!). The service included many leading-edge features, including black-and-white and color image filtering, size filtering, and filetype filtering. The main differentiator today continues to be index size and freshness, and companies with the strongest technology in web content discovery have the most advantage. Not surprisingly, Google leads here, as its web crawling capability is far superior to anyone else’s on the web. Audio and video aren’t significantly different in this regard. The majority of multimedia indexing today relies on the classic “titles and tags” approach, and Google Video is perhaps Google’s most underwhelming search product because of this limitation.

While text-based keyword search continues to dominate web navigation, one can see a future where the search input has multiple formats. I’ve seen many demos where you can provide a particular image of a mountain scene and get back remarkably similar images. The same is true of video. The problems with these approaches today are twofold. First, image and video processing is still incredibly resource intensive, although with Moore’s Law and cloud computing capabilities this problem seems only temporary. The bigger challenge is the lack of a “visual dictionary” on the web.

What is the visual dictionary?

The state of the art in multimedia processing today, specifically around images, is to use a pixel mapping process to find visually similar images. The challenge with this approach is that the pixels still don’t convey the “aboutness” of the image being processed. Said another way, the computer knows a result looks like the mountain scene in the original image provided, but it can’t tell the user that it is a mountain scene. Facial recognition has a similar problem. It can find a face similar to the one provided, and has improved to the point where it can actually find the same face rather than a similar one. But again, it doesn’t know the name of the person it has discovered. What’s needed is an approach I’ll call the visual dictionary. The visual dictionary would be a master metadata collection in which all of the pixel representations of an object are tagged with its “aboutness.” This would enable many exciting possibilities:

  • Automatic tagging of new images: The moment an image is uploaded to the web, it would have a set of “best fit” tags from the visual dictionary that would describe it.
  • Finding similar: The visual dictionary would aid in the discovery of similar images online, whether the user presents a keyword or an image as the “query.”
  • Classification: Multimedia files could be automatically dropped into taxonomies and ontologies, which today rely for the most part on text-based Boolean rules.
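The “best fit” tagging idea in the list above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-number feature vectors, the `VISUAL_DICTIONARY` entries, and the function names are all hypothetical, and cosine similarity is one possible way to compare an image against dictionary entries, not a method described in this column.

```python
import math

# Hypothetical toy "visual dictionary": each tag maps to a small feature
# vector standing in for a real pixel-derived descriptor (an assumption).
VISUAL_DICTIONARY = {
    "mountain": [0.9, 0.1, 0.3],
    "beach": [0.2, 0.9, 0.4],
    "face": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_fit_tags(features, dictionary=VISUAL_DICTIONARY, top_n=2):
    """Rank dictionary tags by similarity to the image's feature vector."""
    scored = [
        (cosine_similarity(features, vector), tag)
        for tag, vector in dictionary.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [tag for _, tag in scored[:top_n]]


if __name__ == "__main__":
    # A new image whose (made-up) features resemble the "mountain" entry.
    print(best_fit_tags([0.8, 0.2, 0.3]))
```

The same lookup would serve the “finding similar” case: a keyword query selects a dictionary entry directly, while an image query is first reduced to a feature vector and then matched against the entries.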

Google has recognized this problem and has put forward a human-generated approach, similar to Amazon’s Mechanical Turk. With this approach, two anonymous people are paired together to “tag” an image. The pairing helps cut down on spam and maximizes tag coverage. While this approach is likely to yield high-quality results, human-tagged approaches have not scaled particularly well in search environments. The challenge with video is even greater. At 30 frames per second, a three-minute clip creates 5,400 “images”! In the case of a video news clip, the three minutes likely cover several entirely distinct concepts, people, and places, and therefore require granular tagging to really provide robust indexing.
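A rough Python sketch of the two ideas in this paragraph follows. The matching rule is an assumption: the column does not say how the paired taggers' answers are combined, so the sketch simply keeps tags that both people supplied. The function names are hypothetical.

```python
def agreed_tags(tags_a, tags_b):
    """Keep only tags that both partners supplied (assumed matching rule).

    Tags are compared after trimming whitespace and lowercasing, so a lone
    spammer's tag is dropped unless the partner independently agrees.
    """
    normalized_a = {tag.strip().lower() for tag in tags_a}
    normalized_b = {tag.strip().lower() for tag in tags_b}
    return sorted(normalized_a & normalized_b)


def frame_count(duration_seconds, frames_per_second=30):
    """Number of still frames in a clip at a constant frame rate."""
    return duration_seconds * frames_per_second


if __name__ == "__main__":
    print(agreed_tags(["Mountain", "snow", "buy pills"],
                      ["mountain", "Snow ", "lake"]))
    print(frame_count(3 * 60))  # the three-minute clip from the text
```

The frame count shows why per-frame human tagging does not scale: every additional minute of video at 30 frames per second adds another 1,800 candidate images.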

The combination of computing power and natural language processing seems poised to create real innovation in an area that has seen little over the last decade.

Tom Wilde is the CEO of EveryZing, a Cambridge-based company specializing in next-generation Universal Search and video search engine optimization (video SEO). The Video Search column appears on Thursdays at Search Engine Land.
