Apr. 5, 2007 at 5:22pm Eastern by Chris Sherman

Teaching Google To See Images

"Image search" is really something of a misnomer, because current generation search engines rely primarily on text to "understand" all types of content, including images. When you search for images on Google, Flickr or most other search engines, they aren't examining the pixels that make up images. Instead, search engines look for clues that might identify relevant images—clues like descriptive filenames, tags, text near an image (think captions) and even the anchor text of links that point directly at image files.

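To make the idea concrete, here is a minimal sketch of text-based image ranking. It is not how Google or any other engine actually scores images; the field names, the weights and the two sample images are all invented for illustration.

```python
import re

# Hypothetical weights: how much a match in each text clue counts.
CLUE_WEIGHTS = {
    "filename": 3.0,
    "tags": 2.5,
    "anchor_text": 2.0,
    "nearby_text": 1.0,
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score_image(query, clues):
    """Add up the clue weight for every query word found in each clue field."""
    query_words = tokenize(query)
    score = 0.0
    for field, weight in CLUE_WEIGHTS.items():
        field_words = tokenize(clues.get(field, ""))
        score += weight * len(query_words & field_words)
    return score


# Made-up sample data: no pixels are examined anywhere in this sketch.
images = {
    "img1": {
        "filename": "golden-gate-bridge.jpg",
        "tags": "bridge san francisco",
        "anchor_text": "photo of the Golden Gate",
        "nearby_text": "The bridge at sunset",
    },
    "img2": {
        "filename": "IMG_0042.jpg",
        "tags": "",
        "anchor_text": "click here",
        "nearby_text": "Our trip to the bridge",
    },
}

query = "golden gate bridge"
ranked = sorted(images, key=lambda name: score_image(query, images[name]), reverse=True)
print(ranked)
```

The second image could be a perfect picture of the bridge, but with a camera-generated filename and no tags it has almost nothing for a text-driven engine to match.
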
Search engines take this awkward approach because it's much more difficult to actually analyze the shapes, colors, lines and other elements that our eyes effortlessly assemble into a meaningful picture. Image analysis is algorithmically challenging, and computationally intensive to boot. Until recently, image processing and recognition on a large scale have been beyond the capabilities of most computer scientists.

That's starting to change, however. Recent work by computer scientists at UC San Diego, tested at Google's data centers, has brought the goal of true image analysis and search closer to realization. The senior researcher and author of a recent IEEE paper describing this work believes that these new approaches will ultimately be incorporated into the search engines we use on a daily basis.

Nuno Vasconcelos, a professor of electrical engineering at the UCSD Jacobs School of Engineering, discusses the approach, called Supervised Multiclass Labeling (SML), in a recent news release from the school (hat tip to Threadwatch for the pointer). Though SML sounds like a mouthful of jargon, what it really amounts to is systematically training a computer to recognize statistically similar objects, and teaching it to differentiate them from other objects that have similar characteristics.

The computer then labels the objects it finds in images, allowing keyword-based searching. Yes, this is tagging—but tagging done by a computer, following some very sophisticated rules and using a controlled vocabulary rather than arbitrary, subjective terms. With a large enough set of training images, the system gets progressively better at identifying objects in images.
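
The train-then-label idea can be sketched in a few lines. This is a heavily simplified stand-in, not the UCSD researchers' method: it fits one simple Gaussian per label, and the two-number "features" and the three labels are made up for the example.

```python
import math


def fit_class_model(vectors):
    """Estimate a per-dimension mean and variance from one label's training vectors."""
    n = len(vectors)
    dims = len(vectors[0])
    means = [sum(v[d] for v in vectors) / n for d in range(dims)]
    variances = [
        # The small floor keeps a constant feature from causing division by zero.
        max(sum((v[d] - means[d]) ** 2 for v in vectors) / n, 1e-3)
        for d in range(dims)
    ]
    return means, variances


def log_likelihood(vector, model):
    """Log-probability of a feature vector under a diagonal Gaussian model."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
        for x, mu, var in zip(vector, means, variances)
    )


def train(labeled_examples):
    """Build one statistical model per label in the controlled vocabulary."""
    return {label: fit_class_model(vectors) for label, vectors in labeled_examples.items()}


def annotate(vector, models, top_k=1):
    """Return the top_k labels whose models best explain the feature vector."""
    ranked = sorted(models, key=lambda label: log_likelihood(vector, models[label]), reverse=True)
    return ranked[:top_k]


# Toy training set (invented): each vector is [average blueness, average greenness].
training = {
    "sky": [[0.9, 0.2], [0.8, 0.3], [0.85, 0.25]],
    "grass": [[0.2, 0.9], [0.3, 0.8], [0.25, 0.85]],
    "water": [[0.7, 0.5], [0.6, 0.6], [0.65, 0.55]],
}

models = train(training)
labels = annotate([0.88, 0.22], models, top_k=2)
print(labels)
```

Once every image carries machine-assigned labels like these, an ordinary keyword search can run over the labels, which is what makes the approach attractive for search engines.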

Professor Vasconcelos talks about the approach in this five-minute Windows Media video and at greater length in this hour-long Google Tech Talk video called Using Statistics to Search and Annotate Pictures.

Notably, even though the researchers say that the image indexing technique allows them to cover larger collections of images at a lower computational cost than was previously possible, "the current version would still choke on the Internet’s vast numbers of public images." So don't expect to find the system replacing current image search on the general purpose search engines any time soon.

Other image search projects

Here's a quick look at some other image search projects that are attempting, each in differing ways, to "see" the contents of images.

SeeIT.com is a content-based image search engine, searching approximately 8.3 million images randomly spidered on the Internet. SeeIT allows you to search using the visual characteristics of an image—the images are not tagged or categorized in any way. You search by entering a keyword, and then refining your query by clicking the image that most closely resembles what you're looking for. Clicking a "similar" link beneath a thumbnail of an image brings up another set of images all with similar visual characteristics.

SeeIT.com is in beta while the company scales the index from millions to hundreds of millions of images. You can try it by clicking here, then entering the user name picture and password picture93AE (exclusive access for Search Engine Land readers). See this information for new users for details, including some of the limitations of the current beta release.

Tiltomo lets you search for "similar" images posted by Flickr users. "Similar" is defined as either a similar "theme" (subject, color or texture), or 100% similar color or texture. To try it out, search one of two test databases of about 130,000 images: Flickr catchy colors or Flickr general images.

eVision is a company that makes image search tools for enterprise applications, rather than web search, but it has several online demos available that show off its capabilities. eVision uses "segmentation," dividing an image into regions, which correspond approximately to objects or parts of objects in an image (this is similar to the UCSD approach described above). Once these object regions are identified, the four basic properties of color, texture, shape, and object shading are extracted and stored in a condensed descriptor called a visual signature. Similarity comparisons are then made on the visual signatures of objects in other images.
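
As a rough illustration of comparing visual signatures, the sketch below reduces each region to four numbers and measures the distance between them. eVision's actual descriptor format is not public in this article, so the class, the 0-to-1 scales and the sample regions are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VisualSignature:
    """Hypothetical condensed descriptor for one segmented region (values 0..1)."""
    color: float
    texture: float
    shape: float
    shading: float

    def as_vector(self):
        return (self.color, self.texture, self.shape, self.shading)


def distance(a, b):
    """Euclidean distance between two signatures; smaller means more similar."""
    return math.dist(a.as_vector(), b.as_vector())


def most_similar(query, catalog):
    """Return the name of the catalog region closest to the query signature."""
    return min(catalog, key=lambda name: distance(query, catalog[name]))


# Invented signatures for regions already extracted from other images.
catalog = {
    "red_ball": VisualSignature(color=0.90, texture=0.20, shape=0.95, shading=0.40),
    "brick_wall": VisualSignature(color=0.80, texture=0.90, shape=0.10, shading=0.30),
    "blue_cup": VisualSignature(color=0.10, texture=0.20, shape=0.70, shading=0.50),
}

query = VisualSignature(color=0.85, texture=0.25, shape=0.90, shading=0.45)
print(most_similar(query, catalog))
```

Because each region is boiled down to a small, fixed-size descriptor, comparing it against many stored regions is cheap compared with working on the raw pixels.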

Riya was at one point rumored to be a Google acquisition target thanks to its image search technology. Riya started out focusing primarily on facial recognition, but now has a beta visual search that lets you find similar faces and objects in images across the web and then refine your results using color, shape and texture.

Riya also powers the visually oriented product search service Like.com that lets you find clothing and a few home furnishing items based on visual similarity. Like also has a "celebrity" search that lets you see what the stars are currently wearing and find similar accoutrements for your own adornment.

The State Hermitage Museum in Russia is using IBM's experimental Query By Image Content (QBIC). The museum offers two ways to find similar types of artwork: QBIC Color Search locates two-dimensional artwork in the museum's digital collection that matches the colors you specify. QBIC Layout Search lets you define geometric shapes or arrange areas of color on a virtual canvas to approximate the visual organization of the work of art for which you are searching. Read more about IBM's QBIC technology here.
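
A color search of this kind can be approximated with a simple overlap measure. The sketch below uses histogram intersection, a common technique for comparing color distributions; it is not a description of IBM's QBIC internals, and the artwork titles and color shares are invented.

```python
def histogram_intersection(wanted, actual):
    """Sum the overlap between the requested and actual share of each color."""
    return sum(min(share, actual.get(color, 0.0)) for color, share in wanted.items())


def color_search(wanted, collection, limit=2):
    """Rank artworks by how well their color mix covers the requested colors."""
    ranked = sorted(
        collection,
        key=lambda title: histogram_intersection(wanted, collection[title]),
        reverse=True,
    )
    return ranked[:limit]


# Invented collection: each artwork maps color names to their share of the canvas.
collection = {
    "Seascape": {"blue": 0.6, "white": 0.3, "brown": 0.1},
    "Harvest": {"yellow": 0.5, "green": 0.3, "blue": 0.2},
    "Portrait": {"brown": 0.5, "black": 0.4, "red": 0.1},
}

# The searcher asks for a picture that is roughly half blue and half yellow.
results = color_search({"blue": 0.5, "yellow": 0.5}, collection)
print(results)
```

A layout search would need more than this, since it also has to record where on the canvas each color appears.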

CIRES: Content Based Image REtrieval System is a research project at the University of Texas at Austin that uses a "combination of higher-level and lower-level vision principles" to understand the content of images (more information here; search CIRES here).

Last December, Danny wrote about Polar Rose, a company that was promising to help bring context to photos posted on the web. Polar Rose doesn't yet have a demo, but you can see screen shots and get more information here.

Want to learn more about content-based image search? Check out this 26 page white paper from Microsoft Research, Fundamentals of Content-Based Image Retrieval (PDF).

Reader Comments

One aspect worth mentioning would be that Google's Image Labeler program could easily be used to provide the data necessary to train the system in this Supervised Multiclass Labeling method.

While SML couldn't really be used for huge numbers of images in real-time for searching, it could be used to process images over time in order to build up the metadata to associate with the images -- providing a rich new source of keyword content.

So, the potential for this to begin functioning for Google's Image Search is actually quite high.

Chris:
Here are a few more projects:

+ LTU Tech
http://www.ltutech.com/en/

+ Image-Seek Database of Corbis Royalty Free Imagery Using Both Metadata and CBIR
http://corbis.ltutech.com/

Also, LTU presentations from the 2003 and 2004 Search Engine Meeting provide excellent intros to CBIR:
+ Organising personal pictures with content analysis technology
From the 2004 meeting. PDF file.
http://www.infonortics.com/searchengines/sh04/slides/ltu.pdf
+ Finding the Right Image in the Corbis Collection
From the 2003 meeting. PDF file.
http://www.infonortics.com/searchengines/sh03/slides/nastar-2003.pdf

+ xcavator from Cognisign
http://www.cognisign.com/
http://www.xcavator.net/

+ Freenet.de (in German, use your fave mechanical translation software)
http://www.freenet.de/suche/index.html?search_type=picture&adv=1
The first option lets you limit results to images with specific words "in the image." The third limits results to images with a human face.

+ Photo2Search from Microsoft
Cameraphone searching and content image retrieval
http://preview.tinyurl.com/qcqtc

+ Vima Technologies
http://www.vimatech.com/
Demos here
http://www.mediabakery.com/
Look for the "eye" icon below image.
Also, here:
http://www.danitadelimont.com/search.asp

+ Cydral (Coming Soon?)
http://www.cydral.com/

I've used Like.com (by Riya) and I find it to be pretty decent. I think it has huge potential for e-tailers.

One more company that I was just reminded of in this space is piXlogic. They are also doing work in video image recognition.

More in this 2003 article:
CIA funds development of photo-checking software
http://www.theage.com.au/articles/2003/06/05/1054700310133.html

and this 2006 article.
http://www.directionsmag.com/article.php?article_id=2331&trv=1

Direct to piXlogic
http://www.pixlogic.com/
