Teaching Google To See Images

“Image search” is really something of a misnomer, because current generation search engines rely primarily on text to “understand” all types of content, including images. When you search for images on Google, Flickr or most other search engines, they aren’t examining the pixels that make up images. Instead, search engines look for clues that might identify relevant images—clues like descriptive filenames, tags, text near an image (think captions) and even the anchor text of links that point directly at image files.
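
To make that idea concrete, here is a minimal Python sketch of how an image might be "described" entirely from the text around it rather than from its pixels. The HTML details, keyword weights and example values are made up for illustration; real search engines use far more signals and far more sophisticated weighting.

```python
# Toy illustration of text-based image indexing: gather weighted keyword
# "clues" for an image from its filename, alt text, caption and the anchor
# text of links pointing at it. All weights and inputs are hypothetical.

import re

def text_clues_for_image(filename, alt_text, caption, anchor_texts):
    """Collect weighted keyword clues for one image from the text around it."""
    clues = {}
    sources = [
        (re.split(r"[-_.]", filename.rsplit("/", 1)[-1]), 3),   # descriptive filename
        (alt_text.split(), 2),                                  # alt attribute
        (caption.split(), 2),                                   # nearby caption text
        ([w for a in anchor_texts for w in a.split()], 1),      # anchor text of inbound links
    ]
    for words, weight in sources:
        for word in words:
            word = word.lower().strip(",.")
            if word and not word.isdigit() and word not in ("jpg", "png", "gif"):
                clues[word] = clues.get(word, 0) + weight
    return clues

print(text_clues_for_image(
    "images/golden-gate-bridge.jpg",
    "Golden Gate Bridge at sunset",
    "The bridge seen from Marin County.",
    ["golden gate photo"],
))
```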

Search engines take this awkward approach because it's much more difficult to actually analyze the shapes, colors, lines and other elements that our eyes effortlessly assemble into a meaningful picture. Image analysis is algorithmically challenging, and computationally intensive to boot. Image processing and recognition on a large scale has, until recently, been beyond the capabilities of most computer scientists.

That’s starting to change, however. Recent work by computer scientists at UC San Diego and tested at Google’s data centers has brought the goal of true image analysis and search closer to realization. The senior researcher and author of a recent IEEE paper describing this work believes that these new approaches will ultimately get incorporated into the search engines we use on a daily basis.


Nuno Vasconcelos, a professor of electrical engineering at the UCSD Jacobs School of Engineering, discusses the approach, called Supervised Multiclass Labeling (SML), in a recent news release from the school (hat tip to Threadwatch for the pointer). Though SML sounds like a mouthful of jargon, what it really amounts to is systematically training a computer to recognize statistically similar objects, and teaching it to differentiate them from other objects that have similar characteristics.

The computer then labels the objects it finds in images, allowing keyword-based searching. Yes, this is tagging—but tagging done by a computer, following some very sophisticated rules and using a controlled vocabulary rather than arbitrary, subjective terms. With a large enough set of training images, the system gets progressively better at identifying objects in images.
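
To give a flavor of what "training a computer to label images" looks like, here is a deliberately simplified Python sketch. It is not the UCSD algorithm: the feature vectors are hypothetical stand-ins for the localized visual features a real system would compute, and a nearest-centroid rule stands in for SML's per-class statistical models.

```python
# Toy sketch of supervised multiclass labeling: learn one simple model per
# vocabulary word from labeled example features, then tag a new image with
# the best-scoring words so it can be found by keyword search.

import numpy as np

# Hypothetical training data: feature vectors for image regions, each
# labeled with a word from a controlled vocabulary.
training_features = {
    "sky":   np.array([[0.10, 0.80, 0.90], [0.20, 0.70, 0.95]]),
    "grass": np.array([[0.20, 0.90, 0.10], [0.30, 0.85, 0.20]]),
    "sand":  np.array([[0.90, 0.80, 0.50], [0.85, 0.75, 0.45]]),
}

# "Training": summarize each word's examples with a mean feature vector.
class_centroids = {word: feats.mean(axis=0) for word, feats in training_features.items()}

def label_image(region_features, top_k=2):
    """Score every vocabulary word against an image's region features and
    return the best-matching words as the image's automatic tags."""
    scores = {}
    for word, centroid in class_centroids.items():
        # Smaller average distance to the word's centroid means a better match.
        dists = np.linalg.norm(region_features - centroid, axis=1)
        scores[word] = -dists.mean()
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# A new, untagged image described by two hypothetical region feature vectors.
new_image = np.array([[0.15, 0.75, 0.92], [0.25, 0.88, 0.15]])
print(label_image(new_image))  # expected: ['sky', 'grass']
```

With more training examples per word, the per-class models become more reliable, which is why the researchers emphasize the size of the training set.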

Professor Vasconcelos talks about the approach in this five-minute Windows Media video, and at greater length in this hour-long Google Tech Talk video, Using Statistics to Search and Annotate Pictures.

Notably, even though the researchers say that the image indexing technique allows them to cover larger collections of images at a lower computational cost than was previously possible, “the current version would still choke on the Internet’s vast numbers of public images.” So don’t expect to find the system replacing current image search on the general purpose search engines any time soon.

Other image search projects

Here’s a quick look at some other image search projects that are attempting, each in differing ways, to “see” the contents of images.

SeeIT.com is a content-based image search engine, searching approximately 8.3 million images randomly spidered on the Internet. SeeIT allows you to search using the visual characteristics of an image—the images are not tagged or categorized in any way. You search by entering a keyword, and then refining your query by clicking the image that most closely resembles what you’re looking for. Clicking a “similar” link beneath a thumbnail of an image brings up another set of images all with similar visual characteristics.
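
SeeIT doesn't disclose how it measures visual similarity, but the basic "click an image to find visually similar ones" interaction can be sketched with something as simple as color histograms and nearest-neighbor ranking. The images and index below are hypothetical; this is a stand-in for whatever features SeeIT actually uses.

```python
# Minimal sketch of content-based "find similar images": describe each image
# with a normalized color histogram, then rank the index by distance to the
# histogram of the image the user clicked.

import numpy as np

def color_histogram(image, bins=8):
    """Flattened per-channel histogram, normalized so images of
    different sizes are directly comparable."""
    hist = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0] for c in range(3)]
    hist = np.concatenate(hist).astype(float)
    return hist / hist.sum()

def most_similar(query_hist, index, top_k=5):
    """Rank indexed images by histogram distance to the clicked image."""
    ranked = sorted(index.items(), key=lambda kv: np.linalg.norm(kv[1] - query_hist))
    return [name for name, _ in ranked[:top_k]]

# Build a tiny index from hypothetical RGB images (H x W x 3 uint8 arrays).
rng = np.random.default_rng(0)
index = {f"img_{i}.jpg": color_histogram(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8))
         for i in range(20)}

# "Clicking" one result refines the query to that image's visual signature.
clicked = index["img_3.jpg"]
print(most_similar(clicked, index))  # the clicked image itself ranks first
```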

SeeIT.com is in beta while the company is scaling the index from millions to hundreds of millions of images. You can try it by clicking here, then entering the user name picture and password picture93AE (exclusive access for Search Engine Land readers). See this information for new users for more information, including some of the limitations of the current beta release.

Tiltomo lets you search for “similar” images posted by Flickr users. “Similar” is defined as either a similar “theme” (subject, color or texture), or 100% similar color or texture. To try it out, search one of two test databases of about 130,000 images: Flickr catchy colors or Flickr general images.

eVision is a company that makes image search tools for enterprise applications, rather than web search, but it has several online demos available that show off its capabilities. eVision uses “segmentation,” dividing an image into regions, which correspond approximately to objects or parts of objects in an image (this is similar to the UCSD approach described above). Once these object regions are identified, the four basic properties of color, texture, shape, and object shading are extracted and stored in a condensed descriptor called a visual signature. Similarity comparisons are then made on the visual signatures of objects in other images.
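
Here is a rough sketch of the "visual signature" idea: break an image into regions (a fixed grid here, standing in for real segmentation), describe each region with a few numbers for color and texture, and compare signatures between images. The descriptor choices are illustrative assumptions, not eVision's actual ones.

```python
# Hypothetical visual-signature sketch: per-region mean color plus a crude
# texture measure, compared region by region between two images.

import numpy as np

def visual_signature(image, grid=4):
    """Return one compact descriptor per grid cell: mean color plus a
    crude texture measure (pixel standard deviation within the cell)."""
    h, w, _ = image.shape
    descriptors = []
    for i in range(grid):
        for j in range(grid):
            cell = image[i * h // grid:(i + 1) * h // grid,
                         j * w // grid:(j + 1) * w // grid]
            color = cell.reshape(-1, 3).mean(axis=0)   # average R, G, B in the region
            texture = cell.astype(float).std()         # variation within the region
            descriptors.append(np.concatenate([color, [texture]]))
    return np.array(descriptors)

def signature_distance(sig_a, sig_b):
    """Overall dissimilarity = average distance between corresponding regions."""
    return float(np.linalg.norm(sig_a - sig_b, axis=1).mean())

# Two hypothetical images; lower distance means more visually similar.
rng = np.random.default_rng(1)
img_a = rng.integers(0, 256, (128, 128, 3), dtype=np.uint8)
img_b = rng.integers(0, 256, (128, 128, 3), dtype=np.uint8)
print(signature_distance(visual_signature(img_a), visual_signature(img_b)))
```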

Riya was at one point rumored to be a Google acquisition target thanks to its image search technology. Riya started out focusing primarily on facial recognition, but now has a beta visual search that lets you find similar faces and objects in images across the web and then refine your results by color, shape and texture.

Riya also powers the visually oriented product search service Like.com that lets you find clothing and a few home furnishing items based on visual similarity. Like also has a “celebrity” search that lets you see what the stars are currently wearing and find similar accoutrements for your own adornment.

The State Hermitage Museum in Russia is using IBM’s experimental Query By Image Content (QBIC). The museum offers two ways to find similar types of artwork: QBIC Color Search locates two-dimensional artwork in the museum’s digital collection that match the colors you specify. QBIC Layout Search lets you define geometric shapes or arrange areas of color on a virtual canvas to approximate the visual organization of the work of art for which you are searching. Read more about IBM’s QBIC technology here.
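
The color-search idea is simple enough to sketch: the user specifies a few colors and rough percentages, and each artwork is ranked by how closely its own color makeup matches. The toy version below only illustrates the concept; it is not IBM's QBIC algorithm, and the palette, query and collection are invented.

```python
# Toy "search artwork by color" sketch: measure what fraction of an image's
# pixels fall nearest to each palette color, then rank images by how closely
# those fractions match the user's requested color mix.

import numpy as np

def color_fractions(image, palette):
    """For each palette color, the fraction of pixels closest to it."""
    pixels = image.reshape(-1, 3).astype(float)
    nearest = np.argmin(np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=2), axis=1)
    return np.bincount(nearest, minlength=len(palette)) / len(pixels)

palette = np.array([[220, 30, 30], [30, 30, 200], [230, 220, 200]], dtype=float)  # red, blue, cream
query = np.array([0.6, 0.1, 0.3])  # user asks for roughly 60% red, 10% blue, 30% cream

# A hypothetical collection of digitized artworks.
rng = np.random.default_rng(2)
collection = {f"painting_{i}": rng.integers(0, 256, (50, 50, 3), dtype=np.uint8) for i in range(10)}

ranked = sorted(collection,
                key=lambda name: np.abs(color_fractions(collection[name], palette) - query).sum())
print(ranked[:3])  # the three closest color matches
```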

CIRES: Content Based Image REtrieval System is a research project at the University of Texas at Austin that uses a "combination of higher-level and lower-level vision principles" to understand the content of images (more information here; search CIRES here).

Last December, Danny wrote about Polar Rose, a company that was promising to help bring context to photos posted on the web. Polar Rose doesn’t yet have a demo, but you can see screen shots and get more information here.

Want to learn more about content-based image search? Check out this 26-page white paper from Microsoft Research, Fundamentals of Content-Based Image Retrieval (PDF).

