Robots reading cereal boxes in the supermarket? Googlebot at the art museum? Street signs and building addresses snatched from Street View images for local search, image search, and product search?
Three new patent applications published at the U.S. Patent and Trademark Office this week explore the intricacies of reading text in images taken from Google’s Street View project, along with some interesting steps beyond that. I described a number of the implications of the patent filings in an SEO by the Sea post last night: Google on Reading Text in Images from Street Views, Store Shelves, and Museum Interiors.
Let’s take a slightly different look.
One of the most fun blog posts of last year was a spoof titled Google Interiors – the day my house became searchable. The satire seems to have come a little closer to reality, with the publication of these three patent filings.
The patent applications involved are:
The most sensational aspects of the documents come towards the end, where we are told that robots might be used to take pictures of products on store shelves and of exhibits in museums. A snippet from the filings:
In addition to street scenes, indexing can be applied to other image sets. In one implementation, a store (e.g., a grocery store or hardware store) is indexed. Images of items within the store are captured, for example, using a small motorized vehicle or robot. The aisles of the store are traversed and images of products are captured in a similar manner as discussed above. Additionally, as discussed above, location information is associated with each image. Text is extracted from the product images. In particular, extracted text can be filtered using a product name database in order to focus character recognition results on product names.
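To make the last sentence of that snippet concrete, here is a minimal sketch of how extracted text might be filtered against a product name database. It is only an illustration: the filings do not say how the matching is done, the product list and the function name `filter_to_product_names` are my own inventions, and fuzzy matching with Python's standard `difflib` module is a stand-in for whatever Google might actually use.

```python
import difflib

# Hypothetical product-name database; the filings do not list its contents.
PRODUCT_NAMES = ["Cheerios", "Corn Flakes", "Raisin Bran", "Shredded Wheat"]


def filter_to_product_names(ocr_strings, names=PRODUCT_NAMES, cutoff=0.8):
    """Keep only OCR strings that closely match a known product name.

    Strings that do not resemble any name (weights, slogans, noise)
    are dropped; near misses are snapped to the database spelling.
    """
    lookup = {name.lower(): name for name in names}
    matches = []
    for text in ocr_strings:
        close = difflib.get_close_matches(
            text.lower(), list(lookup), n=1, cutoff=cutoff
        )
        if close:
            matches.append(lookup[close[0]])
    return matches


if __name__ == "__main__":
    # "Cheerlos" imitates a typical OCR slip (an "i" read as an "l").
    print(filter_to_product_names(["Cheerlos", "NET WT 12 OZ", "Corn Flakes"]))
```

The idea is that character recognition on a cereal box returns plenty of text that is not a product name, and a database lookup lets the system keep the strings that matter and correct small misreadings along the way.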
There’s a science fiction element to this world of robots running amok in supermarkets, but there’s also a lot of science involved in the documents. The descriptions of how text might be taken from Street View images cover a number of techniques that account for problems with images, such as low contrast caused by shadows and shading. The use of consecutive images from the Street View cameras can also enhance the reading of text that might be blurry or partially hidden from view in one or more shots.
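One simple way to picture how consecutive images could help is a vote across frames: if the same sign is read in several shots, the reading that turns up most often is probably the right one. The sketch below assumes each frame has already produced one candidate string for the same sign. The function name `vote_on_readings` and the `min_votes` threshold are hypothetical, and the filings may well combine frames in a more sophisticated way than this.

```python
from collections import Counter


def vote_on_readings(readings, min_votes=2):
    """Pick the most common OCR reading of one sign across several frames.

    Readings are normalized (trimmed, upper-cased) before counting.
    Returns None when no reading reaches min_votes, so a single blurry
    or partially hidden shot cannot decide the result on its own.
    """
    counts = Counter(r.strip().upper() for r in readings if r and r.strip())
    if not counts:
        return None
    text, votes = counts.most_common(1)[0]
    return text if votes >= min_votes else None


if __name__ == "__main__":
    # Three frames of the same street sign; one frame misreads "I" as "1",
    # and a fourth frame produced no text at all.
    print(vote_on_readings(["MAIN ST", "MA1N ST", "Main St", ""]))
```

A reading that appears in only one frame is treated as unreliable here, which is a crude version of the benefit the filings describe.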
Here’s a screenshot from the patent filings, which shows a number of places where text might be extracted from one image:
Some of the image techniques described in this document were first hinted at in the patent applications behind Google’s Book project, which I wrote about in the summer of 2006 in Patent applications provide window into Google Book Search and Gmail. Those documents discuss the use of optical character recognition to both read the text within books and to understand differences in the structural elements of that text, so that, for instance, chapter headings in books or article titles in magazines might be seen and indexed differently than body text from those documents.
These text recognition and extraction techniques will work with digital still images and with video images. A number of the techniques described work best with video, where there might be multiple images of a view from slightly different angles. If the Street View filming apparatus also includes a laser distance-measuring device, as described in the patent filings, that could help to eliminate false positives in recognizing text.
It’s been an old saw for years that Google can’t recognize text displayed in images while indexing pages on the Web. These patent filings hint that Google may be able to do much more with images than we imagine.
Some of the things that this technology could be used for:
- Improving local search, and showing images of the actual locations of businesses
- Providing images of other nearby businesses in a local search
- Showing alternative businesses near a location that may offer similar products or services during a local search or product search
- Picturing actual landmarks along a driving route
- Allowing for a wider range of keyword searches associated with businesses, and images of those businesses
- Enabling product searches associated with specific businesses at specific locations
- Allowing museums to be searched by keyword, or to be browsed
It’s difficult to tell if and when we might see Googlebot in grocery stores, but we probably should start wondering how well Google might be able to handle text within images on the Web these days.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.