Millions of Videos, and Now a Way to Search Inside Them from the New York Times is a big giant love story to video search firm Blinkx, suggesting that finding video content will take a leap through the new idea of speech recognition. In reality, it’s not a new idea. It’s been in practice for years. And despite those years, it has failed to transform how we search for video on the web. That’s because speech recognition video search is overrated, especially given the true challenge video search faces: just getting the content centralized in the first place.
Search engines cannot really "see" video to understand what it is about any more than they can see images. Instead, to really understand what images or videos are about, they tend to look at metadata — text about the video that is either embedded within the video file or surrounding where the video is placed on a page.
Metadata is a pretty poor way to describe a video. Metadata information tends to be a short summary of what the video is about, who authored it, when it was shot and so on. Anyone who has filled out a basic video submission form when uploading to a service like YouTube is creating this type of summary metadata.
Metadata might only be a few hundred words long, at most. In contrast, the video itself might contain thousands of spoken words. So why not make video search better by capturing those words? That’s the Blinkx pitch, and it has been the Blinkx pitch for several years. Despite this, Blinkx has failed to significantly grow in usage. In my view, this is because video search has been less about finding what’s spoken in videos and more about finding what’s hot, something that YouTube and other sharing and rating services excel at.
I’ll do some history to explain this more. But let me give a top level rundown on the main "technology" points to understand about how consumer-facing video search has developed. Services have been based on one or more of the following:
- Finding video by reading closed-caption or transcript information
- Finding video by crawling the web
- Finding video through sharing and rating
- Finding video by working with content partners
FYI, for a deeper drill down into some real underlying technology of video search, be sure to check out Niall Kennedy’s The current state of video search from last October.
1998-1999: Closed-Caption & Transcript Searching
Let’s start with the first technology item, that of using closed-caption or transcript information. It’s not at all new. Back in 1998, I wrote about how AltaVista had partnered with Virage to make it possible to search through President Clinton’s video testimony over the Monica Lewinsky affair. Enter a few words, and you were magically taken to the right place in the video. This magic was done by using the closed-caption information:
The service was made possible through software from a company called Virage, and how it was produced in this case was pretty straightforward. The company captured the video testimony and its closed-captions when it was broadcast on CSPAN, the cable network that covers US politics.
The closed-caption text was converted into HTML files, which in turn were associated with 158 video clips. AltaVista then used its search engine technology to index the HTML files, allowing users to find specific text and then view the associated video clip.
"What you saw was very easy for us, because it was closed captioned video," said Dave Girouard, director of product marketing at Virage [FYI, Girouard now works for Google].
Closed caption makes things easy because someone has already transcribed the video tape. Things aren’t so easy when closed-captioning isn’t available, but Virage has ways around this. It can turn to TelePrompTer text or scripts. It has also announced a partnership with IBM for speech recognition, for times when no written record is readily available.
Note the mention of speech recognition. That’s important because not all videos have closed-caption or transcribed information. That’s why Blinkx says its approach is compelling: it will do speech recognition of all videos.
Well, that’s still not new. Speech recognition to create searchable text got a boost when Compaq (later HP) launched the now closed Speechbot service in 1999. It allowed you to do keyword searching against thousands of hours of both audio and video content.
Despite these early technology demonstrations, speech-to-text recognition didn’t make either service popular. Indeed, Virage (which powered the AltaVista service) ended up getting purchased by enterprise search company Autonomy in late 2005 (Autonomy, in turn, provides Blinkx with its technology). While I believe Virage continues to be recognized as a leader in video search by text and speech recognition, it doesn’t provide major consumer-facing products. That seems to be because the demand isn’t there.
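To make the closed-caption approach concrete, here’s a minimal sketch of the idea described above: caption text is tied to clips, indexed word by word, and a keyword search returns the clip and the spot where the words were spoken. This is not Virage’s or AltaVista’s actual code. The caption lines, clip IDs, and function names are all made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical caption data: (clip_id, start_seconds, caption_text).
# These lines are invented for illustration, not real transcript text.
CAPTIONS = [
    ("clip-001", 0, "Good afternoon, please state your name for the record"),
    ("clip-001", 12, "I will answer the questions as best I can"),
    ("clip-002", 0, "Let me turn to the questions about the deposition"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(captions):
    """Map each word to the set of (clip_id, start_seconds) where it appears."""
    index = defaultdict(set)
    for clip_id, start, text in captions:
        for word in tokenize(text):
            index[word].add((clip_id, start))
    return index


def search(index, query):
    """Return the caption positions that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)
```

Calling `search(build_index(CAPTIONS), "questions deposition")` would point you at the start of `clip-002`. Speech recognition only changes where the text comes from; the indexing and lookup stay the same.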
2003: Crawling For Video Content
Video search largely languished after the initial AltaVista experiments. There was some content, but it was really Singingfish that made a splash when it came along in 2003 and started providing backend video search for Microsoft’s Windows Media Player and for the RealOne Player. By the end of the year, it had been acquired by AOL. That made AOL the only major search service offering video search at the time (AltaVista, then owned by Yahoo, was no longer a major service).
Singingfish’s main technological claim to fame at the time was crawling to build a large index of content. Both AltaVista and FAST/AllTheWeb also were crawling at that time, but they had much smaller databases.
Singingfish didn’t use closed-caption or transcribed information. Instead, it depended on more restricted metadata. But even though it was restricted data, that plus the larger database of videos made it better than its competitors, when I looked at it back then.
2004-2005: Return Of Crawling, Transcribed Information, Speech Recognition
At the end of 2004, Blinkx launched a standalone site offering video search based on speech recognition. Two years later, it’s that "new" thing that the New York Times reports that we need. So why didn’t it take off in the past two years? As I’ll explain, we needed other things more.
In 2005, Google kicked off Google Video. The New York Times article today about Blinkx suggests that Google Video can’t do transcription search:
But search engines — like Google — that were developed during the first, text-based era of the Web do a poor job of searching through this rising sea of video. That’s because they don’t search the videos themselves, but rather things associated with them, including the text of a Web page, the “metadata” that computers use to display or understand pages (like keywords or the semantic tags that describe different content), video-file suffixes (like .mpeg or .avi), or captions or subtitles.
In reality, this is exactly how Google Video started off. It actually had no video. Instead, it taped TV shows off the air, then allowed you to search for and find still photos from certain segments based on the words found in those episodes. It worked pretty well, depending on the closed-caption information.
Despite this, the service failed to gain real popularity. Google eventually dropped the TV grabs (in part, it appears, to help appease big content owners that never felt these were legal) and started showing video submissions. It chose not to crawl the web, the basis of practically everything Google had done in terms of search before this, because it felt submissions would give it better content. But despite submissions, it still didn’t become a video search powerhouse.
Toward the end of 2005, we also got a taste of hype when Truveo (later renamed SearchVideo) came onto the scene. Truveo was to go beyond looking at words on a page surrounding a video to also look at the visual characteristics of the page. I’d tell you more, but that particular angle seems to have disappeared over the years (it used to be here).
Truveo was acquired by AOL in early 2006 and also continues to power Microsoft’s Windows Live Search Video service (compare Microsoft to SearchVideo for the same query, and you can see this; Microsoft also has a "reserve" agreement to tap into Blinkx). But didn’t AOL also buy Singingfish? Yes, and it closed that service earlier this month, redirecting to Truveo-powered AOL Video. Despite Truveo’s supposed killer technology, the last stats I saw on video search popularity didn’t have it leaping ahead of others. Those are older, from May 2006, so things may have changed.
2005: Killer Tech, Sharing
After years of video search efforts, both the major players of Google and Yahoo showed that it wasn’t crawling or transcript searching that would be the killer product. Nor did Blinkx prove speech recognition was some killer technology. Truveo also failed to show there was some technology twist to "understanding" video content that would prove a boon.
Instead, YouTube is the player that emerged as the video powerhouse. Kicked off in February 2005, it climbed and climbed in popularity until acquired by Google at the end of last year. What was different? My view of the magic components:
- Easy for anyone to watch video without a demand for some damned video plugin or add-on
- Easy for anyone to upload popular commercial content
- Easy for anyone to share non-commercial content and make that popular
Until YouTube, I’d say much of the video search assumption had been that people would want things like news or documentaries — and in particular — the ability to search through news content to watch particular things that were said. Instead, it seems the real demand was for video search to be a way to get video on demand and especially to find popular entertainment.
The Lazy Sunday skit (try here to see it) from Saturday Night Live is a classic case in point. Aired at the end of 2005, it sent many flocking to YouTube to see it after missing it during the live airing.
YouTube, of course, has been notorious for having so much content that might be reposted illegally on the service. But it’s that same content that gave many a compelling reason to seek YouTube out. In addition to the commercial content, the sharing and voting system there has also allowed non-commercial content to go viral and become popular in a way that previous video search systems didn’t allow.
In particular, it’s not just voting that helped the non-commercial content. It’s the hosting. It’s expensive and a hassle to host video content for small site owners. YouTube made it possible for anyone to effectively become a broadcaster efficiently. Sure, a crawler-based video search service could (and they did) find non-commercial content hosted elsewhere on the web. However, allowing for content uploading fueled a content explosion.
2006/2007: Forget Speech Recognition, Bring On Meta Search
YouTube’s happy days of having commercial content for free are rapidly drawing to a close, of course. This month, we’ve seen Viacom yank its clips off the service to move the content instead to Joost. That strikes directly at one of the three magic components of YouTube: popular commercial content. Now go back to the New York Times article about Blinkx, and its lead about why Blinkx technology is compelling:
The World Wide Web is awash in digital video, but too often we can’t find the videos we want or browse for what we might like.
No, we can’t. But that has little to do with not being able to search against spoken text in those videos. The real challenge is that much of the video we want isn’t online or, if it is, it is rapidly being removed as content owners put further pressure on sharing sites like YouTube.
The result may be a fragmented video spectrum, where you may have to tune into iTunes for content from one company, YouTube for another, Joost for a third and so on. The solution is actually something else that Blinkx is strong at: not speech recognition but meta search. Meta search is the ability to search against a variety of sites and bring back consolidated results from all of them.
To illustrate this, again, a return to the New York Times article. It noted that the aforementioned Lazy Sunday video could be found on Blinkx:
To experiment, I typed in the phrase “Chronic — WHAT — cles of Narnia,” the shout-out in the “Saturday Night Live” digital short called “Lazy Sunday,” a rap parody of two New York slackers. I wanted a phrase that a Web surfer would know more readily than the real title of a video. I also knew that “Lazy Sunday,” for all its cultish fame, would be hard to find: NBC Universal had freely released the rap parody on the Internet after broadcasting it in December 2005, but last month the company insisted that YouTube pull it.
Nonetheless, Blinkx found eight instances of “Lazy Sunday” when I tried it last week. By contrast, Google Video found none. Typing “Lazy Sunday” into the keyword search box on Google’s home page produced hundreds of results — but many were commentaries about the video, and many had nothing to do with “Saturday Night Live.”
Heh. This was overkill in experimenting, and the result had nothing to do with speech recognition. First, I think most people seeking the Lazy Sunday video actually would know it more by that name, "lazy sunday." Second, Google Video found no matches because Google Video only searches what’s in Google Video. If the clip has been yanked (as the article itself says it was), there’s nothing to find, speech recognition or not.
It’s like saying you couldn’t find a tree in the forest because you lacked tree recognition technology. Sure, tree recognition technology might help. But if someone’s chopped down the tree and dragged it out of the forest, all that technology isn’t going to make it appear.
The reason Blinkx finds the clip is because Blinkx is meta searching. Rather than hosting the content itself, it builds a database of content that is hosted by others from across the web. That helps protect it to some degree from takedown notices.
In particular, here’s the search I did for Lazy Sunday. The top item led me to the video here as hosted by Brightcove, which allows video sharing like YouTube. I can’t tell if Brightcove has a deal to be showing this clip or not (one past New York Times article suggests yes).
In either case, a meta search service like Blinkx (or a similar service like Search For Video) wins. If YouTube has to drop the video, then the video is gone since YouTube only searches what’s in its own database. But Blinkx, tapping into multiple databases, may continue to list the video.
If Blinkx does get hit with a takedown notice, it can remove that link and wait until someone else puts it up elsewhere. Alternatively, and this is most important, by tapping into many video search providers, it defragments the video search market. It becomes a place where anyone can turn in order to search across multiple video content providers. Of course, the real weakness it faces is if any of these hosting providers decide to block access for Blinkx to index their services.
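For anyone who wants to see the meta search idea in miniature, here’s a rough sketch: ask several providers the same question, then merge what comes back into one list. This is not how Blinkx actually works under the hood. The site names, URLs, catalogs, and function names below are placeholders I’ve invented, and a plain title match stands in for whatever each provider really does.

```python
# Hypothetical per-site catalogs: title -> URL. Names and URLs are placeholders.
SITE_A = {
    "Lazy Sunday": "https://site-a.example/v/1",
    "Cat Video": "https://site-a.example/v/2",
}
SITE_B = {
    "Lazy Sunday (SNL)": "https://site-b.example/watch/9",
}


def make_source(name, catalog):
    """Wrap a catalog in a search function that does a substring title match."""
    def search(query):
        q = query.lower()
        return [
            {"source": name, "title": title, "url": url}
            for title, url in catalog.items()
            if q in title.lower()
        ]
    return search


def meta_search(query, sources):
    """Query every source and merge the results, dropping duplicate URLs."""
    seen = set()
    merged = []
    for source in sources:
        try:
            results = source(query)
        except Exception:
            # A provider that fails or blocks us should not sink the whole search.
            continue
        for result in results:
            if result["url"] not in seen:
                seen.add(result["url"])
                merged.append(result)
    return merged
```

With both sources plugged in, a search for "lazy sunday" returns one hit from each site. Delete the clip from `SITE_A` to mimic a takedown, and the same search still returns the copy on the second site, which is the whole point.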
I’ve covered a variety of search providers in this article. If you’re looking for more, you might check out Video Search Made the Internet Star from last November at Search Engine Watch and Video Search Engines And Online Video Directories: A Mini-Guide from Robin Good. I also guarantee that Gary Price of ResourceShelf will be along in the comments area below to add his thoughts and resources. Feel free to contribute your own, as well.
Overall, I’m sure speech recognition will eventually find its place in video search. But let’s skip the hype. It hasn’t been a crucial piece of technology for years, and it’s hardly the next challenge video search faces. The real challenge is figuring out how to work with content owners.