Video Search Challenge Isn’t Speech Recognition, It’s Content Owner Management
Millions of Videos, and Now a Way to Search Inside Them from the New York
Times is a big, giant love story to video search firm Blinkx, suggesting that
finding video content will take a leap forward through the new idea of speech
recognition. In reality, it's not a new idea. It's been in practice for years.
And despite those years, it has failed to transform how we search for video on
the web. That's because speech recognition video search is overrated, especially
given the true challenge video search faces: just getting the content
centralized in the first place.
Search engines cannot really "see" video to understand what it is about any
more than they can see images. Instead, to really understand what images or
videos are about, they tend to look at metadata — text about the video that is
either embedded within the video file or surrounding where the video is placed
on a page.
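To make the metadata point concrete, here's a minimal sketch of how an engine might index a video purely from its surrounding text. Everything here (the field names, the URLs, the toy inverted index) is invented for illustration; it only shows why words spoken in the video but absent from the metadata are invisible to this kind of search:

```python
# Minimal sketch: indexing a video by its metadata only.
# A real engine sees none of the audio or frames -- just text like this.

def tokenize(text):
    """Lowercase a string and split it into word tokens."""
    return [w.strip(".,!?\"'()").lower() for w in text.split()
            if w.strip(".,!?\"'()")]

def index_video(index, url, metadata):
    """Map every metadata word to the video URL in an inverted index."""
    for field in ("title", "description", "page_text"):
        for word in tokenize(metadata.get(field, "")):
            index.setdefault(word, set()).add(url)

def search(index, query):
    """Return URLs whose metadata contains every query word."""
    results = [index.get(w, set()) for w in tokenize(query)]
    return set.intersection(*results) if results else set()

index = {}
index_video(index, "http://example.com/clip1", {
    "title": "Clinton testimony, part 1",
    "description": "Video testimony broadcast on C-SPAN",
})

print(search(index, "clinton testimony"))  # found via metadata alone
print(search(index, "lewinsky"))  # empty: words only *spoken* aren't indexed
```

The second query is the whole problem in miniature: a word uttered on camera but never written down around the clip simply does not exist as far as this index is concerned.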
Metadata is a pretty poor way to describe a video. It tends to be a short
summary of what the video is about, who authored it, when it was shot and so on.
Anyone who has filled out a basic video submission form when uploading to a
service like YouTube has created this type of summary metadata. Metadata might
only be a few hundred words long, at most. In contrast, the video itself might
contain thousands of spoken words. So why not make video search better by
capturing those words? That's the Blinkx pitch, and it has been the Blinkx pitch
for several years. Despite this, Blinkx has failed to grow significantly in
usage. In my view, that's because video search has been less about finding
what's spoken in videos and more about finding what's hot, something that
YouTube and other sharing and rating services excel at.
I'll go through some history to explain this more. But first, a top-level
rundown of the main "technology" points to understand about how consumer-facing
video search has developed. Services have been based on one or more of the
following:
- Finding video by reading closed-caption or transcript information
- Finding video by crawling the web
- Finding video through sharing and rating
- Finding video by working with content partners
FYI, for a deeper drill down into some real underlying technology of video
search, be sure to check out Niall Kennedy’s
The current state of video search from last October.
1998-1999: Closed-Caption & Transcript Searching
Let's start with the first technology item, using closed-caption or
transcript information. It's not at all new. Back in 1998, I wrote
about how AltaVista had partnered with Virage to make it possible to search
through President Clinton's video testimony over the Monica Lewinsky affair.
Enter a few words, and you were magically taken to the right place in the video.
This magic was done through using the closed-caption information:
The service was made possible
through software from a company called Virage, and how it was produced in this
case was pretty straightforward. The company captured the video testimony and
its closed-captions when it was broadcast on CSPAN, the cable network that
covers US politics.
The closed-caption text was
converted into HTML files, which in turn were associated with 158 video clips.
AltaVista then used its search engine technology to index the HTML files,
allowing users to find specific text and then view the associated video clip.
"What you saw was very easy for
us, because it was closed captioned video," said Dave Girouard, director of
product marketing at Virage [FYI, he now
works for Google].
Closed caption makes things
easy because someone has already transcribed the video tape. Things aren’t so
easy when closed-captioning isn’t available, but Virage has ways around this.
It can turn to TelePrompTer text or scripts. It has also announced a
partnership with IBM for speech recognition, for times when no written record
is readily available.
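The pipeline the quote describes, captured caption text associated with clips and then indexed as plain text, can be sketched roughly like this. The timestamped caption format and the fixed clip length here are my own inventions for illustration, not Virage's actual format:

```python
# Rough sketch of a caption-to-clip pipeline: captured closed-caption
# text is grouped by clip, and a plain text search then points matches
# at the right clip. The caption format (start-seconds TAB text) and
# the fixed clip length are invented assumptions.

CLIP_LENGTH = 60  # assume fixed-length clips, in seconds

def captions_to_clips(caption_lines):
    """Group timestamped caption lines into per-clip transcripts."""
    clips = {}
    for line in caption_lines:
        start, text = line.split("\t", 1)
        clip_id = int(float(start)) // CLIP_LENGTH  # which clip this falls in
        clips.setdefault(clip_id, []).append(text)
    return {cid: " ".join(lines) for cid, lines in clips.items()}

def find_clips(clips, phrase):
    """Return the clip ids whose transcript contains the phrase."""
    phrase = phrase.lower()
    return sorted(cid for cid, text in clips.items() if phrase in text.lower())

captions = [
    "12.5\tI want to say one thing to the American people",
    "70.0\tthe deposition continued into the afternoon",
]
clips = captions_to_clips(captions)
print(find_clips(clips, "american people"))  # jump straight to clip 0
```

Once the transcripts exist as text, the search itself is unremarkable; the hard part, then as now, is getting a transcript at all when no one has closed-captioned the video.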
Note the mention of speech recognition. That's important because not all
videos have closed-caption or transcript information. That's why Blinkx says
it's compelling: it will do speech recognition on all videos.
Well, that's still not new. Speech recognition to create searchable text got
a boost when Compaq (later HP) launched the now-defunct
SpeechBot service in 1999. It allowed you to do keyword searching against
thousands of hours of both audio and video content.
Despite these early technology demonstrations, neither service grew in
popularity because of speech-to-text recognition. Indeed,
Virage (which powered the AltaVista
service) ended up getting purchased by enterprise search company Autonomy in
late 2005 (Autonomy, in turn, provides
Blinkx with its technology). While I believe Virage continues to be recognized
as a leader in the space, in terms of video search by text/speech recognition,
it doesn't provide major consumer-facing products. That seems to be because the
demand isn't there.
2003: Crawling For Video Content
Video search largely languished after the initial AltaVista experiments.
There was some content, but it was really Singingfish that made a splash when it
came along in 2003 and started providing backend video search for Microsoft's
Windows Media Player and for the RealOne Player. By the end of the year, it was
also acquired by AOL. That transformed AOL into
the only major search service offering video search at the time (AltaVista, then
owned by Yahoo, was no longer a major service).
Singingfish’s main technological claim to fame at the time was crawling to
build a large index of content. Both AltaVista and FAST/AllTheWeb also were
crawling at that time, but they had much smaller databases.
Singingfish didn’t use closed-caption or transcribed information. Instead, it
depended on more restricted metadata. But even though it was restricted data,
that plus the larger database of videos made it better than its competitors,
when I looked at it back then.
2004-2005: Return Of Crawling, Transcribed Information, Speech Recognition
At the end of 2004, Blinkx launched a standalone site offering video search
based on speech recognition. Two years later, it's that "new" thing the New
York Times reports we need. So why didn't it take off in the past two
years? As I'll explain, we needed other things more.
In 2005, Google kicked off Google Video.
The New York Times article today about Blinkx suggests that Google Video can’t
do transcription search:
But search engines — like Google — that were developed during the first,
text-based era of the Web do a poor job of searching through this rising sea of
video. That’s because they don’t search the videos themselves, but rather things
associated with them, including the text of a Web page, the “metadata” that
computers use to display or understand pages (like keywords or the semantic tags
that describe different content), video-file suffixes (like .mpeg or .avi), or
captions or subtitles.
In reality, this is exactly how Google Video started off. It actually had no
video. Instead, it taped TV shows off the air, then allowed you to search for
and find still photos from certain segments based on the words found in those
episodes. It worked pretty well, depending on the closed-caption information.
Despite this, the service failed to gain real popularity. Google eventually
dropped the TV grabs (in part, it appears, to help appease big content owners
that never felt these were legal) and
started showing video submissions. It chose not to crawl the web — the
basis of practically everything Google had done in terms of search before this
— because it felt submissions would give it better content. But despite
submissions, it still didn’t become a video search powerhouse.
Toward the end of 2005, we also got a taste of hype when Truveo
came onto the scene.
Truveo was to go beyond looking at words on a page surrounding a video to also
look at the visual characteristics of the page. I'd tell you more, but that
particular angle seems to have disappeared over the years. Truveo was
acquired by AOL in early 2006 and also continues to power Microsoft's
Windows Live Search Video service (compare that service with
SearchVideo for the same query, and you can see this. Also, Microsoft has an
agreement to tap into Blinkx). But didn't AOL also buy Singingfish? Yes, and
they closed that
service earlier this month, redirecting to the Truveo-powered
AOL Video. Despite Truveo's supposed killer
technology, the last
stats I saw on video search popularity didn't have it leaping ahead of
others. Those are older, from May 2006, so things may have changed.
2005: Forget Killer Tech, Bring On Sharing
After years of video search efforts, both the major players of Google and
Yahoo showed that it wasn’t crawling or transcript searching that would be the
killer product. Nor did Blinkx prove speech recognition was some killer
technology. Truveo also failed to show there was some technology twist to
"understanding" video content that would prove a boon.
Instead, YouTube is the player that emerged
as the video powerhouse. Kicked off in February 2005, it climbed and climbed in
popularity until acquired by Google at the end of last year. What was different?
My view of the magic components:
- Easy for anyone to watch video without a demand for some damned video
plugin or add-on.
- Easy for anyone to upload popular commercial content
- Easy for anyone to share non-commercial content and make that popular
Until YouTube, I’d say much of the video search assumption had been that
people would want things like news or documentaries — and in particular — the
ability to search through news content to watch particular things that were
said. Instead, it seems the real demand was for video search to be a way to get
video on demand and especially to find popular entertainment.
The Lazy Sunday skit (try
here to see it) from Saturday Night Live is a classic case in point. It aired
at the end of 2005, and many flocked to YouTube to see it after missing it
during the live airing.
YouTube, of course, has been notorious for hosting so much content that may
have been reposted illegally on the service. But that same content gave many a
compelling reason to seek it out. In addition to the commercial
content, the sharing and voting system there has also allowed non-commercial
content to go viral and become popular in a way that previous video search
systems didn't allow.
In particular, it's not just voting that helped the non-commercial content.
It's the hosting. It's expensive and a hassle for small site owners to host
video content. YouTube made it possible for anyone to effectively become a
broadcaster. Sure, a crawler-based video search service could (and
they did) find non-commercial content hosted elsewhere on the web. However,
allowing for content uploading fueled a content explosion.
2006/2007: Forget Speech Recognition, Bring On Meta Search
YouTube's happy days of having commercial content for free are rapidly
drawing to a close, of course. This month, we've seen Viacom
yank its clips off
the service to move
the content instead to Joost. That strikes directly
at one of the three magic components of YouTube: the popular
commercial content. Now go back to the New York Times article about Blinkx, and
its lead about why Blinkx technology is compelling:
The World Wide Web is awash in digital video, but too often we can’t find the
videos we want or browse for what we might like.
No, we can't. But that has little to do with not being able to search against
spoken text in those videos. The real challenge is that much of the video we
want isn't online, or if it is, it is rapidly being removed as content owners
put further pressure on sharing sites like YouTube.
The result may be a fragmented video spectrum, where you may have to tune
into iTunes for content from one company, YouTube for another, Joost for a third
and so on. The solution is actually something else that Blinkx is strong at:
not speech recognition but meta search. Meta search is the ability to
search against a variety of sites and bring back consolidated results from all
of them.
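In code terms, meta search is little more than fanning one query out to many providers and merging what comes back. Here's a minimal sketch; the provider functions are hypothetical stand-ins, not any real site's API:

```python
# Minimal meta-search sketch: fan one query out to several video
# search providers and merge the results. The provider functions are
# hypothetical stand-ins for real per-site APIs or scrapers.

def search_site_a(query):   # stand-in for one hosting site's search
    return [{"title": "Lazy Sunday", "url": "http://a.example/1"}]

def search_site_b(query):   # stand-in for another provider
    return [{"title": "Lazy Sunday (mirror)", "url": "http://b.example/9"}]

def meta_search(query, providers):
    """Query every provider, merge results, and drop duplicate URLs."""
    seen, merged = set(), []
    for provider in providers:
        for hit in provider(query):
            if hit["url"] not in seen:   # de-duplicate across providers
                seen.add(hit["url"])
                merged.append(hit)
    return merged

# If one provider pulls a clip, the others may still carry it.
results = meta_search("lazy sunday", [search_site_a, search_site_b])
print([r["url"] for r in results])
```

The structure itself explains the takedown resilience discussed below: removing a clip from one provider only removes one entry from the merged list, as long as any other provider still carries a copy.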
To illustrate this, again, a return to the New York Times article. It noted
that the aforementioned Lazy Sunday video could be found on Blinkx:
To experiment, I typed in the phrase “Chronic — WHAT — cles of Narnia,” the
shout-out in the “Saturday Night Live” digital short called “Lazy Sunday,” a
rap parody of two New York slackers. I wanted a phrase that a Web surfer would
know more readily than the real title of a video. I also knew that “Lazy
Sunday,” for all its cultish fame, would be hard to find: NBC Universal had
freely released the rap parody on the Internet after broadcasting it in
December 2005, but last month the company insisted that YouTube pull it.
Nonetheless, Blinkx found eight instances of “Lazy Sunday” when I tried it
last week. By contrast, Google Video found none. Typing “Lazy Sunday” into the
keyword search box on Google’s home page produced hundreds of results — but
many were commentaries about the video, and many had nothing to do with
“Saturday Night Live.”
Heh. This was overkill in experimenting, and the result had nothing to do with
speech recognition. First, I think most people seeking the Lazy Sunday video
actually would know it more by that name, "lazy sunday." Second, Google Video
found no matches because Google Video only searches what's in Google Video. If
the clip has been yanked, as the article itself says it was, there's nothing
to find, speech recognition or not.
It’s like saying you couldn’t find a tree in the forest because you lacked
tree recognition technology. Sure, tree recognition technology might help. But
if someone’s chopped down the tree and dragged it out of the forest, all that
technology isn’t going to make it appear.
The reason Blinkx finds the clip is because Blinkx is meta searching. Rather
than hosting the content itself, it builds a database of content that is hosted
by others from across the web. That helps protect it, to some degree, from
content being pulled down.
In particular, here’s the
search I did for Lazy
Sunday. The top item led me to the video
here as hosted by Brightcove, which
allows video sharing like YouTube. I can’t tell if Brightcove has a deal to be
showing this clip or not (one
past New York Times article suggests yes).
In either case, a meta search service like Blinkx wins (as do similar services
like Search For Video). If YouTube
has to drop the video, then the video is gone, since YouTube only searches
what's in its own database. But Blinkx, tapping into multiple databases, may
continue to list the video.
If Blinkx does get hit with a takedown notice, it can remove that link and
wait until someone else puts the video up elsewhere. Alternatively, and this is
most important, by tapping into many video search providers, it defragments the
video search market. It becomes a place anyone can turn to in order to
search across multiple video content providers. Of course, the real weakness it
faces is if any of these hosting providers decide to block Blinkx from
indexing their services.
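That blocking could be as simple as a robots.txt rule. A quick sketch using Python's standard robots.txt parser; the "BlinkxBot" user-agent name and the rules are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Sketch: how a hosting site could shut a meta-search crawler out via
# robots.txt while leaving everyone else in. "BlinkxBot" is a made-up
# user-agent name for illustration.
rules = [
    "User-agent: BlinkxBot",
    "Disallow: /",          # block this one crawler entirely
    "",
    "User-agent: *",
    "Disallow:",            # everyone else may fetch anything
]

rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("BlinkxBot", "http://example.com/video/1"))    # False
print(rp.can_fetch("SomeOtherBot", "http://example.com/video/1")) # True
```

Honoring robots.txt is voluntary, of course, but a meta search engine that ignores it invites exactly the kind of content-owner conflict this article is about.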
I've covered a variety of search providers in this article. If you're looking
for more, you might check out
Made the Internet Star from last November at Search Engine Watch and
Video Search Engines And Online Video Directories: A Mini-Guide from Robin
Good. I can also guarantee that Gary Price of
ResourceShelf will be along in the comments area below to add his thoughts
and resources. Feel free to contribute your own, as well.
Overall, I'm sure speech recognition will eventually find its place in video
search. But let's skip the hype. It hasn't been a crucial piece of technology
for years, and it's hardly the next challenge video search faces. The real
challenge is figuring out how to work with content owners.