
    Video Search Challenge Isn’t Speech Recognition, It’s Content Owner Management


    Millions of Videos, and Now a Way to Search Inside Them from the New York
    Times is a big giant love story to video search firm Blinkx, suggesting that
    finding video content will take a leap forward through the new idea of speech
    recognition. In reality, it’s not a new idea. It’s been in practice for years.
    And despite those years, it has failed to transform how we search for video on
    the web. That’s because speech recognition video search is overrated,
    especially given the true challenge video search faces: just getting the
    content centralized in the first place.

    Search engines cannot really "see" video to understand what it is about any
    more than they can see images. Instead, to really understand what images or
    videos are about, they tend to look at metadata — text about the video that is
    either embedded within the video file or surrounding where the video is placed
    on a page.

    Metadata is a pretty poor way to describe a video. Metadata information tends
    to be a short summary of what the video is about, who authored it, when it was
    shot and so on. Anyone who has filled out a basic video submission form when
    uploading to a service like YouTube is creating this type of summary metadata.

    Metadata might only be a few hundred words long, at most. In contrast, the
    video itself might contain thousands of spoken words. So why not make video
    search better by capturing those words? That’s the Blinkx pitch — and has been
    the Blinkx pitch for several years. Despite this, Blinkx has failed to
    significantly grow in usage. In my view, this is because video search has been
    less about finding what’s spoken in videos and more about finding what’s hot,
    something that YouTube and other sharing and rating services excel at.

    I’ll go through some history to explain this more. But first, let me give a
    top-level rundown of the main "technology" points to understand about how
    consumer-facing video search has developed. Services have been based on one or
    more of the following:

    1. Finding video by reading closed-caption or transcript information
    2. Finding video by crawling the web
    3. Finding video through sharing and rating
    4. Finding video by working with content partners

    FYI, for a deeper drill down into some real underlying technology of video
    search, be sure to check out Niall Kennedy’s The current state of video search
    from last October.

    1998-1999: Closed-Caption & Transcript Searching

    Let’s start with the first technology item, that of using closed-caption or
    transcript information. It’s not at all new. Back in 1998, I wrote about how
    AltaVista had partnered with Virage to make it possible to search through
    President Clinton’s video testimony over the Monica Lewinsky affair. Enter a
    few words, and you were magically taken to the right place in the video. This
    magic was done through using the closed-caption information:

    The service was made possible through software from a company called Virage,
    and how it was produced in this case was pretty straightforward. The company
    captured the video testimony and its closed-captions when it was broadcast on
    CSPAN, the cable network that covers US politics.

    The closed-caption text was converted into HTML files, which in turn were
    associated with 158 video clips. AltaVista then used its search engine
    technology to index the HTML files, allowing users to find specific text and
    then view the associated video clip.

    "What you saw was very easy for us, because it was closed captioned video,"
    said Dave Girouard, director of product marketing at Virage [FYI, Girouard now
    works for Google].

    Closed caption makes things easy because someone has already transcribed the
    video tape. Things aren’t so easy when closed-captioning isn’t available, but
    Virage has ways around this. It can turn to TelePrompTer text or scripts. It
    has also announced a partnership with IBM for speech recognition, for times
    when no written record is readily available.

    Note the mention of speech recognition. That’s important because not all
    videos have closed-caption or transcribed information. That’s why Blinkx says
    it’s compelling. It will do speech recognition of all videos.

    Well, that’s still not new. Speech recognition to create searchable text got
    a boost when Compaq (later HP) launched the now closed Speechbot service in
    1999. It allowed you to do keyword searching against thousands of hours of
    both audio and video content.
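
    For a feel of how little magic is involved, here is a minimal sketch of
    keyword searching against time-coded text. It is not how Virage, AltaVista or
    Speechbot actually built their services. The clip ids, the caption lines and
    the function names are all made up for illustration, and the same index would
    work whether the text came from closed captions, a transcript or speech
    recognition.

```python
import re
from collections import defaultdict

# Hypothetical sample data: clip id -> list of (start time in seconds, caption text).
SAMPLE_CLIPS = {
    "clip-001": [
        (0, "Good morning and welcome to the hearing"),
        (12, "The committee will now hear testimony"),
    ],
    "clip-002": [
        (0, "The testimony continues after the break"),
    ],
}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_caption_index(clips):
    """Map each word to the set of (clip_id, start_seconds) cues containing it."""
    index = defaultdict(set)
    for clip_id, cues in clips.items():
        for start_seconds, text in cues:
            for word in tokenize(text):
                index[word].add((clip_id, start_seconds))
    return index


def search_captions(index, query):
    """Return the cues that contain every query word, sorted by clip and time."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


index = build_caption_index(SAMPLE_CLIPS)
print(search_captions(index, "testimony"))
```

    Each hit is a clip plus a time offset, which is what lets a player jump to
    the right place in the video. The hard part is getting the text in the first
    place, not searching it.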

    Despite these early technology demonstrations, neither service grew in
    popularity because of speech-to-text recognition. Indeed, Virage (which
    powered the AltaVista service) ended up getting purchased by enterprise search
    company Autonomy in late 2005 (Autonomy, in turn, provides Blinkx with its
    technology). While I believe Virage continues to be recognized as a leader in
    video search by text/speech recognition, it doesn’t provide major
    consumer-facing products. That seems to be because the demand isn’t there.

    2003: Crawling For Video Content

    Video search largely languished after the initial AltaVista experiments.
    There was some content, but it was really Singingfish that made a splash when
    it came along in 2003 and started providing backend video search for
    Microsoft’s Windows Media Player and for the RealOne Player. By the end of the
    year, it was also acquired by AOL. That transformed AOL into being the only
    major search service offering video search at the time (AltaVista, then owned
    by Yahoo, was no longer a major service).

    Singingfish’s main technological claim to fame at the time was crawling to
    build a large index of content. Both AltaVista and FAST/AllTheWeb also were
    crawling at that time, but they had much smaller databases.

    Singingfish didn’t use closed-caption or transcribed information. Instead, it
    depended on more restricted metadata. But even though it was restricted data,
    that plus the larger database of videos made it better than its competitors,
    when I looked at it back then.

    2004-2005: Return Of Crawling, Transcribed Information, Speech Recognition

    At the end of 2004, Blinkx launched a standalone site offering video search
    based on speech recognition. Two years later, it’s that "new" thing that the New
    York Times reports that we need. So why didn’t it take off in the past two
    years? As I’ll explain, we needed other things more.

    Yahoo Video also kicked off at the end of 2004. Like Singingfish, it went the
    spidering route. That failed to win Yahoo any major share. By mid-2006, Yahoo
    shifted to allow video upload and sharing.

    In 2005, Google kicked off Google Video.
    The New York Times article today about Blinkx suggests that Google Video can’t
    do transcription search:

    But search engines — like Google — that were developed during the first,
    text-based era of the Web do a poor job of searching through this rising sea of
    video. That’s because they don’t search the videos themselves, but rather things
    associated with them, including the text of a Web page, the “metadata” that
    computers use to display or understand pages (like keywords or the semantic tags
    that describe different content), video-file suffixes (like .mpeg or .avi), or
    captions or subtitles.

    In reality, this is exactly how Google Video started off. It actually had no
    video. Instead, it taped TV shows off the air, then allowed you to search for
    and find still photos from certain segments based on the words found in those
    episodes. It worked pretty well, depending on the closed-caption information.

    Despite this, the service failed to gain real popularity. Google eventually
    dropped the TV grabs (in part, it appears, to help appease big content owners
    that never felt these were legal) and started showing video submissions. It
    chose not to crawl the web, the basis of practically everything Google had
    done in terms of search before this, because it felt submissions would give it
    better content. But despite submissions, it still didn’t become a video search
    powerhouse.

    Toward the end of 2005, we also got a taste of hype when Truveo (later
    renamed SearchVideo) came onto the scene. Truveo was to go beyond looking at
    words on a page surrounding a video to also look at the visual characteristics
    of the page. I’d tell you more, but that particular angle seems to have
    disappeared over the years (it used to be here).

    Truveo was acquired by AOL in early 2006 and also continues to power
    Microsoft’s Windows Live Search Video service (compare Microsoft to
    SearchVideo for the same query, and you can see this. Also, Microsoft has a
    "reserve" agreement to tap into Blinkx). But didn’t AOL also buy Singingfish?
    Yes, and they closed that service earlier this month, redirecting to
    Truveo-powered AOL Video. Despite Truveo’s supposed killer technology, the
    last stats I saw on video search popularity didn’t have it leaping ahead of
    others. Those are older, from May 2006, so things may have changed.

    2005: Killer Tech, Sharing

    After years of video search efforts, both the major players of Google and
    Yahoo showed that it wasn’t crawling or transcript searching that would be the
    killer product. Nor did Blinkx prove speech recognition was some killer
    technology. Truveo also failed to show there was some technology twist to
    "understanding" video content that would prove a boon.

    Instead, YouTube is the player that emerged
    as the video powerhouse. Kicked off in February 2005, it climbed and climbed in
    popularity until acquired by Google at the end of last year. What was different?
    My view of the magic components:

    1. Easy for anyone to watch video without a demand for some damned video
      plugin or add-on.
    2. Easy for anyone to upload popular commercial content
    3. Easy for anyone to share non-commercial content and make that popular

    Until YouTube, I’d say much of the video search assumption had been that
    people would want things like news or documentaries — and in particular — the
    ability to search through news content to watch particular things that were
    said. Instead, it seems the real demand was for video search to be a way to get
    video on demand and especially to find popular entertainment.

    The Lazy Sunday skit (try here to see it) from Saturday Night Live is a
    classic case in point. Aired at the end of 2005, it sent many flocking to
    YouTube to see it after missing it during the live airing.

    YouTube, of course, has been notorious for having so much content that might
    be reposted illegally on the service. But it’s that same content that made it a
    compelling reason for many to seek it out. In addition to the commercial
    content, the sharing and voting system there has also allowed non-commercial
    content to go viral and become popular in a way that previous video search
    systems didn’t allow.

    In particular, it’s not just voting that helped the non-commercial content.
    It’s the hosting. It’s expensive and a hassle to host video content for small
    site owners. YouTube made it possible for anyone to effectively become a
    broadcaster efficiently. Sure, a crawler-based video search service could (and
    they did) find non-commercial content hosted elsewhere on the web. However,
    allowing for content uploading fueled a content explosion.

    2006/2007: Forget Speech Recognition, Bring On Meta Search

    YouTube’s happy days of having commercial content for free are rapidly
    drawing to a close, of course. This month, we’ve seen Viacom yank its clips
    off the service to move the content instead to Joost. That strikes directly at
    one of the three magic components of YouTube: popular commercial content. Now
    go back to the New York Times article about Blinkx, and its lead about why
    Blinkx technology is compelling:

    The World Wide Web is awash in digital video, but too often we can’t find the
    videos we want or browse for what we might like.

    No, we can’t. But that has little to do with not being able to search against
    spoken text in those videos. The real challenge is that much of the video we
    want isn’t online or, if it is, is rapidly being removed as content owners put
    further pressure on sharing sites like YouTube.

    The result may be a fragmented video spectrum, where you may have to tune
    into iTunes for content from one company, YouTube for another, Joost for a
    third and so on. The solution is actually something else that Blinkx is strong
    at: not speech recognition but meta search. Meta search is the ability to
    search against a variety of sites and bring back consolidated results from all
    of them.
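
    As a rough sketch of the idea, and not a description of how Blinkx itself
    works, a meta search sends one query to several providers and merges whatever
    comes back. Everything here is hypothetical: the provider names, the
    example.com style URLs, the catalogs and the helper functions exist only for
    illustration.

```python
def make_provider(catalog):
    """Build a toy provider that matches the query against video titles."""
    def search(query):
        needle = query.lower()
        return [video for video in catalog if needle in video["title"].lower()]
    return search


def meta_search(query, providers):
    """Query every provider and merge the results, de-duplicating by URL."""
    merged = {}
    for name, search_fn in providers.items():
        try:
            results = search_fn(query)
        except Exception:
            # A provider that blocks us or fails is skipped; the rest still answer.
            continue
        for video in results:
            entry = merged.setdefault(
                video["url"],
                {"title": video["title"], "url": video["url"], "sources": []},
            )
            entry["sources"].append(name)
    # Videos listed by more providers come first, then alphabetical by title.
    return sorted(merged.values(), key=lambda e: (-len(e["sources"]), e["title"]))


# Hypothetical catalogs. host_a has already removed the clip, so it returns nothing.
HOST_A = []
HOST_B = [{"title": "Comedy skit", "url": "https://b.example/v/1"}]
HOST_C = [
    {"title": "Comedy skit", "url": "https://b.example/v/1"},
    {"title": "Skit outtakes", "url": "https://c.example/v/9"},
]

providers = {
    "host_a": make_provider(HOST_A),
    "host_b": make_provider(HOST_B),
    "host_c": make_provider(HOST_C),
}

results = meta_search("skit", providers)
for video in results:
    print(video["title"], video["url"], video["sources"])
```

    Note that host_a coming up empty doesn’t empty the results. The other
    providers still answer, which is the whole appeal of searching across many
    databases rather than one.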

    To illustrate this, again, a return to the New York Times article. It noted
    that the aforementioned Lazy Sunday video could be found on Blinkx:

    To experiment, I typed in the phrase “Chronic — WHAT — cles of Narnia,” the
    shout-out in the “Saturday Night Live” digital short called “Lazy Sunday,” a
    rap parody of two New York slackers. I wanted a phrase that a Web surfer would
    know more readily than the real title of a video. I also knew that “Lazy
    Sunday,” for all its cultish fame, would be hard to find: NBC Universal had
    freely released the rap parody on the Internet after broadcasting it in
    December 2005, but last month the company insisted that YouTube pull it.

    Nonetheless, Blinkx found eight instances of “Lazy Sunday” when I tried it
    last week. By contrast, Google Video found none. Typing “Lazy Sunday” into the
    keyword search box on Google’s home page produced hundreds of results — but
    many were commentaries about the video, and many had nothing to do with
    “Saturday Night Live.”

    Heh. This was overkill in experimenting, and the result had nothing to do with
    speech recognition. First, I think most people seeking the Lazy Sunday video
    actually would know it more by that name, "lazy sunday." Second, Google Video
    found no matches because Google Video only searches what’s in Google Video. If
    the clip has been yanked, as the article itself says it was, there’s nothing
    to find, speech recognition or not.

    It’s like saying you couldn’t find a tree in the forest because you lacked
    tree recognition technology. Sure, tree recognition technology might help. But
    if someone’s chopped down the tree and dragged it out of the forest, all that
    technology isn’t going to make it appear.

    Blinkx finds the clip because Blinkx is meta searching. Rather than hosting
    the content itself, it builds a database of content that is hosted by others
    from across the web. That helps protect it to some degree from takedown
    notices.

    In particular, here’s the search I did for Lazy Sunday. The top item led me
    to the video here as hosted by Brightcove, which allows video sharing like
    YouTube. I can’t tell if Brightcove has a deal to be showing this clip or not
    (one past New York Times article suggests yes).

    In either case, a meta search service like Blinkx (and similar services like
    Search For Video) wins. If YouTube has to drop the video, then the video is
    gone since YouTube only searches what’s in its own database. But Blinkx,
    tapping into multiple databases, may continue to list the video.

    If Blinkx does get hit with a takedown notice, it can remove that link and
    wait until someone else puts it up elsewhere. Alternatively, and this is most
    important, by tapping into many video search providers, it defragments the
    video search market. It becomes a place anyone can turn to in order to search
    across multiple video content providers. Of course, the real weakness it faces
    is if any of these hosting providers decides to block access for Blinkx to
    index their services.

    I’ve covered a variety of search providers in this article. If you’re looking
    for more, you might check out Video Search Made the Internet Star from last
    November at Search Engine Watch and Video Search Engines And Online Video
    Directories: A Mini-Guide from Robin Good. I also guarantee that Gary Price of
    ResourceShelf will be along in the comments area below to add his thoughts and
    resources. Feel free to contribute your own, as well.

    Overall, I’m sure speech recognition will eventually find its place in video
    search. But let’s skip the hype. It hasn’t been a crucial piece of technology
    for years, and it’s hardly the next challenge video search faces. The real
    challenge is figuring out how to work with content owners.




    About the Author

    Danny Sullivan
    Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land and MarTech, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site & blog. He can also be found on Facebook and Twitter.