Tagging 10,000 Videos: Science Or Sweat?
In the last SEL video search column, Eric Papczun described the dilemma faced by content producers who are striving to keep up with the insatiable demand for new video content. Producers are looking for efficient methods of discovering what online viewers are interested in, and then packaging their content to maximize exposure.
The example used was a television network, one that is mostly concerned with ongoing generation of new content. What’s interesting to note is that, of all the content producers vying for attention on YouTube, television networks are the ones with the most preexisting content to offer.
Dusting off the archives
Think of the volumes of material at networks’ disposal. A half-hour nightly news show that’s been on the air for 30 years has (minus commercials) about 3,500 hours of content to offer. Factor in the number of news stories that represents and you’re talking ten times as many clips, easily. And that’s just one show.
The question for owners of previously televised content is: how do we pick and choose archived content that’s relevant for search, and how do we process that content without breaking the bank? It’s a daunting issue, especially when you realize that the majority of these videos will be long tail at best.
The challenges are to a) find a cost-efficient way to plow through the backfill, and b) identify the attributes that make each clip interesting and relevant to a modern-day audience. These attributes will become the metadata we need to make sure this clip sees the light of day again.
Let’s try science
Talk to anyone with a technical background, and the first suggestion you’ll get is to try some form of automation. Scan the videos, find some way to tease out the choice nuggets of information, and pour it all into some format that the engines can digest.
Easier said than done, but a few clever firms are taking a crack at it. Blinkx scans both the soundtrack and on-screen text and uses them as indexable content, the way a conventional engine uses on-page text. Google Video takes a shortcut, using the closed caption transcripts from broadcast TV clips as the source for its searchable text.
This works well for non-fiction videos, where useful keywords come up in the actual dialog. But fiction is a different story: a 5-minute clip from The Breakfast Club will never mention Judd Nelson, Molly Ringwald, or Simple Minds.
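To make the transcript idea concrete, here is a minimal sketch of an inverted index built from caption text. It is not how Blinkx or Google Video actually work; the clip IDs, the sample transcripts, and the `build_index` function are all hypothetical, and a real engine would add ranking, stemming, and much more.

```python
import re
from collections import defaultdict


def build_index(transcripts: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the IDs of clips whose transcript contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for clip_id, text in transcripts.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(clip_id)
    return index


# Hypothetical caption text for two news clips.
transcripts = {
    "news-001": "The senator spoke about the greenhouse effect.",
    "news-002": "Markets rallied as the senator resigned.",
}

index = build_index(transcripts)
print(sorted(index["senator"]))     # ['news-001', 'news-002']
print(sorted(index["greenhouse"]))  # ['news-001']
```

Note that a lookup only succeeds for words that are actually spoken or captioned, which is exactly why the approach falls short for fiction.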
Don’t needlessly date yourself
There are other subtleties to video tagging that are very difficult for a machine to anticipate. What’s hot today might not be tomorrow. PBS would have a problem processing all those old documentaries talking about the “greenhouse effect,” because while there’s huge demand for the topic, it’s now referred to as “global warming.” As in conventional search, keyword usage changes over time, and videos with a long shelf life may need to be earmarked for a more current topic.
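One way to handle shifting vocabulary is to keep a lookup of dated terms and their current equivalents, and to re-run it over the archive whenever usage changes. The sketch below assumes tags are stored as plain lists of strings; the `CURRENT_TERMS` table and the `refresh_tags` function are illustrative names, and the table itself would still have to be maintained by a person who follows search trends.

```python
# Hypothetical mapping from dated terms to the phrases people search for now.
CURRENT_TERMS = {"greenhouse effect": "global warming"}


def refresh_tags(tags: list[str], current_terms: dict[str, str]) -> list[str]:
    """Keep the original tags and append the current equivalent of any dated one."""
    refreshed = list(tags)
    for tag in tags:
        modern = current_terms.get(tag.lower())
        if modern and modern not in refreshed:
            refreshed.append(modern)
    return refreshed


print(refresh_tags(["PBS", "greenhouse effect"], CURRENT_TERMS))
# ['PBS', 'greenhouse effect', 'global warming']
```

The original tag is kept rather than replaced, so the clip remains findable under both the old and the new phrase.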
In fact, this is where we connect back to new content producers and their needs. In this industry, we’ve all experienced the turbulence that comes when search engines change the rules. What won the game yesterday doesn’t even compete today. So it will be with video search engines, and the time may come when you uncover the New Best Practices and have to re-tag your content to accommodate them.
Sweating to the oldies (sorry)
Automation promises efficiency and scalability, but it doesn’t give us the magic we need. If we want our video tagging to reflect the most relevant choices available to us, we need to add some sweat equity into the process. To make judgment calls, we need human reviewers.
This is not an endeavor to be taken lightly. If 10,000 videos each need 10 minutes of someone’s time to be viewed, lightly researched, and tagged, that’s enough hours to employ someone full-time for almost a year. One salary, one benefits package, one cubicle… it all sounds like an expensive project, one that could easily be dismissed as not viable. But that’s an old way of looking at labor.
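For anyone who wants to plug in their own numbers, the back-of-the-envelope estimate works out as follows. The video count and the minutes per video come from the example above; the 2,080-hour work year (40 hours a week for 52 weeks) is an assumption, and the function names are only for illustration.

```python
# Rough reviewer-workload estimate for a tagging backlog.
# The 2,080-hour full-time year is an assumption (40 hours x 52 weeks).

def review_hours(video_count: int, minutes_per_video: float) -> float:
    """Total person-hours needed to review every video once."""
    return video_count * minutes_per_video / 60


def full_time_years(hours: float, hours_per_year: float = 2080) -> float:
    """Convert person-hours into full-time-equivalent years."""
    return hours / hours_per_year


hours = review_hours(10_000, 10)
years = full_time_years(hours)
print(f"{hours:,.0f} hours, or about {years:.1f} full-time years")
# 1,667 hours, or about 0.8 full-time years
```

Doubling the review time to 20 minutes per clip pushes the same backlog past a year and a half of full-time work, which shows how sensitive the budget is to that one figure.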
Share the workload
There are plenty of new methods for distributing tasks that are both more efficient and better at finding the lowest-cost, highest-quality workers. Portals like Elance, oDesk and Mechanical Turk (which requires some explaining) are great resources – not just for obtaining cheap labor, but also for managing projects and building sustainable teams. Outsourcing may be a dirty word in some circles, but this is a genuine case of a job that simply can’t get done through other means.
Not that outsourcing always implies offshoring. Jason Calacanis of Mahalo fame (infamy?) recently published a series of tips for start-ups, and the one that really struck me was:
Outsource to middle America: There are tons of brilliant people living between San Francisco, Los Angeles, and New York who don’t live in a $4,000 one bedroom apartment and pay $8 to dry clean a shirt – hire them!
Take a layered approach
The most sensible plan for a large-scale video tagging project is to take a layered approach. Use automation to make that first pass, extracting whatever information is available. Use that info to prioritize the videos that have the most promise, and then feed them to your staff of reviewers for that vital human input. With the right methods and processes in place, you can revive a dusty library of shows and turn them into the historical/retro/vintage clips that are sometimes laughable, sometimes insightful, but always click-worthy.
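The layered approach can be sketched in a few lines. This version assumes the automated pass has already produced a transcript for each clip, and it uses a deliberately crude score: the number of in-demand terms that appear in that transcript. The `Clip` class, the `hot_terms` list, and the scoring rule are all hypothetical stand-ins for whatever signals your own first pass can extract.

```python
from dataclasses import dataclass, field


@dataclass
class Clip:
    clip_id: str
    transcript: str  # text produced by the automated first pass
    tags: list[str] = field(default_factory=list)


def score_clip(clip: Clip, hot_terms: set[str]) -> int:
    """Count how many in-demand terms appear in the clip's transcript."""
    text = clip.transcript.lower()
    return sum(1 for term in hot_terms if term in text)


def review_queue(clips: list[Clip], hot_terms: set[str], limit: int) -> list[Clip]:
    """Return up to `limit` promising clips, best first, for human reviewers."""
    scored = [(score_clip(clip, hot_terms), clip) for clip in clips]
    promising = [pair for pair in scored if pair[0] > 0]
    promising.sort(key=lambda pair: pair[0], reverse=True)
    return [clip for _, clip in promising[:limit]]


clips = [
    Clip("a", "Tonight: scientists warn about the greenhouse effect and rising seas"),
    Clip("b", "Local bake sale raises funds for the library"),
    Clip("c", "Greenhouse effect debate; election coverage continues"),
]
hot_terms = {"greenhouse effect", "election"}

queue = review_queue(clips, hot_terms, limit=2)
print([clip.clip_id for clip in queue])  # ['c', 'a']
```

Clips that score zero never reach a reviewer, so the human hours go to the material with the best chance of being found.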
Sherwood Stranieri is Director of Natural Search at SMG Search, a dedicated search unit of Starcom MediaVest Group. Based in Chicago, SMG Search creates integrated search strategies for some of the world’s largest companies. The Video Search column appears on Thursdays at Search Engine Land.