Yahoo! recently announced their role in creating and supporting Common Tag, a new semantic tagging format. Yahoo! says that Common Tag makes “web content more discoverable” and enables the community to “create more useful applications for aggregating, searching, and browsing the web.” Their blog post mentions that they want to accelerate the structuring of the web. That goal aligns with last year’s SearchMonkey launch, which Yahoo! described as, in part, an attempt to encourage the use of structured data on the web. This brings to mind a few questions.
Why did the web need a new semantic standard?
The Common Tag blog explains:
“Semantic web promises havens. The promise of machines understanding the data and acting on it semi-autonomously. But how do we get there? Not many are willing to put in a lot of work into publishing data in new formats. For the benefit that might or might not come years after enormous resources will be spent? So why not start with something easy, say tagging? Marking up text with exactly defined concepts.”
OK, maybe “explains” isn’t the right word. This question really goes to the heart of what’s curious about Common Tag. Yahoo called it a “new semantic tagging format” in its blog post, but when we asked them why the web needed something new, they clarified that it’s an RDFa vocabulary, not something made from whole cloth.
The Common Tag About page also implies that this is really just part of the standards that all the major search engines have joined together to support: “In addition, search engines like Yahoo and Google have begun reading RDFa—the markup standard used by the Common Tag format—to acquire richer information about sites that use it… Google’s new Rich Snippets feature uses the information to apply similar enhancements to Google search results.” In truth, none of the major search engines uses semantic markup in web search, and Google is using existing standards (microformats and RDFa) to display enhanced listings. Both Google and Yahoo have told me that they could use metadata in web search in the future, if it proves useful and they can safeguard against spam. So far, this hasn’t happened.
Yahoo did clarify to me that Common Tag is something they’re participating in as a means to cultivate the structured data community, not something they came up with on their own and are trying to get the community to adopt. RDFa provides a framework for creating vocabularies, and several companies that were using RDFa were interested in creating a tagging vocabulary. Since these companies used SearchMonkey as an application for their metadata, they asked Yahoo to help create and promote this new vocabulary.
So, how does it work?
Common Tag is intended to be a shared format that standardizes how concepts are tagged.
According to the commontag.org site, as “publishers, developers, and end users” join in support for this format, “more content related to a specific concept will be discoverable through a single tag.” Today, for instance, the concept New York City may be tagged variously as “nyc”, “new_york_city”, or “newyork”. You can add this tagging markup to your pages manually, or you can use infrastructure such as that provided by founding company Zemanta. And you can eliminate the problem of multiple tags for the same concept by drawing on a participating database such as Freebase (also a founding company). You can then use this structured data in an application such as Yahoo! SearchMonkey. For instance, the Common Tag documentation gives the following example, which uses Freebase to tag a page as being about U2:
<body xmlns:ctag="http://commontag.org/ns#" rel="ctag:tagged">
  <span typeof="ctag:Tag" rel="ctag:means" resource="http://rdf.freebase.com/ns/en.u2"/>
</body>
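Why a shared identifier helps with the New York City problem can be shown with a small Python sketch. It assumes three hypothetical pages that tag the city as “nyc”, “new_york_city”, and “newyork”, plus a hypothetical lookup table standing in for a concept database such as Freebase; the concept URI is a placeholder, not a real Freebase identifier. Grouping pages by raw tag string yields three buckets, while grouping by concept yields one.

```python
from collections import defaultdict

# Hypothetical pages, each tagged with a different free-text string for New York City.
PAGE_TAGS = {
    "page-a": "nyc",
    "page-b": "new_york_city",
    "page-c": "newyork",
}

# Hypothetical lookup table standing in for a shared concept database such as
# Freebase. The URI is a placeholder, not a real Freebase identifier.
NYC = "http://example.org/concept/new_york_city"
CONCEPT_URIS = {"nyc": NYC, "new_york_city": NYC, "newyork": NYC}


def group_pages(page_tags, key):
    """Group page names by whatever key() returns for each page's tag."""
    groups = defaultdict(list)
    for page, tag in page_tags.items():
        groups[key(tag)].append(page)
    return dict(groups)


# Grouping by the raw tag string splits the pages into three buckets...
by_string = group_pages(PAGE_TAGS, lambda tag: tag)
# ...while grouping by concept URI (falling back to the raw string for
# unknown tags) puts all three pages under a single concept.
by_concept = group_pages(PAGE_TAGS, lambda tag: CONCEPT_URIS.get(tag, tag))

print(len(by_string), len(by_concept))  # 3 1
print(by_concept[NYC])                  # ['page-a', 'page-b', 'page-c']
```

With Common Tag, the URI would come from the markup itself (the resource attribute) rather than from a lookup table; the table here only stands in for that step.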
You can also do more complex tagging, such as tagging external resources, sections of your web pages, and concepts within your content. For instance, you can give each paragraph of text on the page an identifier:
<p id="first">Everyone loves Buffy the Vampire Slayer. </p> <p id="second">Amber Benson was awesome in it. </p>
And then tag a specific paragraph (here, the second one) by referring to its id:
<div xmlns:ctag="http://commontag.org/ns#" about="#second" rel="ctag:tagged">
  <span typeof="ctag:Tag" rel="ctag:means" resource="http://dbpedia.org/resource/Amber_Benson"/>
</div>
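To make the structure of these examples concrete, here is a minimal Python sketch, using only the standard library’s html.parser, that reads the Common Tag pattern shown above and reports which concept each tagged subject means. It is a simplification, not a conforming RDFa processor: it ignores the xmlns:ctag declaration, expects the literal prefix ctag, and collapses the intermediate ctag:Tag node into (subject, concept) pairs. The class name CommonTagExtractor and the page URL are hypothetical.

```python
from html.parser import HTMLParser


class CommonTagExtractor(HTMLParser):
    """Simplified reader for the Common Tag pattern (hypothetical helper).

    Not a conforming RDFa processor: it ignores the xmlns:ctag declaration,
    assumes the literal prefix "ctag", and collapses the chain
    subject -> ctag:tagged -> Tag -> ctag:means -> concept
    into (subject, concept) pairs.
    """

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.stack = []  # (tag name, subject or None) for each open element
        self.pairs = []  # extracted (subject, concept URI) pairs

    def _current_subject(self):
        # The nearest open element carrying rel="ctag:tagged" supplies the subject.
        for _, subject in reversed(self.stack):
            if subject is not None:
                return subject
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").split()
        subject = self._current_subject()
        if "ctag:means" in rels and subject is not None and a.get("resource"):
            self.pairs.append((subject, a["resource"]))
        tagged = None
        if "ctag:tagged" in rels:
            about = a.get("about") or ""
            if not about:
                tagged = self.page_url          # no about: the page itself
            elif about.startswith("#"):
                tagged = self.page_url + about  # fragment, e.g. "#second"
            else:
                tagged = about                  # absolute URL of an external resource
        self.stack.append((tag, tagged))

    def handle_endtag(self, tag):
        # Pop back to the matching open element; tolerates unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


page = "http://example.com/buffy.html"  # hypothetical page URL
markup = (
    '<p id="first">Everyone loves Buffy the Vampire Slayer. </p>'
    '<p id="second">Amber Benson was awesome in it. </p>'
    '<div xmlns:ctag="http://commontag.org/ns#" about="#second" rel="ctag:tagged">'
    '<span typeof="ctag:Tag" rel="ctag:means"'
    ' resource="http://dbpedia.org/resource/Amber_Benson"/>'
    "</div>"
)
extractor = CommonTagExtractor(page)
extractor.feed(markup)
extractor.close()
print(extractor.pairs)
# [('http://example.com/buffy.html#second', 'http://dbpedia.org/resource/Amber_Benson')]
```

Fed the U2 example instead, the same sketch would report the page itself paired with the Freebase URI, since that body element has no about attribute.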
Why is Yahoo! so hell-bent on covering the web with structure?
If Yahoo! found that structured data made the web easier to crawl and their search results more relevant, I could see the push. But Yahoo! doesn’t use any of the semantic formats they’re encouraging in web search. They were already encouraging hCard, hCalendar, hReview, hAtom, XFN, Dublin Core, Creative Commons, FOAF, GeoRSS, MediaRSS, RDFa, and OpenSearch. Why do they need web developers to start using yet another format when they haven’t yet figured out how to use all of those others in their core search engine? Sure, they are involved in Common Tag to support the structured data community they’ve been aiming to accelerate, but why is that so important to them?
Since Yahoo isn’t encouraging semantic markup to gain an edge in search, it seems they must instead be looking to increase adoption of SearchMonkey and BOSS, where these formats are used. They seem to have diverted the energy they once spent improving Yahoo’s search index through tools such as Site Explorer toward raising adoption of BOSS. The last Site Explorer update was in August 2008, and that was simply a UI change; no new features were launched. For new features, you have to go all the way back to August 2007, when dynamic URL rewriting was added. Even Yahoo CEO Carol Bartz isn’t talking about their consumer search engine as a core offering, but rather as something convenient for Yahoo users who are already on the site for some other reason (emphasis mine):
Listen, Google has the search brand, there’s no doubt about it …. [Yahoo Search] is really designed for people that are on our sites and find something interesting, they want to look farther and they go to Yahoo Search.
It’s enough to make one wonder if Yahoo is quietly abandoning its consumer search engine in favor of accelerating new third-party search engines through BOSS. If you can’t beat ’em, help their enemies attack them on all fronts, as the old saying goes.
Didn’t the search engines already try using meta tags?
The idea of using metadata to tag web pages in order to describe them to search engines isn’t new, of course. The meta keywords tag has been around since at least 1995, and it’s easier to adopt than Common Tag. That U2 example? The meta keywords tag would only require this:
<meta name="keywords" content="U2">
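The difference from the Common Tag version shows up when you read the markup back. In this minimal Python sketch (the MetaKeywordsReader class is hypothetical, built on the standard library’s html.parser), the keywords tag yields only the string “U2”, whereas the Common Tag markup points at a Freebase identifier. The second, stuffed input is invented for illustration: nothing in the format checks what a page is actually about.

```python
from html.parser import HTMLParser


class MetaKeywordsReader(HTMLParser):
    """Collects the comma-separated values of <meta name="keywords" content="...">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and (a.get("name") or "").lower() == "keywords":
            content = a.get("content") or ""
            # Each keyword is just a string; nothing ties it to a defined concept.
            self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


reader = MetaKeywordsReader()
reader.feed('<meta name="keywords" content="U2">')
reader.close()
print(reader.keywords)  # ['U2']

# Hypothetical stuffed tag: the format accepts terms the page is not really about.
stuffed = MetaKeywordsReader()
stuffed.feed('<meta name="keywords" content="U2, Bono, free mp3, cheap tickets">')
stuffed.close()
print(stuffed.keywords)  # ['U2', 'Bono', 'free mp3', 'cheap tickets']
```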
Indeed, Yahoo supported the meta keywords tag initially (and to some extent still does), but Google did not support it when it launched. It was too easy for site owners to stuff that tag with anything they wanted rather than the true focus of the page. Search engines use smarter methods (starting with the content on the page and how external sites link to it) to determine relevance. Could Common Tag have the same downfall? After all, as the documentation explains, “you can create as many Tags as necessary to describe the contents of a document.” Not only does Common Tag seem to replicate the purpose of the meta keywords tag, it also seems to replicate Delicious-style tagging and external anchor text. From the site:
“Common Tags are not only useful for identifying the concepts covered in your content, but if you reference content elsewhere on the web, Common Tags can be used to indicate the concepts covered in that external content as well. This is useful for better describing and organizing the content of external resources from within your own content. For example, you could use Common Tags to publish bookmarks to identify the concepts described by a link, or you could use them [to] categorize image collections stored elsewhere on the web.”
A microformat, rel="tag", already exists for a similar purpose: it is intended to tag content, such as web pages or portions of them. Anchor text is an established way for search engines to determine how others describe an external resource. As for tags, the study “Can Social Bookmarking Improve Web Search?”, presented at the First ACM International Conference on Web Search and Data Mining (Stanford), analyzed 40 million Delicious tags and found that anchor text was a better signal for web search relevance. Part of the problem was scale of adoption: a lot of people have to adopt this new tagging method for it to be worthwhile to use across the web. And if Delicious tags don’t have the scale, how long will it take Common Tag to get there? When I asked Yahoo about this, they acknowledged that it may not be adopted web-wide. Rather, it’s a format of interest to a particular group of developers whose needs go beyond what is available through means such as the meta keywords tag and rel="tag".
Why would anyone implement this?
It seems like a lot of work. You can tag content now using methods like anchor text and, well, tags, such as those available through most blogging platforms and bookmarking sites like Delicious. If content management systems and other content creation platforms such as blogging systems incorporate this structure (for instance, by automatically using the tags that label a blog post), we might see some adoption, but this wouldn’t eliminate the issue of multiple tags for one concept. (Zemanta, one of the founding companies of Common Tag, provides plugins for blogging platforms that insert Common Tag markup.) And WordPress strips out RDFa by default.
The answer is that web developers will use this structure, just as they’ll use any other structure, if it’s valuable for what they’re building. Which applications this format ultimately makes possible remains to be seen.
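The rel="tag" microformat runs into the same multiple-tags issue. Under rel-tag, the tag is taken from the last segment of the linked URL’s path rather than from the anchor text. The minimal Python sketch below (the RelTagReader class and the example.com URLs are hypothetical; trailing-slash handling and percent-decoding are assumptions of the sketch) reads two links that plausibly refer to the same band and gets back two different strings, because nothing connects either tag to a shared concept.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse


class RelTagReader(HTMLParser):
    """Collects tags from links marked rel="tag" (hypothetical helper).

    Simplified reading of the rel-tag microformat: the tag is the last
    segment of the link's URL path, not the anchor text. Stripping a
    trailing slash and percent-decoding are assumptions of this sketch.
    """

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        href = a.get("href")
        if tag == "a" and "tag" in rels and href:
            last_segment = urlparse(href).path.rstrip("/").rsplit("/", 1)[-1]
            if last_segment:
                self.tags.append(unquote(last_segment))


reader = RelTagReader()
reader.feed(
    '<a href="http://example.com/tags/u2" rel="tag">U2</a> '
    '<a href="http://example.com/tags/U2_band/" rel="tag">the band</a>'
)
reader.close()
print(reader.tags)  # ['u2', 'U2_band']
```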
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.