For over a decade, search engines have supported standards allowing you to prevent pages from being spidered or included within a search index. Today, Yahoo supports a new twist: a way to flag that part of your page shouldn’t be included in an index. It’s called the robots-nocontent tag.
Many search marketers have long struggled with the problem that the “core” content of a web page (the main body copy or article) can often seem drowned out, from a text analytics perspective, by all the clutter around it. That clutter is often ads, navigational links, cross-promotion material and other stuff used in page templates.
The new robots-nocontent tag now allows you to tell Yahoo to ignore the clutter. Simply use the tag (technically, it’s an attribute) to surround text you do NOT want included in searchable content within Yahoo.
How? It’s a little complicated, but not too hard. You need to have a class attribute with the value robots-nocontent assigned to some tag within your document. The attribute looks like this:
class="robots-nocontent"
Now let’s say you have a paragraph of text you do NOT want included. You could use the <p> paragraph tags with this class attribute to flag the content as not to be indexed. Here’s the before:
<p> Blah blah here’s my text it is so bad blah blah blah. </p>
And here’s the after, with robots-nocontent added to the opening tag:
<p class="robots-nocontent"> Blah blah here’s my text it is so bad blah blah blah. </p>
Let’s say you have a block of text you want to flag. You could do this using container tags like <SPAN> or <DIV>. For example, here’s another before and after:
<p> I remembered a bad poem I wanted to write </p>
<p> But then I forgot it in the night </p>
<p> Sadly I remembered in the day </p>
<p> I wrote it; it got indexed, and now it won’t go away </p>
That’s several paragraphs of text, and flagging each paragraph as nocontent would be a pain. Instead, you could enclose all of them in a <DIV> tag that carries the class, as shown below:
<div class="robots-nocontent">
<p> I remembered a bad poem I wanted to write </p>
<p> But then I forgot it in the night </p>
<p> Sadly I remembered in the day </p>
<p> I wrote it; it got indexed, and now it won’t go away </p>
</div>
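To make the idea concrete, here is a minimal sketch in Python of how an indexer might separate searchable text from flagged text. It is not Yahoo’s implementation: the names NoContentFilter and searchable_text are hypothetical, and the sketch assumes well-formed markup in which every non-void element has a closing tag.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NoContentFilter(HTMLParser):
    """Hypothetical filter: collects text outside robots-nocontent elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a flagged element
        self.kept = []       # text fragments found outside flagged elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a flagged element, every nested element deepens the skip.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.kept.append(data.strip())


def searchable_text(html):
    """Return the text an indexer would treat as searchable (assumption)."""
    parser = NoContentFilter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.kept)


page = ('<p>Main article text.</p>'
        '<div class="robots-nocontent"><p>I remembered a bad poem</p></div>')
print(searchable_text(page))  # Main article text.
```

Tracking a depth counter rather than a single flag is what lets one enclosing <DIV> cover every paragraph nested inside it.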
Yahoo’s new tag was inspired by a microformat draft for robots exclusion you’ll find here. However, that draft is NOT the standard Yahoo is using. Let me say that again, more loudly.
THE MICROFORMAT DRAFT IS NOT WHAT YAHOO IS USING!
The new robots-nocontent standard is solely Yahoo’s own creation, and they define how it will be used for Yahoo alone. See Yahoo’s official guidelines here. Other search engines might decide to support it, similar to how Google, Yahoo and Microsoft all support the nofollow attribute for links.
Because this was just announced, I haven’t yet surveyed the other search engines to see if they’ll join in. I can tell you that none of them will have an answer today. They’re going to need to consider making such a change and examine how they might implement it.
Some more things about no-content, from talking with Yahoo about it:
- Is it cloaking? No, because Yahoo will still see your entire page. It simply won’t make searchable the content you flag.
- Does it prevent spidering? No — indeed, to see the tag, it has to spider your page.
- Does it prevent indexing? No, in the sense that Yahoo is still going to store all the words of your page within the index it creates of web documents. The flagged text simply won’t be SEARCHABLE. Any words you flag should no longer be considered by the part of the Yahoo algorithm that examines the text on a page to determine rankings, Yahoo says.
- Remember, you still might rank for words even if they are blocked. That’s because link analysis can make pages relevant for words they don’t actually use (see Google Declares Stephen Colbert As Greatest Living American for more about this).
You can use the attribute alongside other attributes, as well. Yahoo says you should simply add them within the quoted class area, with a space between attributes.
For example, say you have a class for a DIV tag already called “navigation” that you use to style navigational links. It might look like this:
<div class="navigation">
To add no-content, just insert that attribute anywhere within the quoted section, the part after class=, like this:
<div class="navigation robots-nocontent">
Before, after or between other attributes, it makes no difference, Yahoo says.
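That position-independence is what you would expect if the class value is read as a space-separated list of names. A small Python sketch of such a check follows. The function name is_nocontent is hypothetical, and the exact whole-name matching rule is an assumption rather than something Yahoo has spelled out here.

```python
def is_nocontent(class_value):
    """Return True if robots-nocontent is one of the space-separated class names.

    Assumption: the name must match exactly, so a longer name that merely
    starts with robots-nocontent does not count.
    """
    return "robots-nocontent" in class_value.split()


for value in ("navigation robots-nocontent",
              "robots-nocontent navigation",
              "sidebar robots-nocontent navigation",
              "navigation"):
    print(value, "->", is_nocontent(value))
```

The first three values are flagged and the last is not, whatever order the names appear in.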
Now some history. Yahoo first proposed this type of attribute way back in February 2005, at the Web Spam Squashing Summit that Niall Kennedy organized. A month later, Yahoo presented it at an indexing summit that I organized. They revisited it, sharing it with the audience of the robots.txt summit I organized last month. The response was positive enough from both summits that they’ve decided to try it out.
That’s what I love about these types of summits — they really can produce changes with the search engines (and I’ve got two more coming up, Duplicate Content Summit and Penalty Box Summit at our SMX Advanced show in Seattle next month).
Finally, what will this mean for search marketers and searchers in general? Who knows. When nofollow came out, we saw it start to have a dramatic impact in usage, especially in leaving people uncertain about what they should and shouldn’t flag as nofollow (Time For Google To Give Up The Fight Against Paid Links? from me last week covers this a bit more).
The robots-nocontent attribute should be less controversial. That’s because it’s not something someone can use against you. Instead, as with other ways of blocking content, this is a means for site owners to exercise more control over how they are listed.