Yahoo Supports New Robots-Nocontent Tag To Block Indexing Within A Page




For over a decade, search engines have supported standards allowing you to
prevent pages from being spidered or included within a search index. Today,
Yahoo supports a new twist: a way to flag that part of your page shouldn’t be
included in an index. It’s called the robots-nocontent tag.

Many search marketers have long struggled with the problem that the “core”
content of a web page (the main body copy or article) can often seem
drowned out from a text analytics perspective by all the clutter around the
content. That clutter is often ads, navigational links, cross promotion material
and other stuff used in page templates.

The new robots-nocontent tag now allows you to tell Yahoo to ignore the
clutter. Simply use the tag (technically, it’s an attribute) to surround text
you do NOT want included in searchable content within Yahoo.

How? It’s a little complicated, but not too hard. You need to have a class
attribute with the value robots-nocontent assigned to some tag within your
document. The attribute looks like this:

class="robots-nocontent"

Now let’s say you have a paragraph of text you do NOT want included. You
could use the <p> paragraph tags with this class attribute to flag the content
as not to be indexed. Here’s the before:

<p>
Blah blah here’s my text it is so bad blah blah blah.
</p>

And here’s the after, with robots-nocontent added to the opening tag:

<p class="robots-nocontent">
Blah blah here’s my text it is so bad blah blah blah.
</p>

Let’s say you have a block of text you want to flag. You could do this using
container tags like <SPAN> or <DIV>. For example, here’s another before and
after:

<p>
I remembered a bad poem I wanted to write
</p>
<p>
But then I forgot it in the night
</p>
<p>
Sadly I remembered in the day
</p>
<p>
I wrote it; it got indexed, and now it won’t go away
</p>

That’s several paragraphs of text, and flagging each paragraph as nocontent
would be a pain. Instead, you could enclose all of them with a special <DIV>
tag, as shown below:

<div class="robots-nocontent">
<p>
I remembered a bad poem I wanted to write
</p>
<p>
But then I forgot it in the night
</p>
<p>
Sadly I remembered in the day
</p>
<p>
I wrote it; it got indexed, and now it won’t go away
</p>
</div>
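
To make the effect of the wrapping tag concrete, here is a minimal Python
sketch of how a text extractor could skip everything inside a flagged
container. This is only an illustration, not Yahoo’s implementation: the
class SearchableTextExtractor and the function searchable_text are
hypothetical names, and the sketch assumes well-formed HTML where every
non-void tag is explicitly closed.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are ignored for depth tracking.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class SearchableTextExtractor(HTMLParser):
    """Hypothetical helper: collects text that is NOT inside an element
    whose class attribute contains robots-nocontent."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a flagged element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a flagged element, every nested tag deepens the skip.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.chunks.append(text)


def searchable_text(html):
    """Return the text outside flagged elements, joined by single spaces."""
    parser = SearchableTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Fed a page with the flagged <div> above sitting between two ordinary
paragraphs, searchable_text would return only the text of the ordinary
paragraphs. The whole page is still read; the flagged part is just left out
of what gets collected.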

Yahoo’s new tag was inspired by a microformat draft for robots exclusion.
However, that draft is NOT the standard Yahoo is using. Let me say that again,
more loudly.

THE MICROFORMAT DRAFT IS NOT WHAT YAHOO IS USING!

The new robots-nocontent standard is solely Yahoo’s own creation, and they
define how it will be used for Yahoo alone. See Yahoo’s official guidelines
for details. Other search engines might decide to support it, similar to how
Google, Yahoo and Microsoft all support the nofollow attribute for links.

Because this was just announced, I haven’t yet surveyed the other search
engines to see if they’ll join in. I can tell you that none of them will have an
answer today. They’re going to need to consider making such a change and examine
how they might implement it.

Some more things about no-content, from talking with Yahoo about it:

  • Is it cloaking? No, because Yahoo will still see your entire page. It
    simply won’t make searchable the content you flag.
  • Does it prevent spidering? No. Indeed, to see the tag, Yahoo has to spider
    your page.
  • Does it prevent indexing? No, in the sense that Yahoo is still going to
    store all the words of your page within the index it creates of web documents.
    The flagged text simply won’t be SEARCHABLE. Any words you flag should no
    longer be considered by the part of the Yahoo algorithm that examines the text
    on a page to determine rankings, Yahoo says.
  • Remember, you still might rank for words even if they are blocked. That’s
    because link analysis can make pages relevant for words they don’t actually
    use (see Google Declares Stephen Colbert As Greatest Living American for
    more about this).

You can use robots-nocontent alongside other class values, as well. Yahoo says
you should simply add them within the quoted class area, with a space between
values.

For example, say you have a class for a DIV tag already called “navigation”
that you use to style navigational links. It might look like this:

<div class="navigation">

To add no-content, just insert that attribute anywhere within the quoted
section, the part after class=, like this:

<div class="navigation robots-nocontent">

Before, after or between other class values, it makes no difference, Yahoo
says.
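
As a rough illustration of why the position doesn’t matter, here is a small
Python sketch that treats the class value as a space-separated list and checks
for the robots-nocontent entry. The function name has_nocontent is
hypothetical, and the exact, case-sensitive match is my assumption for the
sketch, not something Yahoo has confirmed.

```python
def has_nocontent(class_value):
    """Return True if the space-separated class value includes robots-nocontent.

    Assumption: an exact, case-sensitive match on the whole entry.
    """
    return "robots-nocontent" in class_value.split()


# The flag is found wherever it sits in the list.
print(has_nocontent("navigation robots-nocontent"))          # True
print(has_nocontent("robots-nocontent navigation"))          # True
print(has_nocontent("sidebar robots-nocontent navigation"))  # True
print(has_nocontent("navigation"))                           # False
```

Splitting on whitespace first, rather than searching the raw string, means a
different class that merely contains the same letters would not count as the
flag in this sketch.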

Now some history. Yahoo first proposed this type of attribute way back in
February 2005, at the Web Spam Squashing Summit that Niall Kennedy organized.
A month later, Yahoo presented it at an indexing summit that I organized. They
revisited it, sharing it with the audience of the robots.txt summit I organized
last month. The response was positive enough from both summits that they’ve
decided to try it out.

That’s what I love about these types of summits: they really can produce
changes with the search engines (and I’ve got two more coming up, Duplicate
Content Summit and Penalty Box Summit, at our SMX Advanced show in Seattle
next month).

Finally, what will this mean for search marketers and searchers in general?
Who knows. When nofollow came out, we saw it start to have a dramatic impact in
usage, especially in people being uncertain what they should and shouldn’t flag
as nofollow (Time For Google To Give Up The Fight Against Paid Links? from me
last week covers this a bit more).

The robots-nocontent attribute should be less controversial. That’s because
it’s not something someone can use against you. Instead, as with ways of
blocking content, this is a means for site owners to exercise more control over
how they are listed.






About the author

Danny Sullivan
Contributor
Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land and MarTech, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site & blog. He can also be found on Facebook and Twitter.
