  • http://www.niallkennedy.com/ Niall Kennedy

    The content block filtering was first proposed by Priyank Garg of Yahoo! at the Web Spam Squashing Summit I organized while at Technorati and hosted by Yahoo! in February 2005. A similar presentation was delivered by Tim in March as part of the Indexing Summit talk you reference.

    The major search engines have discussed Priyank’s proposed changes in the past, and evaluated similar plans.

  • http://searchengineland.com Danny Sullivan

    Thanks, Niall — appreciate the further history! I’ve got it updated in the story now.

  • http://www.mattcutts.com/blog/ Matt Cutts

    My personal reaction: sounds cool. :)

    Danny, can you ask how Yahoo intends to treat links in the “robots-nocontent” section? I could imagine:
    - don’t follow the links or index the anchortext with the destination document
    - allow the links for discovery, but don’t apply weight to the links or apply anchortext to the destination document
    - links are treated normally in terms of indexing, anchortext, etc.
    - other behaviors?

    I’m sure Yahoo has chosen which behavior they’re going to take with links in robots-nocontent sections, so I’m curious which way they decided to go.
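
    For context, the markup in question looks something like the following (a hypothetical snippet; only the robots-nocontent class name comes from Yahoo’s announcement, the rest is made up):

    <div class="robots-nocontent">
      Related elsewhere:
      <a href="http://www.example.com/archive">our archive</a>
    </div>

    Each of the behaviors above would treat that <a> element differently.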

  • http://sebastianx.blogspot.com/2007/04/in-need-of-web-robot-directives.html Sebastian

    I wish this multi-class nonsense did not exist.

    That new class value is not much better than Google’s section targeting, from a Webmaster’s perspective. To make use of it, one needs to edit tons of templates, scripts, and most probably even editorial content stored as HTML.

    It would be so much easier to accomplish the same thing with one single line in robots.txt, addressing existing classes and DOM-IDs:

    A.advertising { rel: nofollow; } /* devalue aff links */

    DIV.hMenu, TD#bNav { content:noindex; rel:nofollow; } /* hide site wide links */

    Instead of editing in many places, this would allow simple-to-maintain, centralized crawler directives like the ones outlined here.
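
    For comparison, a sketch of what the “hide site wide links” rule alone requires inline under the multi-class approach (hypothetical markup; hMenu is the class from my example above):

    <div class="hMenu robots-nocontent">
      <a href="http://www.example.com/about" rel="nofollow">About</a>
      <!-- …dozens more menu links, repeated in every template… -->
    </div>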

    Sigh.

  • http://www.seo-theory.com/ Michael Martinez

    This makes absolutely no sense.

    Here the SEO community has been telling people for years to only present the same content to spiders and people, and Yahoo! is saying, “Nah — you can exclude us from some of that content”.

    And one of your examples is for people to tell Yahoo! not to crawl navigational content (hence, it won’t find other pages).

    I have a real problem with this class/tag/thingamajigee. Explain to me how it’s going to help anyone? I’m still waiting for someone to explain how “rel=’nofollow’” is helpful to the Web community.

    Spammers are sure not going to use this stuff unless they decide it might act as a smokescreen and they stick fluff into the ignored sections to make their pages look like they are more legitimate.

    So what is the point? Is Yahoo! tacitly admitting that they do pay attention to keyword density?

  • http://www.slamb.org/ slamb

    Michael Martinez:

    I’m still waiting for someone to explain how “rel=’nofollow’” is helpful to the Web community.

    It improves Google’s PageRank-based search results by saying “this link is untrusted; it might be comment spam; don’t rank it”. If that benefit to the whole community isn’t good enough for you, then the benefit for you would be that if they have some sort of penalty for outbound spammy links, you won’t get hit by it.
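
    For example, blog software that distrusts commenter links typically emits something like this (hypothetical URL):

    <a href="http://www.example.com/their-site" rel="nofollow">my site</a>

    The link still works for readers; it just carries no endorsement for ranking purposes.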

    Explain to me how [class="robots-nocontent" is] going to help anyone?

    The Yahoo search blog article tells you: “This tag is really about our crawler focusing on the main content of your page and targeting the right pages on your site for specific search queries. Since a particular source is limited to the number of times it appears in the top ten, it’s important that the proper matching and targeting occur in order to increase both the traffic as well as the conversion on your site. It also improves the abstracts for your pages in results by omitting unrelated text from search result summaries.”

    And one of your examples is for people to tell Yahoo! not to crawl navigational content (hence, it won’t find other pages).

    No, they just say “We won’t use the terms contained in these special tagged sections as information for finding the page or for the abstract in the search results.” I assume they still use the links in their crawling.

    Here the SEO community has been telling people for years to only present the same content to spiders and people, and Yahoo! is saying, “Nah — you can exclude us from some of that content”.

    They answer this, too: “Note: Using a “nocontent” tag to mark explicit sections of content is not considered “cloaking” because all of the content on the page is available to protect the relevance of the results (unlike “cloaking” where we may be served content that is different from what visitors see).”

    Spammers are sure not going to use this stuff unless they decide it might act as a smokescreen and they stick fluff into the ignored sections to make their pages look like they are more legitimate.

    I can’t think of any reason for spammers to use this. I don’t understand what you’re saying, though. Are you complaining that Yahoo! isn’t designing their new features for the convenience of spammers?

  • http://www.idealwebtools.com/blog/ AjiNIMC

    I think it is inviting more trouble. I will go through the official guidelines and post to the blog.

    As per Google’s patent, they have the technology to see what part of your page is template and what is exclusive content. That’s a better move.

    IMO this move by Yahoo is prone to spam.

  • http://www.phatz.com christian griffith

    Hmmm… my second post to this blog. I’m becoming a Danny-ite myself… {sigh}

    So is it fair to summarize this Yahoo! ‘release’ as:

    “using the robots-nocontent class attribute is designed to weight the importance of Slurp-indexed content”?

    I’m with the folks who question this move. For something like this to have been implemented at Yahoo, a lot of time and energy had to be expended. I’d rather see that time and energy spent reducing doorway pages in the top 10.

    I dunno – I guess we’ll wait-n-see, but I’m not all that stoked, and it makes me wonder if Yahoo is just hungry for any kind of innovation-spun PR.

  • f-lops-y

    “Any words you flag should no longer be considered by the part of the Yahoo algorithm that examines the text on a page to determine rankings, Yahoo says.” …

    I’m just wondering how beneficial this is going to be to the users. If webmasters can tell the spiders not to index huge chunks of copy unrelated to the user’s query in order to boost their results for a small piece of related text that might be buried at the bottom of a 42-paragraph rant on another topic entirely, how does this benefit the user’s experience of the engine results, and how does it ensure that the results are actually pertinent to the searcher? I’m not convinced…

  • Trogdor

    I, too, fail to see how this new policy is helpful. It appears as though Yahoo wants webmasters to help out more in identifying what parts of the page are relevant, and what parts are not.

    Confusing; I was under the impression the SE bots could do this quite well all by themselves (i.e., identifying what’s advertising, what’s navigation, what’s template, what’s body content, etc.).

    Obviously, spammers aren’t going to do this to help out Yahoo, so the only people who would volunteer are other webmasters … but I fail to see the gain.

    “Many search marketers have [found] that the content of a web page … can often seemed drowned out from a text analytics perspective by all the clutter around the content [like] ads, navigation, cross promotion material and [templates].”

    Again, it seems the challenge is for the engines to recognize what is what … and they always tell us, they’re very good at it (Matt Cutts said so at PubCon ’05, that they’re good at telling what’s navigation / template, and what’s body content on a page).

    So it seems Yahoo is saying “We need you to help us, so use this here.”

  • http://www.luckylester.com Lucky Lester

    Michael Martinez

    Off the top of my head, I would have to say that “rel=’nofollow’” is helpful to webmasters in the context of dynamic linking. Rather than using JavaScript to prevent the distribution of PR to low-level pages, we can simply use the “rel=’nofollow’” tag. As Leslie Rhodes points out… why would we want to pass on PR to low-level pages that don’t lend themselves to any form of monetization?

    Another area where the “rel=’nofollow’” tag is useful is on those websites where they have many snippets of articles on a page and each article carries two links to the rest of the story. Generally speaking, they will have the title of the article linked to the rest of the article, but they will also have another link that reads “click here for the whole story” or, my personal favorite, “…more”. Seems to me that a “rel=’nofollow’” might be helpful in reducing the volume of links passing on PR.
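
    A sketch of that pattern (hypothetical URLs): the title link passes weight normally, while the redundant “more” link is nofollowed so the same destination isn’t counted twice:

    <a href="http://www.example.com/story">Full Story Title</a>
    …snippet of the article…
    <a href="http://www.example.com/story" rel="nofollow">…more</a>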

    Lastly, I like the way Matt sidled up to Danny to get him to ask for some info from a competitor. My overactive imagination gets the better of me sometimes, as I pictured Matt leaning into Danny with a balloon caption of “Pssst – Hey buddy, wanna…”

  • http://www.luckylester.com Lucky Lester

    Correction: I meant Leslie Rohde not Leslie Rhodes… a bit of a cranial cramp there.

  • http://www.seo-theory.com/ Michael Martinez

    It seems they have updated their blog post. Now Yahoo! is saying that links will still be crawled if they are found in the robots-nocontent sections — but will the anchor text be passed?

    My concern right now is that the only people who will use this thing are unscrupulous Webmasters who don’t want to pass value through their outbound links, and there are plenty of people who resort to “rel=’nofollow’” and JavaScript for just that reason when they sell links, swap links, etc.

    Yahoo!, you have not made a good case for widespread adoption of this class.

  • Dakkar

    Uhm… but I have some pages with content (like tag clouds, most-viewed lists, and other boxes like those) that contains many words that can lead to “bad” search results. I tried with Google CSE and it fails many times, giving back lots of results with pages that are not really related to the given search query (it finds every page of the site).

    I think that excluding parts of a page from spider indexing could improve the quality of search engine results.
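
    If the class works as announced, wrapping those boxes would look something like this (hypothetical markup; only the robots-nocontent class name comes from Yahoo):

    <div class="tag-cloud robots-nocontent">
      <a href="/tag/foo">foo</a> <a href="/tag/bar">bar</a> …
      <!-- these terms would no longer count toward the page’s rankings or abstracts -->
    </div>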