• http://www.searchinfluence.com/blog/ Will Scott

    It’s an exciting development Greg, that’s for sure.

    There are great opportunities across the board to leverage advanced image and video tagging, push Barnacle SEO to a new level, and generally leverage the heck out of these things.

    There’s some testing ongoing that will hopefully bear fruit before next week, so we can report on it.

    Good timing on Google’s part for the release, no?

    Will

  • http://www.planetc1.com/ chiropractic

    My gut says this is a prelude to landing pages for local merchants and small businesses. Seems like a natural progression based on what’s taken place so far. I think businesses can learn from studying the elements appearing on those pages and apply them to their own sites/pages.

  • http://silvery.com Chris Silver Smith

    Danny, I think Google should be very unhappy with the inefficiency that’s exposed if they only have the noindex meta tag to use for keeping those pages out of the index.

    In order for that to work, they’ll potentially end up crawling millions of their own pages – only to *not* index them! Not a very green solution, I must say.

    So, it’s a bit of a catch-22. Robots.txt won’t keep the pages from being indexed, as they represented it would ensure, but if they have to resort to noindex meta tags, they risk ultimately impacting their own performance as they try to crawl all the pages that could get linked to, and they’ll expend tons of needless CPU cycles in the process. Not elegant.

    Hmmm… sounds like we really need a solution for telling them which pages to NOT index as well as NOT crawl.
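
    For anyone following along, the two mechanisms in question look roughly like this (a simplified sketch with an illustrative /private/ path; how Google actually treats each directive for its own Place Pages is exactly what’s in question):

        # robots.txt - blocks crawling, but a blocked URL can still surface as a
        # "partially indexed" / link-only listing if other pages link to it
        User-agent: *
        Disallow: /private/

        <!-- meta robots noindex - keeps a page out of the index, but only works
             if the crawler is allowed to fetch the page and see the tag -->
        <meta name="robots" content="noindex">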

  • http://stroseo.com stroseo

    Perfect explanation: “Blocking with robots.txt doesn’t allow Google to spider a page, but it may produce what it calls a ‘partially indexed’ listing.”

    Everyone’s been going back and forth about why a few of these pages are appearing in the index. Right now, only a few of these pages out of potentially hundreds of millions or more are showing up in search results, so I’m not too concerned about the traditional natural search results.

    However, what are Google’s long-term goals? Matt Cutts (bit.ly/1HKDqW) and Lior Ron (bit.ly/dXd3c, Google’s Global UGC Lead) have both made appearances in the comment sections of a couple of article posts, but both have been less than transparent about Google’s long-term intentions. I talk about more of my concerns here: http://bit.ly/dXd3c

    But the bottom line here is that many of the companies Google is aggregating data from rely heavily on the traffic they receive from Google search visitors. If that visitor flow is cut off to these content originators, their business models may fail, which would cut off the data feeds currently populating Google’s Place Pages. Very parasitic in nature, and very unlike Google.

  • http://www.googleandblog.com/ Michael Martin

    Seems like we’ve found the hot topic for SMX East, just as NoFollow & Paid Links dominated SMX Advanced.

    I am sure Michael Gray will be all over this ;)

    See you all at SMX East in NYC next week!

    – Michael Martin

  • http://stroseo.com stroseo

    OK, so here’s a more specific answer to the “partially indexed URLs” phenomenon.

    Over at TechCrunch, Matt Cutts followed up with another reply regarding the URLs showing up in search results even though robots.txt files prevented the pages from being crawled.

    He referred to a previous post of his regarding this issue here: http://www.mattcutts.com/blog/googlebot-keep-out/

    Matt said:

    “You might wonder why Google will sometimes return an uncrawled url reference, even if Googlebot was forbidden from crawling that url by a robots.txt file.”

    “There’s a pretty good reason for that: back when I started at Google in 2000, several useful websites (eBay, the New York Times, the California DMV) had robots.txt files that forbade any page fetches whatsoever. Now I ask you, what are we supposed to return as a search result when someone does the query [california dmv]? We’d look pretty sad if we didn’t return http://www.dmv.ca.gov as the first result. But remember: we weren’t allowed to fetch pages from http://www.dmv.ca.gov at that point. The solution was to show the uncrawled link when we had a high level of confidence that it was the correct link. Sometimes we could even pull a description from the Open Directory Project, so that we could give a lot of info to users even without fetching the page.”
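
    For context, the kind of robots.txt Matt describes – one that forbids any page fetches whatsoever – is just two lines (illustrative; I haven’t checked what any of those sites actually serve today):

        # block all crawlers from fetching any page on the site
        User-agent: *
        Disallow: /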

  • http://searchengineland.com/ Danny Sullivan

    Yes, I know the arguments that Google makes for showing partially indexed or “link only” URLs. They’re good ones. And yet, they still don’t respect a site owner’s wish to say, “No, I really don’t want you to list me.”

    Improving the robots.txt protocol might be a way forward, allowing people to say “don’t crawl” and “do/don’t show a link.”
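
    Something along these lines, perhaps – purely hypothetical syntax, not anything the robots.txt protocol or Google officially supports today:

        User-agent: *
        Disallow: /private/
        # hypothetical extension: also suppress link-only listings for blocked URLs
        Noindex: /private/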

  • http://stroseo.com stroseo

    Hey Danny, I totally agree with you; technically, right now there seems to be no 100% sure way to prevent a URL from showing up in the index. This only strengthens your comment: “Yes, even Google can be stupid when it comes to SEO.”