Google Warning Against Letting Your Search Results Get Indexed

The days when a Google search brings up results that lead to pages of search results from other sites are coming to a close. Matt Cutts, in his Search Results In Search Results post today, points out a change to Google’s guidelines that suggests a crackdown on this type of material may be starting. More about what I’m talking about below, plus the question of whether Google should do the same with paid listings.

Over time, more and more pages seem to show up in Google search results that are merely lists of search results from other sites. To illustrate this, consider a search for dvd players:

[Screenshot: Google results within results]

In the screenshot above, you can see links to BizRate, Shopping.com and Amazon that I’ve outlined in red. Each of those pages has one thing in common. The “content” is simply a list of all the DVD players they currently have for sale. For example, here’s BizRate:

[Screenshot: BizRate results]

Shopping.com:

[Screenshot: Shopping.com results]

and Amazon:

[Screenshot: Amazon results]

Matt’s post suggests that Google’s going to take a harder line against picking up content like this — material that is nothing but a bunch of search results. The new Google guidelines on inclusion of search results content say:

Use robots.txt to prevent crawling of search results pages or other auto-generated pages that don’t add much value for users coming from search engines.

Does failing to do this mean your pages will get pulled? Well, this is one of the “technical guidelines,” the type designed to help site owners understand how to get indexed. In contrast, there are also “quality guidelines,” where Google spells out the stuff that will get you removed. Both types of guidelines are listed on the overall Webmaster Guidelines page. But while this material may not have been put under the quality guidelines, the tone of Matt’s post plus his statement there:

It’s still good to clarify that Google does reserve the right to take action to reduce search results (and proxied copies of websites) in our own search results.

Pretty much tells me it’s not something Google wants people doing anymore.

Now let me start on how you might comply with the new guideline, and the difficulty with it in some cases, beginning with Google itself. At Google, when you do a search, the URL changes to something like this:

http://www.google.com/search?q=dvd%20players

See the /search and q= parts of that URL? All searches at Google go in the /search area. It’s not actually a physical area at Google. There is no “/search” directory on the server holding all these answers. It’s simply something Google does to make it easier to track search queries. The query terms themselves are shown after the “q=” part.

In general, these queries would never show up in any search engine results, not even Google. That’s because search engines don’t come to search boxes and start entering words randomly. They follow links. So the only way they get to a page of search results is if there’s a link that will generate them.

Oops! The example above is a link that generates search results. That’s why Google prevents the page from being indexed by doing this in its robots.txt file:

User-agent: *
Disallow: /search

The Disallow line says that nothing in the /search area should get spidered. The User-agent line above it makes this applicable to all search engines.

Now let’s look at Search Engine Land. You can search here, too. When you do, the search looks like this:

http://searchengineland.com/fastsearch?query=google

Practically no one links to our search results. But now thanks to the new Google guidelines, out of the blue, I have to go block off the /fastsearch area or potentially be seen as spamming Google. What a pain. It’s especially a pain because as I said, practically no one links to our search results. Google’s pretty much not spidering them, but now I’ve got more work to do — albeit not a lot.
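For what it’s worth, the fix itself is only a couple of lines. Here’s a minimal sketch of what I’d add to our robots.txt file (assuming nothing else in the file already covers it):

User-agent: *
Disallow: /fastsearch

That tells every spider to stay away from any URL that starts with /fastsearch, the same way Google keeps crawlers out of its own /search area.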

Let’s get more complicated. Remember Shopping.com? If you do a search for DVD players over there, the URL looks something like this:

http://www.shopping.com/xFS?KW=dvd+players

To stay on Google’s good side, the /xFS area needs to be blocked off. However, that area isn’t what’s currently showing up in Google. Instead, this is:

http://www.shopping.com/xPP-DVD_Players

Now technically, that’s a page within Shopping.com, not a search query that’s been issued. No one did a keyword search to make the page appear. It’s not a “search results page” triggered by an actual search. Still, it’s basically the same thing. That page brings back the same information as an actual “query-created” page does. So does it have to go? And if so, it raises a bunch of questions I’ve been meaning to revisit, such as can you do a robots.txt file that matches only the first few characters of a page name (with Shopping.com, you’d need to wipe out any “xpp-” prefixed pages).
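As it happens, standard robots.txt Disallow rules are prefix matches against the URL path, so matching the first few characters of a page name should be possible. A minimal sketch of what Shopping.com might use, if it wanted to comply (keeping in mind that robots.txt matching is case-sensitive, so the prefix has to be written exactly as it appears in the URL):

User-agent: *
Disallow: /xFS
Disallow: /xPP-

The first Disallow line covers the query-generated /xFS pages; the second covers any page whose path begins with /xPP-, like the /xPP-DVD_Players page above.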

How about BizRate? Do a search there for dvd players, and you get the same URL as in Google:

http://www.bizrate.com/dvdplayers/

So unlike Shopping.com, there’s no difference between the URL driven by an actual query and the one that Google found by crawling (in other cases, such as “sony dvd players,” BizRate does use a more query-constructed URL).

That leads me to the next point. When I say “found by crawling,” I’m talking about the fact that lots of sites have wised up over the years and embedded lots of links on their sites that generate search results, as I’ve illustrated above. Shopping.com, BizRate, Amazon — those are examples of companies that have ensured their search results are crawlable. Back in 2005, Chris Pirillo demonstrated such savvy when making sure that any search on Gada.be (now TagJag) got turned into a crawlable subdirectory. That got him banned, then restored.

I’ve long suspected that these sites were playing a losing game with Google and the other major search engines, in that this type of material indeed sometimes adds little value for searchers. If Google runs its own shopping search engine, as it does with Froogle, why wouldn’t I just do a Froogle search? More to the point, it makes much more sense for Google to give me shopping search results from Froogle than from a bunch of assorted other shopping search engines.

There are some arguments against this, of course. First, that Google shouldn’t favor its own shopping product over others. Cutting out search results from rivals could be seen as Google shoring up its own business prospects. Second, Froogle can suck at times (and so can the rivals, of course). Only showing Froogle results could mean searchers at Google get fewer results or less representation.

My feeling is that at its core, Froogle is a better way to seek products than Google. It scans across a variety of merchants, so it makes sense for Google to point there. If Best Buy has a good price, I don’t want to have to search Google to get to Shopping.com to get to Best Buy. I should be able to search at Google, have it automatically kick in Froogle results if they’re appropriate, and get to Best Buy faster.

The complication is that in other areas, Google doesn’t have a good vertical search product. So on a travel search, does it make sense for “search query” results to go away?

Another complication is that Google is not saying that product pages themselves have to go away. Let’s go back to Shopping.com. Here’s a page about the Panasonic DMR-ES45VS DVD recorder. Does that get blocked off? Arguably, no, since it does explain what the product is and can connect you to more details, consumer reviews and buying guides. Then again, when you see this at the bottom of the page:

[Screenshot: Shopping.com working the search engines]

That makes it hard to work up much sympathy for the idea that the aim of this page is to inform people about the product rather than just to get found on Google and other search engines. Plus, it does contain search results (and paid listings as well), making its quality harder to assess in light of the guideline changes.

With an Amazon product page, like this one, the argument in favor of it staying in Google is much stronger. Amazon isn’t a shopping search engine but more a merchant — you can buy this directly from them. And it’s a good page describing the product, with lots of consumer reviews (over 70 of them).

The new policy also impacts what are known as “scraper” sites, those sites that simply use automated tools to pull matching search results from Google or other search engines as “content fodder,” with a bunch of ads (usually from Google) shoved at the top of the page. Google made changes to its ad system last year to try to make these types of sites less lucrative for those running them. Now the new editorial guidelines give Google more teeth to yank the sites out of search results. And few people are going to be upset about that loss.

In contrast, the new policy opens a can of worms for shopping and other sites that have learned to turn product search results into crawlable content. At the moment, I think we’re in wait-and-see mode as to how aggressively or selectively Google applies removal. If you’re concerned, start looking at your robots.txt files now. If you’re a long-term thinker, understand that the writing is clearly on the wall for sites that have pretty knowingly milked their search results to pull in Google traffic. Start planning something new. I’ll also revisit this after the inevitable discussion and questions get raised, to see what further clarity emerges when the dust settles.

I’ll also close with this. If listing a bunch of search results is bad for the organic/editorial listings, it should also be bad for the paid listings. Everything I showed above, where a link brings you to product search results? The same thing happens in paid search results. Shouldn’t those go as well? And if not, the crackdown on editorial results will inevitably raise complaints that the move is being done to shore up ads.

NOTE: If you saw a malware warning on Jan. 31, 2009, this was due to an error that briefly impacted all web sites. See Google Gets Fearful, Flags Entire Internet As Malware Briefly, for more.

About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook, Google+ and microblogs on Twitter as @dannysullivan.

  • http://www.wolf-howl.com graywolf

    Wonders how this is going to affect directories.

  • http://searchenginetigers.com Simon Heseltine

    …and IYP sites…

  • http://www.ericward.com eric_ward

    It’s a great first step. The big vendor sites that have enjoyed the free organic listings to product results pages can now open their wallets and move a couple of inches to the right to the PPC results. It’ll be tougher for the little guy as PPC gets more costly. That’s where the sleeping giant Froogle comes in handy. Product feed submission into Froogle is free. At least right now anyway :) With the huge cash pile Google has, all it will take is a month of prime time TV ads for Froogle to become known in the mainstream world. To me Froogle has been like a massive dormant volcano. You know it’s going to erupt, and if/when it does it will change the landscape forever. -ew

  • http://blog.outer-court.com Philipp Lenssen

    > Practically no one links to our search
    > results. But now thanks to the new
    > Google guidelines, out of the blue, I
    > have to go block off the /fastsearch
    > area or potentially be seen as spamming
    > Google. What a pain.

    Danny, it makes sense to do this for other reasons than just Google though. A couple of years ago I created a little CSS search engine (you could link your own stylesheet to an XHTML-conformant page, thus creating new search engine designs on the fly). I forgot to add the correct robots.txt line, and months later suddenly found that some spammer had added thousands of links with my search engines. (Those were the days before nofollow, so this spammer managed to get backlinks from my domain this way, carefully selecting search queries which would hit back on his page.) While that problem is not applicable to all search engines, it does cover a portion of them.

  • https://www.linkworth.com/solutions/adv_signup.php?sid=sls13 Eddie Walter

    Hi Danny,

    I believe I have an answer for one of your questions…

    “such as can you do a robots.txt file that matches only the first few characters of a page name (with Shopping.com, you’d need to wipe out any “xpp-” prefixed pages).”

    A robots.txt Disallow rule is a prefix match against the URL path, and Google also supports the * wildcard for matching anywhere in the URL. For example,

    User-agent: *
    Disallow: /*xpp-

    would exclude anything that includes “xpp-” anywhere in the URL. So if you had a page that was titled “www.example.com/whatever-xpp-all.html”, it would exclude that. If you wanted to exclude just the pages that start with xpp-, but not pages that have xpp- elsewhere in the URL, you would want to use:

    User-agent: *
    Disallow: /xpp-

    Beyond that, I find that this move by Google is a bit of a concern. As a shopper, I don’t mind having results from folks like BizRate or Shopping.com or Amazon.com in my SERPs. They are a good way for me to compare different products and providers.

    The thing that bothers me is the MFA sites where it’s nothing but AdSense once you click to the site. I think that this move could be the first in putting an end to that, but my love of Google says that won’t ever happen, organic or PPC listings. They make too much money off of MFA sites.

    Anyway, good post!

  • http://sebastianx.blogspot.com/2007/03/why-ecommerce-systems-suck.html Sebastian

    Considering that so many e-commerce sites have used their search facilities to produce highly relevant spider fodder and to send link love to thin product pages, this will soon be known as Google’s e-commerce penalty ;)

  • http://www.smart-keywords.com/blog.html AussieWebmaster

    This is going to be a thin line to walk… what happens to people with Google Checkout? Will they be less likely to have their product pages dropped?

    Using server-side code to change things is nothing new…

  • http://www.demib.dk Mikkel deMib Svendsen

    This is going to hurt big time on a lot of sites! I think we all know who they are … :)

    We will be covering this in more detail at Strikepoint (WebmasterRadio.FM) this evening.

  • http://www.naffziger.net/blog Dave N

    Graywolf nailed the issue. There are tons of sites that create value by organizing the information on the web. Some make that information available through directories, others through search and others through resource guides (Wikipedia).

    Theoretically, other websites are determining which pages are the most relevant resources by linking to them, and this is already considered in Google’s algorithm.

  • http://searchengineland.com/070312-104201.php littleman

    Thing is G’s algo favors the overwhelming link pop of the big feed sites like bizrate and shopping.com over the mid-size merchants who submit their feeds to them with the same exact product descriptions. So the merchants are screwing their natural results over by submitting to them.

    At the same time, all the comparison shopping engines have worked super hard to get as much of their content into the indexes as possible, so yeah, in effect it’s like arbitrage on a corporate scale with a shit load of link pop ensuring that they float above the merchants.

    I recently had a client who was getting killed in natural search because of this and I tried to convince them that the feeds are hurting their ranking potential and they just couldn’t accept it.

  • http://www.doolally.net doolally

    Could the same apply to tagged pages like
    http://searchengineland.com/guides/link_building.php
    as they are quite similar to search results pages?

  • http://sextoysinsider.com rlonghurst

    My company is a retailer that submits feeds to Shopping.com, Bizrate et al. It can be very frustrating to see these sites’ search results rank very highly in Google when the keywords that Google has ranked them for are the keywords in our product names and short descriptions.

    By dint of their sheer size – and, cynics would say, AdWords spending power – the sites are able to rank higher than the *original* source of their content.

    Yes, these sites do go some way to making sense of the Web for shoppers, but what of the searcher who wants to go straight to the merchant?

    Shouldn’t Google show merchant results first and then a pageful of identikit Shopping.com, Pricerunner, Bizrate and Froogle results?

    Is showing a screenful of near identical price comparison sites the best way of serving the user?

  • http://www.michael-martinez.com/ Michael Martinez

    As a consumer I WANT to see those product search pages show up in the query results. Screw the PPC ads — they never lead you directly to the exact products you need.

    What I don’t want to see in search results are search results that are irrelevant to my particular need. I don’t want to see MFA search scraper pages. I don’t want to see MFA pseudo search pages on parked domains.

    But if I want to find a particular DVD model and the only way it’s been crawled/indexed for a large shopping site is through an internal search result, then give me the internal search result.

    Otherwise, instead of investing their efforts in forcing people to police the Web for them (by threatening either online or behind the scenes at conferences to delist pages that don’t use REL=NOFOLLOW), Google can quickly and easily resolve this particular issue by supporting an actually useful meta tag, one that says, “Don’t return this page in results, but pick one of the sibling pages this page links to as appropriate for the query”.

    Then Google can pick the most appropriate page. That is, the algorithm says that internal search page X is relevant to the query, and X points to X-1, X-2, and X-3 — let Google pick one of those three.

    The solution is not to require Webmasters to do Google’s bidding under threat of penalty. The solution is for Google to remember that without the Web there is no Google.

  • dougsimms

    Yes, I’m sure it is in their best interest to eliminate all the query based pages on the web:

    http://www.google.com/search?hl=en&safe=off&q=+site:finance.google.com+google.com+inurl:q%3D

    This is a weak and shortsighted idea. I’m surprised that it is coming from GOOG…

    I guess when you can just plug in the equivalent to yesteryear’s “Inside Yahoo! Matches” to make money on your own SERPs, everyone else gets the hose.

  • http://www.luckylester.com Lucky Lester

    Good read and a very good point regarding paid listings.

  • RayPays

    I am wondering when Google will stop displaying results that are just URLs with Google AdWords in several different configurations on sites that have no other content. Given that they make money on anything clicked on these types of pages, my bet is not any time soon.

  • Trogdor

    We’re a small business, and sometimes our internal-product-search URLs get spidered, and even get ranked. We’ve encouraged this, as it’s always seemed like another chance to get relevant pages ranked in the engines.

    Now, we’re supposed to stop?! Our product-search-result pages are indeed relevant for many of the queries we target.

    Over and over, Matt & Vanessa vaguely said, “result pages that offer little value to the visitor” … which gives them a nice out, because it’s all based on whether or not the pages have value. I’d like to think that our internal SERPs do indeed, but as Google is the big decider, it seems as though a useful SEO tool has been taken away from me.

    I see that in Matt Cutts’s post about it, there’s been no response about internal SERPs that are valuable / relevant, so we’re in this wonderful grey area.
