Google Offers Robots.txt Generator

Google's rolled out a new tool at Google Webmaster Central, a robots.txt generator. It's designed to allow site owners to easily create a robots.txt file, one of the two main ways (along with the meta robots tag) to prevent search engines from indexing content. Robots.txt generators aren't new. You can find many of them out there by searching. But this is the first time a major search engine has provided a generator tool of its own. It's nice to see the addition. Robots.txt files aren't complicated to create. You can write them using a text editor such as Notepad with just a few simple comman [...]
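To make the idea of a hand-written robots.txt file concrete, here is a minimal sketch that checks a small rule set with Python's standard `urllib.robotparser` module. The rules, the paths and the `example.com` host are assumptions made up for illustration; they do not come from Google's generator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only: block every crawler
# from two directories and leave the rest of the site open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/index.html", "/private/page.html"):
        url = "http://example.com" + path
        print(path, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A check like this only tells you how a standards-following parser reads the file; individual search engines may interpret extensions differently.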


SEOs Want The NOINDEX Tag To Not Show A Page In The Index

Matt Cutts of Google posted a blog entry asking SEOs how they want Google to handle the NOINDEX meta tag. If you use the NOINDEX meta tag now, Google won't show the page in any way in the Google index -- not even a "link only" listing. Matt asks SEOs if this is what they want and the poll currently shows us that yes, SEOs want it this way. Here are the current results, but the results may change over the course of the week: How should Google treat the NOINDEX meta tag? 240 say "Don't show a page at all." 24 say "Find some middle ground." 23 say "Show a link to the page." Google Explains [...]


Yahoo Search Weather Update & Support For X-Robots Tag

The Yahoo Blog issued a weather report for changes to rankings in Yahoo Search, along with news that they are now supporting the X-Robots-Tag directive -- a way to control indexing of content that cannot accept meta robots tags. Google also supports X-Robots, which gives webmasters the ability to define robots.txt like rules within http headers, as opposed to just the META data within HTML pages. Yahoo provided a few examples of how it can work: X-Robots-Tag: NOINDEX -- If you don't want to show the URL in the Yahoo! Search results. Note: We'll still need to crawl the page to see and apply [...]
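As a rough illustration of the header-based approach, the sketch below splits an `X-Robots-Tag` value into directives and checks for NOINDEX. The function names are hypothetical, and the parsing is deliberately simplified: it assumes a plain comma-separated list and ignores crawler-specific prefixes or directives that carry their own values.

```python
def parse_x_robots_tag(header_value):
    """Split a simple comma-separated X-Robots-Tag value into lowercase directives."""
    return {part.strip().lower() for part in header_value.split(",") if part.strip()}


def blocks_indexing(header_value):
    """Return True if the header value asks engines not to index the URL."""
    return "noindex" in parse_x_robots_tag(header_value)


if __name__ == "__main__":
    # Example response headers for a non-HTML file such as a PDF,
    # which cannot carry a meta robots tag of its own.
    headers = {"Content-Type": "application/pdf", "X-Robots-Tag": "NOINDEX"}
    print(blocks_indexing(headers["X-Robots-Tag"]))
```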


ACAP Launches, Robots.txt 2.0 For Blocking Search Engines?

After a year of discussions, ACAP -- Automated Content Access Protocol -- was released today as a sort of robots.txt 2.0 system for telling search engines what they can or can't include in their listings. However, none of the major search engines support ACAP, and its future remains firmly one of "watch and see." Below, more about the how and why of ACAP. Let's start with some history. ACAP got going in September 2006, backed by major European newspaper and publishing groups that in particular felt Google was using content without proper permissions and wanted a more flexible me [...]


Robots.txt Study Shows Webmasters Favor Google; BotSeer Robots.txt Search Engine Released

The Pennsylvania State University conducted a study that showed webmasters favored Google over other search engines in terms of allowing access to their web sites. An associated BotSeer search engine that allows searching across a collection of robots.txt files was also released. The study looked at which robots or crawlers were listed in a web site's robots.txt file, and Google was listed more often than any other search engine. The paper is named Determining Bias to Search Engines from Robots.txt (PDF) (it may be slow, so here is a local copy) and showed some interesting details. The mos [...]


How Proxy Hacking Can Hurt Your Rankings & What To Do About It

Google Proxy Hacking: How A Third Party Can Remove Your Site From Google SERPs by Dan Thies gives us a detailed look at the serious dangers of proxy hacking. Dan's detailed article shows the history on how he discovered the issue. He then goes into why the hacking currently works in Google. Dan is eager to encourage the search engines to do something about the issue. But Dan has provided details on how to help protect yourself with the help of some friends. [...]


Google Enhances Webmaster Central’s Robots.txt Analysis Tool

The Google Webmaster Central Blog announced improvements they have made to the robots.txt analysis tool. The tool now recognizes all sitemap declarations and relative URLs, so it will report the validity of all sitemap URLs and show data for relative URLs. In addition, Google has expanded the reporting beyond just the first problem encountered, which was all it showed in the past. Now it shows all problems encountered, on multiple lines and itemized by line number. [...]


Google’s “Unavailable After” META Tag Now Live

Google's Dan Crow announced today that the unavailable_after META tag is now live and operational. Google To Add "Unavailable After" META Tag, from about two weeks ago, explains in more detail what this tag is and how it can be used. [...]


More Info On Google’s Unavailable After Meta Tag & New X-Robots-Tag In Header Support

Last week we reported that Google was to add an "Unavailable After" META Tag. Since then, we've spoken to Dan Crow of Google, who provided more information on how to use it, as well as information on a new way to send robots blocking info within HTTP headers. The "unavailable_after" Meta tag will allow you to tell Google that a page should expire from the search results at a specific time. For example, if you have a page that you would like to be removed from the search results at 6pm EST on July 23, 2007, you would add the following Meta tag: <META NAME="GOOGLEBOT" CONTENT="unavailable_aft [...]


Google To Add “Unavailable After” META Tag

Getting Into Google by Jill Whalen reports Dan Crow, director of crawl systems at Google, saying that Google is releasing a new META tag named "unavailable_after." The "unavailable_after" tag will allow you to tell Google when Googlebot should no longer crawl that page. Jill explains that this tag comes in handy when you have a limited time offer promotional page, and on this page, the promotion will expire on a specific date. By using the "unavailable_after" tag, you can tell Google that they should not crawl this page, after the promotion expires. There are several practical scenarios fo [...]


Search Illustrated: Blocking Search Engines With Robots.txt

While most of the time we want search engine crawlers to grab and index as much content from our web sites as possible, there are situations where we want to prevent crawlers from accessing certain pages or parts of a web site. For example, you don't want crawlers poking around on non-public parts of your web site. Nor do you want them trying to index scripts, utilities or other types of code. And finally, you may have duplicate content on your web site, and want to ensure that a crawler only gets one copy (the "canonical" version, in search engine parlance). Today's Search Illustrated i [...]


Belgian Papers Back In Google; Begin Using Standards For Blocking

Belgian newspapers that sued Google to be removed from its index are now back in, having agreed to use the commonly-accepted blocking standards that they initially rejected as not being legal. Google and the group representing the papers, Copiepresse, have issued a joint statement. That's below, along with a look at how this is a victory for Google, which has had to settle a series of similar lawsuits through agreements. Let's start with the joint statement: Internet users interested in Belgian news and users of Google’s search engine may have noticed today that the websites of the Belg [...]


Yahoo Supports New Robots-Nocontent Tag To Block Indexing Within A Page

For over a decade, search engines have supported standards allowing you to prevent pages from being spidered or included within a search index. Today, Yahoo supports a new twist -- a way to flag that part of your page shouldn't be included in an index. It's called the robots-nocontent tag. Many search marketers have long struggled with the problem that the "core" content of a web page -- the main body copy or article -- can often seem drowned out from a text analytics perspective by all the clutter around the content. That clutter is often ads, navigational links, cross promot [...]


From The Isn’t It Ironic Dept: Google Product Search’s Results Show Up In Google

Remember how Google said recently that it might crack down on listings pages that are simply search results themselves? Reader Michael Nguyen dropped an email today to point out how, ironically, Google is now listing pages from its own Google Product Search service exactly as it has warned others not to do. OK, settle down back there, those of you having a chuckle. Embarrassing? Yes! Intentional? Almost certainly not. Let's take a look. Try a search for snake light, and you'll get this: See down there at the bottom? Two pages from Google Product Search showing up in the top results: I [...]


How Search Engines Handle The Nofollow Attribute

Loren Baker at Search Engine Journal has a nice write-up on how the search engines handle the nofollow attribute, now just over two years since it was introduced. Ask.com still does not follow the tag, so here are the takeaways for Google and Yahoo: Google won't follow the link; Yahoo will (note). Google and Yahoo won't pass link popularity for that specific link. Google would hope that Wikipedia would not take such an "absolute approach" on the nofollow link attribute being applied so widely. Note From Danny: Google WILL follow nofollow links in the sense that if someone else links to a page [...]


Google Releases Improved Content Removal Tools

Google has rolled out new tools to help people quickly get content removed from its search engine. Those targeted at site owners allow for speedy removal of pages and cached copies of pages. Other tools allow people to request the removal of images or links to pages with personal information about themselves, in the right circumstances. More on the tools and various options is covered below. Site Owner Removal Options: For site owners, the best way to keep content out of Google is by using the robots.txt or meta robots tag options. Either option can prevent pages from getting into Google or ge [...]


Up Close & Personal With Robots.txt

The Robots.txt Summit at Search Engine Strategies New York 2007 was the latest in a series of special sessions intended to open a dialog between search engine representatives and web site publishers. Past summits featured discussion on comment spam on blogs, indexing issues and redirects. The subject of this latest summit was the humble but terribly important robots.txt file. Danny Sullivan moderated, with panelists Keith Hogan, Director of Program Management, Search Technology, Ask.com, Sean Suchter, Director of Yahoo Search Technology, Yahoo Search, Dan Crow, Product [...]


Google Warning Against Letting Your Search Results Get Indexed

The days of doing a Google search that brings up results leading to search results from other sites are heading for a close. Matt Cutts, in his Search Results In Search Results post today, points out a change to Google's guidelines that shows a crackdown on this type of material may begin. More about what I'm talking about below, plus the question of whether Google should do the same with paid listings. Over time, more and more pages seem to show up in Google search results that are merely lists of search results from those sites. To illustrate this, consider a search for dvd playe [...]


Meta Robots Tag 101: Blocking Spiders, Cached Pages & More

Last week, I covered a new command for the meta robots tag -- one to prevent search engines from using Yahoo titles and descriptions. In doing that, a number of questions came up about the meta robots tag syntax itself. Google Webmaster Central has now posted "Using the robots meta tag," providing some clarity from Google. In addition, both Yahoo and Microsoft have also sent me information on using the tag. I'll run through what everyone says below, complete with charts for easy at-a-glance comparisons. The meta robots tag was an open standard created over a decade ago and designed initiall [...]
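For readers who want to see which directives a page actually declares, here is a minimal sketch that pulls them out of a meta robots tag with Python's standard `html.parser` module. The class and function names are hypothetical, and the sketch only looks at tags named "robots", not at engine-specific names such as GOOGLEBOT.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for part in attr.get("content", "").split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


def meta_robots_directives(html):
    """Return the set of lowercase meta robots directives found in the HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head></html>'
    print(sorted(meta_robots_directives(page)))
```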


Yahoo Provides NOYDIR Opt-Out Of Yahoo Directory Titles & Descriptions

Yahoo! Search Support for 'NOYDIR' Meta Tags and Weather Update from the Yahoo Search Blog covers how, at long last, you can now tell Yahoo to not use Yahoo Directory information to make a title and/or description for your web page listings. It also covers how Yahoo's currently doing a reindexing change that might impact rankings. More on that below, plus tips about also blocking the Open Directory information from being used for your pages and some possible conflicts with multiple robots tags. Sometimes pages are listed in both Yahoo's crawler-based search results and within its human-compil [...]

