Apr. 11, 2008 at 2:00pm Eastern by Danny Sullivan

Google Now Fills Out Forms & Crawls Results

One of the biggest search challenges has long been that the major search engines like Google cannot crawl material that can only be retrieved through the use of forms. Now Google is filling out those forms to obtain the information previously hidden, the company has announced.

Google says that for the past few months, it has been filling in forms on a "small number" of "high-quality" web sites to get back information. What words has it been entering into those forms? Words that occur on the site itself, chosen automatically, with options from check boxes and drop-down menus also being selected:

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn't find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML.
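To make that description concrete, here is a minimal sketch of how a crawler might turn a form into a handful of GET URLs. It is an illustration only, not Google's code: the function name `candidate_urls`, its parameters, and the `limit` cap are all assumptions, and Google has not said how it picks words or how many queries it sends.

```python
from itertools import islice, product
from urllib.parse import urlencode, urljoin


def candidate_urls(page_url, action, text_fields, choice_fields, site_words, limit=5):
    """Build a small number of GET URLs for one form (hypothetical helper).

    text_fields   -- names of the form's text boxes
    choice_fields -- dict of field name -> values offered in the HTML
                     (select menus, check boxes, radio buttons)
    site_words    -- words taken from the site that hosts the form
    limit         -- cap on how many queries to generate (an assumption)
    """
    names = list(text_fields) + list(choice_fields)
    # Text boxes draw from the site's own words; other fields from their HTML values.
    options = [site_words for _ in text_fields]
    options += [choice_fields[name] for name in choice_fields]

    base = urljoin(page_url, action)
    urls = []
    # Only take the first `limit` combinations, keeping the query count small.
    for combo in islice(product(*options), limit):
        urls.append(base + "?" + urlencode(dict(zip(names, combo))))
    return urls
```

For a form with one text box and one two-value select menu, `candidate_urls("http://example.com/search.html", "/find", ["q"], {"type": ["new", "used"]}, ["honda", "civic"], limit=3)` yields three URLs, the first being `http://example.com/find?q=honda&type=new`.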

Results returned are then crawled. Ironically, it was just over a year ago that Google warned against getting search results like these indexed. Now it's actually generating and crawling those results itself.

Don't want Google doing this to your site? Google says that if your form is blocked through robots.txt or meta robots instructions, those forms won't be accessed. In addition, some other forms won't be touched if they fit certain technical criteria:
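Site owners who want to check whether a form's target URL is already covered by their robots.txt can do so with Python's standard `urllib.robotparser` module. This is a rough sketch: the helper name `form_is_blocked` is hypothetical, it assumes the relevant URL is the form's action URL, and it does not cover meta robots tags.

```python
from urllib.robotparser import RobotFileParser


def form_is_blocked(robots_txt, form_url, user_agent="Googlebot"):
    """Return True if robots.txt rules disallow fetching form_url.

    robots_txt -- the text of the site's robots.txt file
    form_url   -- the URL a form submits to (assumed to be what matters)
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, form_url)


# Example rules: keep all crawlers out of anything under /search
rules = "User-agent: *\nDisallow: /search\n"
print(form_is_blocked(rules, "http://example.com/search?q=widgets"))
print(form_is_blocked(rules, "http://example.com/about"))
```

With these example rules, the first check reports the search URL as blocked and the second reports the about page as allowed.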

We only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc.
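Those criteria can be read as a simple filter over the forms on a page. The sketch below is one possible interpretation, not Google's implementation: the class and function names are hypothetical, and the `PERSONAL_TERMS` list is an assumption that starts from the examples in the quote ("logins, userids, contacts") and adds two guesses.

```python
from html.parser import HTMLParser

# "login", "userid" and "contact" come from Google's examples;
# "password" and "email" are assumptions added for illustration.
PERSONAL_TERMS = ("login", "userid", "contact", "password", "email")


class FormScanner(HTMLParser):
    """Collect each form's method, action, and (type, name) of its fields."""

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {
                # HTML forms default to GET when no method is given.
                "method": (attrs.get("method") or "get").lower(),
                "action": attrs.get("action") or "",
                "inputs": [],
            }
            self.forms.append(self._current)
        elif tag in ("input", "select", "textarea") and self._current is not None:
            field_type = (attrs.get("type") or "text").lower()
            field_name = (attrs.get("name") or "").lower()
            self._current["inputs"].append((field_type, field_name))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def is_crawlable(form):
    """Apply the stated criteria: GET only, no password or personal fields."""
    if form["method"] != "get":
        return False
    for field_type, field_name in form["inputs"]:
        if field_type == "password":
            return False
        if any(term in field_name for term in PERSONAL_TERMS):
            return False
    return True


def crawlable_forms(html):
    """Return the forms in `html` that pass the filter."""
    scanner = FormScanner()
    scanner.feed(html)
    return [form for form in scanner.forms if is_crawlable(form)]
```

Matching on field names is a crude stand-in for "terms commonly associated with personal information"; a real crawler would presumably also look at labels and surrounding text.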

The move is potentially good for searchers, in that it will open up material often referred to as being part of the "deep web" or "invisible web" because it was hidden behind forms. Search Engine Land executive editor Chris Sherman actually co-authored a book on the topic. He and fellow author Gary Price didn't coin the term invisible web, but they certainly helped popularize it.

It should be noted that Google's not the first to do something like this. Companies like Quigo, BrightPlanet, and WhizBang Labs were doing this type of work years ago. But it never translated over to the major search engines. Now chapter two of surfacing deep web material is opening, this time with a major search player -- in that, Google is being a pioneer.
