Google Now Fills Out Forms & Crawls Results


One of the biggest search challenges has long been that major search engines like Google cannot crawl material that can only be retrieved through forms. Now Google is filling out those forms to obtain the previously hidden information, the company has announced.

Google says that for the past few months, it has been filling in forms on a "small number" of "high-quality" web sites to get back information. What has it been entering into those forms? Words that occur on the site, chosen automatically, along with values picked from the form's check boxes and drop-down menus:

In the past few months we have been exploring some HTML forms to try to discover new web pages and URLs that we otherwise couldn’t find and index for users who search on Google. Specifically, when we encounter a <FORM> element on a high-quality site, we might choose to do a small number of queries using the form. For text boxes, our computers automatically choose words from the site that has the form; for select menus, check boxes, and radio buttons on the form, we choose from among the values of the HTML.
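To make that description concrete, here is a minimal sketch of how a crawler might turn a form into a handful of candidate query URLs. It is not Google's code. The names `FormParser` and `candidate_urls`, the `site_words` list, and the `limit` cap are all assumptions made for illustration; the only behavior taken from Google's statement is that text boxes get words from the site and that selects, check boxes, and radio buttons get values already present in the HTML.

```python
from html.parser import HTMLParser
from itertools import islice, product
from urllib.parse import urlencode, urljoin


class FormParser(HTMLParser):
    """Collect a form's action, its text inputs, and its fixed-choice fields."""

    def __init__(self):
        super().__init__()
        self.action = ""
        self.text_fields = []  # names of free-text inputs
        self.choices = {}      # field name -> values offered in the HTML
        self._select = None    # name of the <select> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "form":
            self.action = a.get("action") or ""
        elif tag == "input" and a.get("name"):
            kind = (a.get("type") or "text").lower()
            if kind in ("text", "search"):
                self.text_fields.append(a["name"])
            elif kind in ("checkbox", "radio"):
                self.choices.setdefault(a["name"], []).append(a.get("value") or "on")
        elif tag == "select" and a.get("name"):
            self._select = a["name"]
            self.choices.setdefault(self._select, [])
        elif tag == "option" and self._select is not None:
            # Simplification: options without an explicit value are skipped.
            if a.get("value") is not None:
                self.choices[self._select].append(a["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def candidate_urls(page_url, html, site_words, limit=5):
    """Build at most `limit` GET URLs (the "small number of queries")."""
    parser = FormParser()
    parser.feed(html)
    names = parser.text_fields + list(parser.choices)
    if not names:
        return []
    # Text boxes draw from words found on the site (assumed given);
    # fixed-choice fields draw from the values listed in the HTML.
    pools = [site_words] * len(parser.text_fields) + list(parser.choices.values())
    target = urljoin(page_url, parser.action)
    combos = islice(product(*pools), limit)
    return [target + "?" + urlencode(dict(zip(names, combo))) for combo in combos]


EXAMPLE_FORM = (
    '<FORM action="/search"><input type="text" name="q">'
    '<select name="cat"><option value="books">Books</option>'
    '<option value="music">Music</option></select></FORM>'
)
urls = candidate_urls("http://example.com/find", EXAMPLE_FORM, ["jazz", "piano"], limit=3)
```

With the example form, `urls` holds three addresses on `http://example.com/search`, starting with `?q=jazz&cat=books`. How Google actually picks its words, or how many combinations it tries, is not something the announcement spells out.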

Results returned are then crawled. Ironically, it was just over a year ago that Google warned against getting search results like these indexed. Now it’s actually generating and crawling those results itself.

Don’t want Google doing this to your site? Google says that if your form is blocked through robots.txt or meta robots instructions, those forms won’t be accessed. In addition, some other forms won’t be touched if they fit certain technical criteria:

We only retrieve GET forms and avoid forms that require any kind of user information. For example, we omit any forms that have a password input or that use terms commonly associated with personal information such as logins, userids, contacts, etc.
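For site owners wondering whether a given form would be left alone, the criteria above can be sketched as a simple check. This is an illustration under stated assumptions, not Google's implementation: `FormFacts`, `may_probe`, and the `PERSONAL_TERMS` pattern are hypothetical names, the term list is seeded only from the examples Google gives, and matching those terms against field names and the form action is a guess about where such terms would be looked for. The robots.txt test uses Python's standard `urllib.robotparser`; meta robots tags are not handled here.

```python
import re
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical term list, based only on the examples in Google's statement.
PERSONAL_TERMS = re.compile(r"login|userid|contact", re.IGNORECASE)


class FormFacts(HTMLParser):
    """Record the facts about a form that the stated criteria depend on."""

    def __init__(self):
        super().__init__()
        self.method = "get"  # HTML's default when the attribute is missing
        self.action = ""
        self.has_password = False
        self.field_names = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self.method = (a.get("method") or "get").lower()
            self.action = a.get("action") or ""
        elif tag == "input":
            if (a.get("type") or "").lower() == "password":
                self.has_password = True
            if a.get("name"):
                self.field_names.append(a["name"])


def may_probe(form_html, url, robots_txt, agent="Googlebot"):
    """Return True if the form passes the robots.txt, GET, and privacy checks."""
    facts = FormFacts()
    facts.feed(form_html)

    # 1. Anything disallowed by robots.txt is off limits.
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    if not robots.can_fetch(agent, url):
        return False

    # 2. Only GET forms, and never forms with a password input.
    if facts.method != "get" or facts.has_password:
        return False

    # 3. Skip forms whose action or field names hint at personal information.
    labels = [facts.action, *facts.field_names]
    return not any(PERSONAL_TERMS.search(label) for label in labels)


SEARCH_FORM = '<form action="/search"><input type="text" name="q"></form>'
LOGIN_FORM = (
    '<form action="/account"><input type="text" name="userid">'
    '<input type="password" name="pw"></form>'
)
BLOCKING_ROBOTS = "User-agent: *\nDisallow: /search\n"

open_site = may_probe(SEARCH_FORM, "http://example.com/search", "")
blocked_site = may_probe(SEARCH_FORM, "http://example.com/search", BLOCKING_ROBOTS)
login_form = may_probe(LOGIN_FORM, "http://example.com/account", "")
```

In this sketch the plain search form is eligible on a site with an empty robots.txt, the same form is skipped once `/search` is disallowed, and the login form is skipped regardless.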

The move is potentially good for searchers, in that it will open up material often referred to as part of the "deep web" or "invisible web" because it was hidden behind forms. Search Engine Land executive editor Chris Sherman actually co-authored a book on the topic. He and fellow author Gary Price didn’t coin the term invisible web, but they certainly helped popularize it.

It should be noted that Google’s not the first to do something like this. Companies like Quigo, BrightPlanet, and WhizBang Labs were doing this type of work years ago, but it never carried over to the major search engines. Now chapter two of surfacing deep web material is opening, this time with a major search player. In that respect, Google is being a pioneer.



Danny Sullivan is editor-in-chief of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also oversees Search Engine Land’s SMX: Search Marketing Expo conference series, maintains a personal blog called Daggle, and can be followed on Twitter.
