Google Now Fills Out Forms & Crawls Results


One of the biggest search challenges has long been that the major search
engines like Google cannot crawl material that can only be retrieved through the
use of forms. Now Google is filling out those forms to obtain the information
previously hidden, the company has announced.

Google says that for the past few months, it has been filling in forms on a
"small number" of "high-quality" web sites to get back information. What words
has it been entering into those forms? Words chosen automatically from the text
of the site itself, along with values picked for check boxes and drop-down menus:

In the past few months we have been exploring some HTML forms to try to
discover new web pages and URLs that we otherwise couldn’t find and index for
users who search on Google. Specifically, when we encounter a <FORM> element
on a high-quality site, we might choose to do a small number of queries using
the form. For text boxes, our computers automatically choose words from the
site that has the form; for select menus, check boxes, and radio buttons on
the form, we choose from among the values of the HTML.
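
To make that description concrete, here is a minimal Python sketch of how a crawler might turn one form into a handful of GET URLs. It is not Google's code. The function names, the "most frequent words of four or more letters" heuristic, and the cap of five queries are all assumptions made for illustration; the post says only that words come from the site and that the number of queries is small.

```python
import re
from collections import Counter
from itertools import islice, product
from urllib.parse import urlencode


def candidate_words(site_text, limit=3):
    """Pick the most frequent words of 4+ letters from the site's own text.

    The frequency heuristic is an assumption; the post does not say how
    words are chosen.
    """
    words = re.findall(r"[a-z]{4,}", site_text.lower())
    return [word for word, _ in Counter(words).most_common(limit)]


def form_queries(action_url, text_fields, choice_fields, site_text, max_queries=5):
    """Build a small number of GET URLs for one form.

    text_fields   -- names of the form's text boxes
    choice_fields -- dict mapping each select/checkbox/radio name to the
                     values that appear in the form's HTML
    """
    words = candidate_words(site_text)
    # One list of (name, value) options per field.
    options = [[(name, word) for word in words] for name in text_fields]
    options += [[(name, value) for value in values]
                for name, values in choice_fields.items()]
    # Combine the options, but stop after max_queries combinations.
    combos = islice(product(*options), max_queries)
    return [f"{action_url}?{urlencode(combo)}" for combo in combos]


urls = form_queries(
    "http://example.com/search",          # hypothetical form action
    text_fields=["q"],
    choice_fields={"type": ["new", "used"]},
    site_text="cars cars cars trucks trucks sale",
    max_queries=3,
)
```

Each resulting URL is an ordinary link that can be crawled like any other page, which is why only a small number are generated per form.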

Results returned are then crawled. Ironically, it was just over a year ago
that Google warned against getting search results like these indexed. Now it’s
actually generating and crawling those results itself.

Don’t want Google doing this to your site? Google says that forms blocked
through robots.txt or meta robots instructions won’t be accessed. In addition,
some other forms won’t be touched if they fit certain technical criteria:

We only retrieve GET forms and avoid forms that require any kind of user
information. For example, we omit any forms that have a password input or that
use terms commonly associated with personal information such as logins,
userids, contacts, etc.
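
The opt-out and the exclusion criteria above can be sketched as a single check in Python. This is an illustration under stated assumptions, not Google's implementation: the function and class names are hypothetical, the term list extends the post's "logins, userids, contacts, etc." with guesses, the robots.txt rules are tested against the URL of the page holding the form, and meta robots tags are not handled.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical term list; the post names only "logins, userids, contacts, etc."
PERSONAL_TERMS = ("login", "userid", "contact", "email")


class FormScanner(HTMLParser):
    """Record the method and the input types/names of a single form snippet."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.inputs = []  # (type, name) pairs, lowercased

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            # HTML forms default to GET when no method is given.
            self.method = (attrs.get("method") or "get").lower()
        elif tag in ("input", "select", "textarea"):
            input_type = (attrs.get("type") or "text").lower()
            name = (attrs.get("name") or "").lower()
            self.inputs.append((input_type, name))


def is_crawlable_form(form_html, form_url, robots_lines, agent="Googlebot"):
    """Return True if the form passes the checks described in the post."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    if not robots.can_fetch(agent, form_url):
        return False  # blocked through robots.txt

    scanner = FormScanner()
    scanner.feed(form_html)
    if scanner.method != "get":
        return False  # only GET forms are retrieved

    for input_type, name in scanner.inputs:
        if input_type == "password":
            return False  # password input present
        if any(term in name for term in PERSONAL_TERMS):
            return False  # field name suggests personal information
    return True


robots_txt = ["User-agent: *", "Disallow: /private/"]
search_form = ('<FORM action="/search" method="GET">'
               '<input type="text" name="q"></FORM>')
login_form = ('<form action="/find"><input name="userid">'
              '<input type="password" name="pw"></form>')

print(is_crawlable_form(search_form, "http://example.com/search", robots_txt))
print(is_crawlable_form(login_form, "http://example.com/find", robots_txt))
```

Site owners who want to be sure a form is left alone should rely on the robots.txt or meta robots route, since the other criteria are heuristics on Google's side.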

The move is potentially good for searchers, in that it will open up material
often referred to as part of the "deep web" or "invisible web" because it was
hidden behind forms. Search Engine Land executive editor Chris Sherman actually
co-authored a book on the topic. He and fellow author Gary Price didn’t coin
the term invisible web, but they certainly helped popularize it.

It should be noted that Google’s not the first to do something like this.
Companies like Quigo, BrightPlanet, and WhizBang Labs were doing this type of
work years ago. But it never translated over to the major search engines. Now
chapter two of surfacing deep web material is opening, this time with a major
search player. In that respect, Google is being a pioneer.




About the author

Danny Sullivan
Contributor
Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land and MarTech, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site and blog. He can also be found on Facebook and Twitter.
