Google has announced an initiative with state agencies in Arizona, California, Utah and Virginia to help expose government information to web search engines. Often, government information is stored in database systems that are difficult if not impossible for search engine crawlers to access and index.
Google is working with technologists from the state agencies to help surface this invisible, or deep web, content. The approach is simple yet elegant: it relies on the sitemaps protocol, which allows Google or any other search engine to discover and index government information.
Search engine crawlers rely on links to find content on the web. Much of this content is static, stored as pages on web servers. By contrast, databases display content dynamically, responding to user queries and commands. Since crawlers can’t type, it’s difficult for search engines to access content in a database.
However, most web pages displayed by a database have a unique URL. If this URL is saved as a link, search engine crawlers can effectively follow the link and see the same content a human user would—and index the content of the page.
This is where sitemaps come into play. A sitemap is a file listing the URLs that correspond to queries against a database. By following the sitemaps protocol and crawling those URLs, the search engine can get around the barriers normally posed by dynamic content.
The sitemaps themselves are not indexed, so these collections of URL strings will not surface in search results. Instead, searchers will see search results based on full-text indexing of database content.
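To make the idea concrete, here is a minimal sketch of how an agency might generate such a sitemap, written in Python. It is an illustration under stated assumptions, not the method used by Google or any of the states: the base URL, the "id" query parameter, and the record IDs are all hypothetical, and a real database would likely expose its pages through different URLs. Only the urlset, url and loc elements and the namespace come from the sitemaps protocol itself.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, record_ids, id_param="id"):
    """Return sitemap XML as a string, with one <url> entry per database record.

    base_url, record_ids and id_param are hypothetical: they stand in for
    whatever URL pattern a given database-backed site actually uses.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for record_id in record_ids:
        url = ET.SubElement(urlset, "url")
        loc = ET.SubElement(url, "loc")
        # Each record's page is reachable at a unique URL, so a crawler can
        # follow it without having to type a query into a search form.
        loc.text = f"{base_url}?{urlencode({id_param: record_id})}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical example: two records from a licensing database.
    print(build_sitemap("https://example.gov/licensees", [101, 102]))
```

In practice the list of record IDs would come from a query against the database, and the resulting file would be published on the agency's web server where crawlers can find it.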
The approach is elegant because “Google’s not doing anything other than our typical approach to crawling,” said J.L. Needham, manager of public sector content partnerships.
Needham said that the amount of government information accessible now is relatively limited, but that Google plans to continue working with government agencies, eventually surfacing millions of pages of previously hidden content. Among the types of information searchers can currently find are job postings provided by Utah’s Department of Workforce Services, colonial history resources provided by the Library of Virginia, information on education and health services in California, and profiles of real estate professionals from the Arizona Department of Real Estate’s database of licensed agents.
Needham emphasized that these efforts were focused strictly on publicly available information, not private or personal records maintained by state governments.
Google is also helping these state governments beef up their site search tools, using the free Google Custom Search service to create customized search tools for users.
Needham said Google welcomes the opportunity to work with other government agencies to make their information repositories more accessible. “We would be happy if government spent a bit more time on SEO—the focus is almost entirely on the web site and the search tool on the web site,” he said.
For information on how a government agency can make it easier to search for hard-to-find public information, visit http://www.google.com/publicsector.