A new research paper from Yahoo and Cornell University, with search legend Jon Kleinberg as one of the coauthors, provides a fascinating look at how a search query such as "red sox" or "hurricane dean" can be centered on a physical location, including one that changes over time.
The paper, Spatial Variation in Search Engine Queries, used Yahoo query logs to see whether queries could be traced back to particular areas. Each person running a query has an IP address. Those IP addresses (with some filtering to deal with people sharing the same IPs) were mapped, so that each query was linked to a point on Earth (or, specifically, a point in North America, the region covered in this study).
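To make the mapping step concrete, here is a minimal sketch in Python. It is not the paper's pipeline: the lookup table `IP_LOCATIONS`, the function `locate_queries`, and the "too many queries from one IP" filter are all assumptions made up for illustration, since the article does not describe the actual filtering.

```python
from collections import Counter

# Hypothetical IP-to-coordinate table (documentation-range IPs).
# A real system would consult a geolocation database instead.
IP_LOCATIONS = {
    "203.0.113.5": (42.36, -71.06),   # Boston area
    "203.0.113.9": (37.01, -121.57),  # Gilroy area
    "198.51.100.1": (40.71, -74.01),  # imagined shared proxy
}

def locate_queries(log, max_queries_per_ip=3):
    """Turn (ip, query) pairs into (query, lat, lon) tuples.

    Assumed heuristic: an IP that issues more than `max_queries_per_ip`
    queries is treated as shared and dropped.
    """
    per_ip = Counter(ip for ip, _ in log)
    located = []
    for ip, query in log:
        if per_ip[ip] > max_queries_per_ip:
            continue  # likely many people behind one address
        if ip not in IP_LOCATIONS:
            continue  # no known location for this address
        lat, lon = IP_LOCATIONS[ip]
        located.append((query, lat, lon))
    return located

log = [
    ("203.0.113.5", "red sox"),
    ("203.0.113.9", "gilroy dispatch"),
] + [("198.51.100.1", "weather")] * 5

located = locate_queries(log)
```

In this toy log, the five queries from the imagined proxy are discarded and the other two are each pinned to a coordinate.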
The image above shows an example of this. Queries for [red sox] happen across the US but occur with the most frequency (shown in red) around Boston, home to the Red Sox.
Similarly, other sports team queries center around the various cities that host those teams:
One of the most interesting parts of the paper was how the "center" of a query can move. Consider this illustration of searches for [hurricane dean]:
The chart shows how the center of the queries moved almost in line with where the actual storm headed. OK, so how can the center of these queries be in water? Who’s searching in the middle of the ocean? My assumption (the paper isn’t clear here) is that people along the various coasts were searching, and so the center of all these searches sometimes fell between the coasts, over open water.
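That assumption is easy to illustrate. The sketch below computes a simple count-weighted centroid of query locations; this is a stand-in chosen for illustration, not necessarily how the paper defines a query's center. The function name `query_center`, the two cities, and the query counts are all hypothetical.

```python
import math

def query_center(points):
    """Count-weighted centroid of (lat, lon, count) tuples, in degrees.

    Each location is converted to a 3D unit vector, the vectors are
    summed with their weights, and the sum is projected back to the globe.
    """
    x = y = z = 0.0
    for lat, lon, count in points:
        phi, lam = math.radians(lat), math.radians(lon)
        x += count * math.cos(phi) * math.cos(lam)
        y += count * math.cos(phi) * math.sin(lam)
        z += count * math.sin(phi)
    center_lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    center_lon = math.degrees(math.atan2(y, x))
    return center_lat, center_lon

# Made-up counts: equal query volume from two Gulf Coast cities.
points = [
    (27.80, -97.40, 100),  # Corpus Christi, Texas
    (25.76, -80.19, 100),  # Miami, Florida
]
center = query_center(points)
print(center)
```

With equal volume from the Texas and Florida coasts, the centroid lands at roughly 27 degrees north, 89 degrees west, which is open water in the Gulf of Mexico, even though nobody searched from there.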
Another type of localized query that can be mapped is the "distinctive query," one that occurs at high frequency in, or is fairly unique to, a certain area. The map below shows some of these, such as [gilroy dispatch] occurring around the Gilroy area:
All this mapping of queries is fun and interesting, but can it improve search? Usually, the challenge has been to know what web pages match a particular area, not which queries.
The paper doesn’t provide any concrete suggestions in its conclusion. But there are a number of ways I can see it being helpful. IP detection isn’t perfect, but if you can tell that certain queries tend to come only from certain areas, then that might help search engines better target local information to someone whose IP address can’t be depended upon for localization.
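One way to picture "queries that tend to come from certain areas" is a simple lift score: how much more often a query appears in a region than it does overall. The sketch below is my own illustration, not the paper's measure; `distinctive_queries`, the `min_count` cutoff, and the tiny log are all hypothetical.

```python
from collections import Counter

def distinctive_queries(logs, min_count=2):
    """Return {region: (query, lift)} for the highest-lift query per region.

    logs is a list of (region, query) pairs. Lift is the query's share of
    a region's traffic divided by its share of all traffic.
    """
    logs = list(logs)
    total = len(logs)
    region_totals = Counter(region for region, _ in logs)
    query_totals = Counter(query for _, query in logs)
    best = {}
    for (region, query), n in Counter(logs).items():
        if n < min_count:
            continue  # too rare in this region to judge
        lift = (n / region_totals[region]) / (query_totals[query] / total)
        if region not in best or lift > best[region][1]:
            best[region] = (query, lift)
    return best

# Made-up log entries for two regions.
logs = (
    [("gilroy", "gilroy dispatch")] * 3
    + [("gilroy", "weather")] * 2
    + [("gilroy", "red sox")] * 1
    + [("boston", "red sox")] * 4
    + [("boston", "weather")] * 3
)
best = distinctive_queries(logs)
```

In this toy data, [gilroy dispatch] scores highest for Gilroy and [red sox] for Boston, while [weather] is common everywhere and so is distinctive nowhere. A high-lift query could then serve as a hint about location when the IP address alone is unreliable.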
Knowing the "centers" of queries might also help search engines better understand what "centers" should be used when mapping results. A local query using a city name often ranks results based on those closest to the geographic center of a city. But if query mapping shows a different center, perhaps that could be used.
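To see how the choice of center could change a ranking, here is a small sketch that orders results by great-circle (haversine) distance from a chosen center. The business names, coordinates, and both centers are invented for illustration, and real local ranking surely weighs much more than distance.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def rank_by_distance(results, center):
    """Sort (name, (lat, lon)) results by distance from center, nearest first."""
    return sorted(results, key=lambda r: haversine_km(r[1], center))

# Hypothetical listings and centers.
results = [
    ("Downtown Cafe", (40.00, -75.00)),
    ("Suburb Diner", (40.10, -75.20)),
]
geographic_center = (40.00, -75.01)  # assumed city-hall-style center
query_center = (40.09, -75.18)       # assumed center of actual searches

by_geography = [name for name, _ in rank_by_distance(results, geographic_center)]
by_queries = [name for name, _ in rank_by_distance(results, query_center)]
```

Here the geographic center puts Downtown Cafe first, while the query-derived center puts Suburb Diner first, which is the kind of shift the paragraph above has in mind.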