On the heels of a formal privacy complaint by former FTC employee Chris Soghoian targeting Google’s embedding of search queries in page URLs, a new class action lawsuit takes aim at the same thing. It seeks monetary damages and an injunction against Google’s alleged sharing of searches with third parties.
Google Said To Share User Queries
Filed in federal court (Northern District of California), the complaint asserts that Google shares queries in violation of its own privacy policies as well as federal law and California state privacy and business practices statutes. The basic allegation of the complaint is the following:
Google has consistently and intentionally designed its services to ensure that user search queries, which often contain highly-sensitive and personally-identifiable information (“PII”), are routinely transferred to marketers, data brokers, and sold and resold to countless other third parties.
The user search queries disclosed to third parties can contain, without limitation, users’ real names, street addresses, phone numbers, credit card numbers, social security numbers, financial account numbers and more, all of which increases the risk of identity theft. User search queries can also contain highly-personal and sensitive issues, such as confidential medical information, racial or ethnic origins, political or religious beliefs or sexuality, which are often tied to the user’s personal information.
“Referrer” Data Under Fire
How exactly is Google sharing these queries? Through what’s known as “referrer” data, which tells a web site the page that someone came from.
At Google, in most cases the address (the URL) of the “page” that someone is viewing after doing a search contains the words they searched on. If they click from Google to then arrive at another web site via one of Google’s search listings, that web site is told what those words are (see the postscript below for more information on this and how technically it’s not Google sharing the information).
The suit argues that disclosure of queries through referrers can lead to the identification of specific individuals, something that happened in a few limited cases in the release of AOL search data in 2006. This is the same argument at the heart of the recently filed FTC complaint.
Referrer data have recently come under closer scrutiny in a series of articles on tracking, targeting and privacy run by the Wall Street Journal. In particular, privacy breaches and the sharing of personal information by Facebook apps were the subject of a WSJ article last week:
It’s not clear if developers of many of the apps transmitting Facebook ID numbers even knew that their apps were doing so. The apps were using a common Web standard, known as a “referer,” which passes on the address of the last page viewed when a user clicks on a link. On Facebook and other social-networking sites, referers can expose a user’s identity.
Google’s Response & Likely Response
Google provided a statement through a spokesperson in response to learning of the class action last night:
We have not yet received a copy of the complaint and won’t be able to comment until we’ve had a chance to review it.
Google will likely argue that there is no privacy breach because no personal information gets to third parties. Google was quoted in response to the earlier FTC complaint making that argument, which is also likely to be Google’s position and defense in the class-action suit:
Google said its passing of search-query data to third parties “is a standard practice across all search engines” and that “webmasters use this to see what searches bring visitors to their websites.” The statement added, “Google does not pass any personal information about the source of the query to the destination website.”
However, in 2006 Google was the only search engine that fought a Bush Administration subpoena seeking anonymous search query data. In opposing it, the company made numerous statements that disclosure of search queries can lead to discovery of the identity of users. This “testimony” by Google could be highly problematic if Google defends itself with the argument that referrers don’t lead to the revelation of personal identity or information.
Google Offers Searching Without Referrers
Currently users can search on Google in a secure way without passing their queries to third parties, through Google’s encrypted web search.
This does not appear to be a frivolous case and, if successful, it could ultimately result in a change of policy that affects what gets shared between Google and the rest of the internet.
Postscript From Danny Sullivan
I’m hoping to take a deeper look at the case myself in the near future. I’ve only had a chance to skim through the case so far. But here are some preliminary thoughts.
What’s A Referrer?
First, please read The Death Of Web Analytics? An Ode To The Threatened Referrer. I wrote this earlier this year, and it explains exactly what referrer data is, how searches can end up embedded into it, and why it’s similar to Caller ID for telephones. It even explains why you see “referer” and “referrer” as two different spellings.
Next, Google’s best defense in this case will likely be that it is not “leaking” any data to third parties because technically, it is not. Google does not provide referrer information. Your browser does that.
The Real “Leaker” – Your Browser
Most browsers, as my article above explains, will report the last page you were viewing when you click on a link from that page to reach a new one. With Google, if you did a search for “cars,” then the URL of that search page would look (at its most basic) like this:
When you click on a site in the results that is listed for “cars” at Google, that Google URL above is captured by your browser and then reported to the web site that you go to from Google (technically, a much longer URL is captured, but it still contains what you searched for).
Google doesn’t force the referrer into the browser and make the browser report the data. The browser does that itself. If the browser makers wanted, they could eliminate referrer information. That would vastly reduce potential privacy risks across the entire web, rather than just at Google.
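To make the mechanism concrete, here is a minimal sketch of how a destination web site could read search terms out of the referring URL that a browser reports. The `search.example` address and the `q` parameter name are assumptions made for illustration; they are not taken from the complaint or from any particular search engine's URL format.

```python
from urllib.parse import urlparse, parse_qs


def search_terms_from_referer(referer):
    """Return the search terms embedded in a referring URL, or None."""
    if not referer:
        return None
    parsed = urlparse(referer)
    # Assumption: the search terms travel in a query parameter named "q".
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


# Hypothetical referring URL, as a browser might report it to the site.
example_referer = "http://search.example/search?q=cars&hl=en"
print(search_terms_from_referer(example_referer))
```

Note that the site does nothing special to obtain the value. It simply reads what the visitor's browser sent along with the page request.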
Targeting Google Doesn’t Eliminate Referrers
Assume that the lawsuit was successful, and Google was forced to strip search terms out of URLs. Referrers would still happen from other sites. Searches at Yahoo and Bing might still contain these. Similarly, someone visiting a porn site might not realize it contains invisible images that get loaded, causing their browser to send — via referrer data — the page that they are viewing. All that and more only stops if browsers stop sending referrers.
Unlike in the case of the aforementioned Wall Street Journal articles that have been targeting Facebook, Myspace and some others about “leakages” due to referrers, Google search referrers contain no “user ID” that can easily be traced back to an actual user. They do arrive with IP addresses, something that in very limited cases might be linked to an individual.
IP Addresses Versus User IDs
Google Anonymizing Search Records To Protect Privacy is an in-depth article from me that explains more about IP addresses and what they can or can’t reveal. One European Union privacy official has argued that IP addresses are personal information, but Google has largely disagreed. Still, the company has been anonymizing its data — and doing so even before the EU was pushing for it (see Anonymizing Google’s Server Log Data — How’s It Going?).
So why is Google eliminating IP addresses if they aren’t personal? And why did it object to the US government wanting such “anonymous” search data back in 2006, if IP addresses aren’t personal?
The answer is that over a long period of time, IP addresses can become more personal. If someone’s IP address doesn’t change, and they’ve performed a number of queries (say 10, 20 or 100 to pick some numbers) that can be linked to that IP address (or some other common identifier), then you might be able to guess at their identity. It’s how the New York Times was able to guess at the identity of then 62-year-old Thelma Arnold after AOL’s data leak in 2006.
When it comes to referrers from Google, IP addresses will be associated with Google’s search strings. However, to really know who someone is, you’d have to capture a large number of Google referrers from a large number of web sites. It’s not impossible. It’s something that some behavioral targeting companies try to do in a different way by using cookies. But it’s currently far different than the idea that if you use Google search, clicking on a link reports exactly who you are, along with what you searched for.
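As a rough illustration of that linking step, the sketch below groups search terms by the IP address they arrived with. The log entries, the `search.example` address and the `q` parameter are all invented for the example; a real site would see only the referrers for visits to its own pages, which is why a meaningful profile would require data from many sites.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs


def queries_by_ip(log_entries):
    """Group search terms found in referring URLs by visitor IP address."""
    profile = defaultdict(list)
    for ip, referer in log_entries:
        # Assumption: search terms travel in a query parameter named "q".
        terms = parse_qs(urlparse(referer).query).get("q")
        if terms:
            profile[ip].append(terms[0])
    return dict(profile)


# Invented entries: (IP address, referring URL). The addresses come from
# ranges reserved for documentation.
log = [
    ("203.0.113.7", "http://search.example/search?q=cars"),
    ("203.0.113.7", "http://search.example/search?q=used+car+dealers"),
    ("198.51.100.2", "http://search.example/search?q=weather"),
    ("203.0.113.7", "http://other.example/page"),
]
print(queries_by_ip(log))
```

The more entries that accumulate under a single unchanging address, the easier it becomes to guess who is behind it.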
Death To Referrers? Death To Caller ID?
As I wrote in my earlier The Death Of Web Analytics? An Ode To The Threatened Referrer piece, I have very mixed feelings that the referrer might go away. I’ve configured my own browser not to pass it on (I use No Referer). That’s primarily because sometimes I’m viewing private documents, such as site redesigns, or new search projects in development, where clicking from these places might reveal their locations.
To a marketer, referrer data is priceless. I’m not talking about aggressive marketers trying to build up a profile of you and what you’ve done. I’m talking far less sophisticated marketing, just knowing exactly how people in general found your web site. What pages sent you traffic. What terms you ranked for. What ads worked or didn’t.
Referrers are part of the reason that internet marketing is so successful, because results are trackable. Without referrers, it becomes more like the offline world where people spend on marketing with relatively little (compared to the internet) insight as to what works.
The marketer in me doesn’t want to see this go. It certainly doesn’t want to see it go if Caller ID on regular phones isn’t switched off, by default. Huge arguments were fought over that battle in the US. Phones don’t have it switched on by default, despite the better privacy this would ensure. It’s not allowed at all when placing calls to free 800 numbers, in part because marketers argued that since they’re paying for the call, they should know who is calling.
Similarly, on the internet, web site owners are effectively “paying for the call” to their web sites. Shouldn’t they get to know who is calling, too? Of course, the key difference is that in a call to a web site, it’s possible that the browser will initiate other calls that the user doesn’t realize are happening.
This case itself has all the feel of lawyers deciding they’d like to make a lot of money fast, with no real concern for privacy or the users involved (as happened with the recently settled Google Buzz lawsuit, nearly $3 million to the lawyers and $2,500 for each of the seven plaintiffs). But as Greg says, that doesn’t take away from the seriousness of the issue.
Should site owners find ways to prevent referrers from being sent? Is it ultimately the browser-makers who should do this? Is there a way that referrers can “grow up” in the face of new privacy concerns and really be shared only with the actual web site that someone visits? And if Google is forced ultimately to kill embedded search queries, will it improve existing tools such as Google Webmaster Central, where search queries are reported in a more limited but more anonymous format?