Google Responds To EU: Cutting Raw Log Retention Time; Reconsidering Cookie Expiration



In response to an EU letter over data retention, Google has announced that it will now anonymize server log data after 18 months, rather than the previous maximum of 24 months it had announced. It is also reconsidering how long its cookies last. It’s nice to see Google make such a fast, responsive move, though it is reacting to something that felt more like political theater than a real effort to improve privacy protection. The EU privacy group that sent the letter can feel it got a result. Google can look like it has done more to protect privacy, but some important core issues remain unresolved, though Google holds out hope for a digital dashboard or console that would put users directly in control of all their private data held by the company. And will a newly proposed privacy summit actually happen?

Google Announced Log Anonymizing Program

To understand today’s announcement, let’s go back in time. My earlier article, Google Anonymizing Search Records To Protect Privacy, covers how Google announced back in March that it would make server log records anonymous after a period of 18 to 24 months. That article goes into detail about what server log data does (and does not) record that could be personally identifiable. Some key issues to understand from this original announcement:

  • Server Logs Versus User Database: Separate from server log data, Google also maintains a registered user database. Anyone who voluntarily enrolls in a Google service (something requiring a Google Account) will have information stored in this database. FYI, other companies such as Microsoft and Yahoo also have similar databases.
  • User Database More Sensitive: The registered user database has far more personally identifiable information, such as stored web and search histories, than can be obtained from server logs. This information is not automatically destroyed or made anonymous after a set period of time. Instead, at the moment, it is only destroyed when the user voluntarily decides to close an account or delete the data.
  • Length Of Time To Meet Legal Requirements: Some groups previously concerned that Google kept data for an unlimited amount of time were still not happy with a period of up to two years. But Google’s explanation was that this time period was as short as some laws allow. More on that in a bit!

EU Asks Questions On Data Retention

In April, news came out (see EU Group May Serve Google With Letter Over Data Retention Policies) that the European Union’s Article 29 Data Protection Working Party was going to send Google a letter over its data retention policies. At the end of May, the letter was finally sent (see European Union Questions Google’s Data Retention Policy).

The letter (PDF format) outlines the working group’s concern over log data retention:

It is of the opinion that the new storage period of 18 to 24 months on the basis indicated by Google thus far, does not seem to meet the requirements of the European legal data protection framework.

The Article 29 Working Party is concerned that Google has so far not sufficiently specified the purposes for which server logs need to be kept, as required by Article 6 (1) (e) of Data Protection Directive 95/46/EC. Taking account of Google’s market position and ever-growing importance, the Article 29 Working Party would like further clarification as to why this long storage period was chosen. The Working Party would also be keen to hear Google’s legal justification for the storage of server logs in general….

Concerning the “google cookie”, the lifetime of this cookie, which has a validity of approximately 30 years, is disproportionate with respect to the purpose of the data processing which is performed and goes beyond what seems to be “strictly necessary” for the provision of the service.

Things The EU Should Have Done, Should Have Known

For a group that is supposedly concerned about privacy, the letter is fairly amazing in how it fails to ask any substantial questions and, frankly, suggests technical ignorance. It simply feels motivated by political posturing. Let me count the reasons, some of which I’ll revisit in more depth when covering Google’s response to the EU letter further below.

  • Where Are Letters For Microsoft & Yahoo?: Both Microsoft and Yahoo retain server logs, and for longer than Google: when last publicly surveyed after Google’s announcement, they gave no time period for destroying them (I’ve emailed them for an update, in case things have changed in the past three months). Google, in contrast, pulls back on how long it keeps data and becomes a target as a result. Meanwhile Microsoft and Yahoo, whose market positions rival and in some cases outdistance Google’s, get ignored?

    To be fair, the letter to Google suggests that the Working Group isn’t happy with Microsoft and Yahoo either, praising Google in its opening for “Google’s ongoing engagement with the data protection community on a range of issues and in particular its readiness to consult with it in contrast with a relative lack of engagement by some of the other leading players in the search engine community.”

    Yet, by sending this letter to Google only — rather than sending a slightly different letter to all the major search engines that would have addressed the same issues across the board — the group rewards Google with headlines about how it is effectively being knuckle-rapped over privacy.

  • Google Keeps Data Up To 24 Months Because The EU Tells It To: Some EU member states may require companies to retain data for up to two years, due to a law passed at the end of 2005. Complying with legal requirements was one of the reasons for the time period Google chose, as it explained in a FAQ (PDF format) about the change. All this seems pretty easily knowable by the Working Group, and asking about it feels like a bit of written theater.
  • What About User Database Info?: Earlier, I explained that user databases have far more sensitive and personally identifiable information than server logs. But it is server logs that are being asked about. If the Working Party was serious about protecting privacy, it would have skipped right past the questions about server logs and asked for more guidance on how long user databases are held. For example, if an account goes idle, will the information be held indefinitely?
  • Everyone Has Server Logs!: The letter asks Google to justify storing server logs in general, as if it is somehow odd or even ominous that someone would keep them. In reality, it is routine for web servers to log activity. In fact, I would be amazed if the server hosting the Working Group’s own pages isn’t logging visits. Of course, I don’t know this for certain, because the European Commission’s web server legal notice page (which covers its privacy information) makes no mention of whether I’m logged (almost certainly), how long those logs are kept (perhaps indefinitely), or whether I might be served a cookie (I was, when I went to the home page).

Cookie, Smookie!

I need to step out of the bullet points to properly dive into my disgust that the cookie issue came up in this letter.

Daniel Brandt of Google Watch deserves credit as the main person to cry out against Google’s 30-year cookie (which might have been set even longer, but I believe it was originally set to the maximum safe time given the Year 2038 bug). But when I interviewed him about this back in 2003, even he ultimately wasn’t that worried about the time period. From what I wrote back then:

In conclusion, don’t be worried that Google’s cookie won’t expire for 35 years. Even Brandt agrees that’s not the issue. He just doesn’t like the unique ID portion of the cookie.

“Getting rid of the unique ID is the most important thing. The expiration date is a second indicator of how sensitive they are to privacy issues, even without the unique ID. But the expiration date issue is close to trivial once the unique ID is gone,” Brandt said.

Here we are four years later, and the Working Group seems to be reacting to cookie scaremongering rather than understanding that cookie length means little when no one keeps a computer for 30 years. Sure, it would be nice to see the cookie expiration shortened. But that means little in terms of real data protection.
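On the Year 2038 point: many systems of this era store times as a signed 32-bit count of seconds since 1 January 1970 (UTC), which runs out at 2^31 - 1 seconds. The short Python sketch below, offered only as an illustration, computes that limit and compares it with the Google cookie expiration recorded in the cookie comparison later in this article. Treating that displayed expiration as UTC is an assumption, since the browser showed it without a time zone.

```python
from datetime import datetime, timezone

# A signed 32-bit time_t counts seconds since 1970-01-01 00:00:00 UTC and
# tops out at 2**31 - 1.
MAX_32BIT_TIME = 2**31 - 1


def max_32bit_datetime() -> datetime:
    """Last instant a signed 32-bit Unix timestamp can represent."""
    return datetime.fromtimestamp(MAX_32BIT_TIME, tz=timezone.utc)


# Google cookie expiration as recorded in this article (January 17, 2038,
# 7:14:07 PM). Assumption: the displayed time is treated as UTC.
GOOGLE_COOKIE_EXPIRY = datetime(2038, 1, 17, 19, 14, 7, tzinfo=timezone.utc)


if __name__ == "__main__":
    limit = max_32bit_datetime()
    print(limit.isoformat())             # 2038-01-19T03:14:07+00:00
    print(limit - GOOGLE_COOKIE_EXPIRY)  # 1 day, 8:00:00
```

Under that assumption the cookie expires less than two days before the 32-bit limit, which fits the idea that its lifetime was pinned near the maximum safe value rather than chosen for any privacy reason.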

Comparing Cookies

Still, let’s try something. I took Internet Explorer 7, cleared everything out of it and then changed my settings to get prompted for each cookie requested. Then I did a tour of search engines.

First, Yahoo. Visiting the home page gave me four different cookies, with these expiration dates:

  1. Tuesday, June 02, 2037 8:00:00 PM
  2. End of session
  3. Wednesday, August 19, 2015 4:00:00 PM
  4. Monday, July 09, 2007 11:29:12 PM

Then I did a search, which caused me to get two more cookies:

  1. Wednesday, August 19, 2015 4:00:00 PM
  2. Monday, June 11, 2007 11:29:46 PM

Next, Microsoft’s Live.com search engine. Just reaching the home page gave me double the number Yahoo did: eight cookies in all. Expiration dates:

  1. End of session
  2. End of session
  3. End of session
  4. End of session
  5. End of session
  6. Monday, October 04, 2021 7:00:00 PM
  7. Monday, October 04, 2021 7:00:00 PM
  8. Monday, October 04, 2021 12:00:00 PM

After entering a search request, I got six more:

  1. End of session
  2. Monday, July 20, 2015 11:59:59 PM
  3. Monday, June 11, 2007 11:59:06 PM
  4. Monday, July 20, 2015 11:59:59 PM
  5. Thursday, June 08, 2017 11:39:06 PM
  6. Monday, June 11, 2007 11:59:06 PM

Next, Google. I received only two cookies:

  1. Sunday, January 17, 2038 7:14:07 PM
  2. Sunday, January 17, 2038 7:14:07 PM

That’s it. No more when I did a search.

Yes, Google has the longest-lasting cookie, but barely. Here is the longest cookie lifetime for each service, roughly rounded:

  1. Google: 30.5 Years (January 17, 2038)
  2. Yahoo: 30 Years (June 02, 2037)
  3. Microsoft: 14 Years (October 04, 2021)

Basically, both Google and Yahoo have 30-year cookies. So where’s the letter for Yahoo from the Working Group? And isn’t 14 years from Microsoft excessive? As I said, I think focusing on the time period of a cookie is just scaremongering that avoids more substantial and important issues. But if Google’s going to get called out, why aren’t the others?
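To make the rounding in the comparison reproducible, here is a small Python sketch that converts each service’s longest cookie expiration into years. The visit date of June 11, 2007 is an assumption inferred from the shortest-lived cookies in the lists above, and the time of day of the visit is ignored.

```python
from datetime import datetime

# Assumption: the test visit happened on June 11, 2007 (inferred from the
# cookies that expired that same night); time of day is not known.
VISIT = datetime(2007, 6, 11)

# Longest-lived cookie expiration observed for each search engine.
LONGEST_EXPIRY = {
    "Google": datetime(2038, 1, 17, 19, 14, 7),
    "Yahoo": datetime(2037, 6, 2, 20, 0, 0),
    "Microsoft": datetime(2021, 10, 4, 19, 0, 0),
}

DAYS_PER_YEAR = 365.2425  # average Gregorian year length


def lifetime_years(expiry: datetime, visit: datetime = VISIT) -> float:
    """Approximate cookie lifetime in years, measured from the visit date."""
    return (expiry - visit).total_seconds() / (DAYS_PER_YEAR * 86400)


if __name__ == "__main__":
    # Print from longest-lived to shortest-lived.
    for engine, expiry in sorted(LONGEST_EXPIRY.items(),
                                 key=lambda kv: kv[1], reverse=True):
        print(f"{engine}: {lifetime_years(expiry):.1f} years")
```

Run as a script, it prints about 30.6 years for Google, 30.0 for Yahoo and 14.3 for Microsoft, in line with the rough figures above.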

Google’s Response To The EU

But enough of my reaction to the Working Group’s letter. Today, Google has given its own response in an open letter (PDF), linked from a blog post called How long should Google remember searches?

Why keep server data? Google responds:

  • Analyzing log data is an important tool to help our engineers refine search quality and build helpful new services. Take the example of Google Spell Checker. Our spell-checking software automatically looks at your query and checks to see if you are using the most common version of a word’s spelling. If it calculates that you’re likely to generate more relevant search results with an alternative spelling, it will ask “Did you mean: (more common spelling)?” We can offer this service by looking at spelling corrections that people do or do not click on. Similarly, with logs, we can improve our search results: if we know that people are clicking on the #1 result we’re doing something right, and if they’re hitting next page or reformulating their query, we’re doing something wrong. The ability of a search company to continue to improve its services is essential, and represents a normal and expected use of such data.
  • Log data is also crucial in helping prevent fraud and abuse. It is standard among Internet companies to retain server logs with IP addresses as one of an array of tools to protect the system from security attacks. For example, our computers can analyze logging patterns in order to identify, investigate and defend against malicious access and exploitation attempts. A failure to retain log data for a sufficient period would make our systems more vulnerable to security attacks, putting the personal data of our users at greater risk. Historical logs information can also be a useful tool to help us detect and prevent phishing, scripting attacks, and spam, including query click spam and ads click spam. Moreover, log data helps us protect our systems from web and index spam, which in turn supports healthy traffic flow to many web sites on the Internet.
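Google’s spell-checking example boils down to counting what users do next. As a toy illustration only (this is not Google’s system), the Python sketch below takes hypothetical log pairs of a query and the query the same user typed immediately afterward, and suggests the most common reformulation once it has been seen at least twice. The log records, the `min_count` threshold and the `build_suggestions` name are all invented for this example.

```python
from collections import Counter, defaultdict


def build_suggestions(log, min_count=2):
    """Map each query to the reformulation users most often switched to.

    `log` is an iterable of (query, next_query) pairs: a user searched for
    `query` and then immediately searched for `next_query`. A pair must be
    seen at least `min_count` times before it becomes a suggestion.
    """
    follow_ups = defaultdict(Counter)
    for query, next_query in log:
        if query != next_query:  # a repeated identical search is not a fix
            follow_ups[query][next_query] += 1

    suggestions = {}
    for query, counter in follow_ups.items():
        best, count = counter.most_common(1)[0]
        if count >= min_count:  # ignore one-off reformulations
            suggestions[query] = best
    return suggestions


if __name__ == "__main__":
    # Invented sample log, for illustration only.
    sample_log = (
        [("recieve", "receive")] * 3
        + [("recieve", "recieved")]
        + [("teh", "the")]
    )
    print(build_suggestions(sample_log))  # {'recieve': 'receive'}
```

The threshold is the point of the sketch: a suggestion only emerges once enough users behave the same way, which is why this kind of feature depends on having a reasonable volume of historical log data.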

As part of the general blog post, it also succinctly lists:

  • to improve our search algorithms for the benefit of users
  • to defend our systems from malicious access and exploitation attempts
  • to maintain the integrity of our systems by fighting click fraud and web spam
  • to protect our users from threats like spam and phishing
  • to respond to valid legal orders from law enforcement as they investigate and prosecute serious crimes like child exploitation; and
  • to comply with data retention legal obligations.

Why Keep It 24 Months?

Moving along, Google provides a variety of reasons for why the 24 month maximum period was initially selected:

We need to have a sufficient amount of historical log server data. In fact, all search engine companies need sufficient data to evaluate and improve their services based on the needs of users, as online services evolve very rapidly. In addition, there is tremendous growth in fraud on the Internet, posing serious challenges for service providers to keep their services secure. In determining a retention period, we closely examined the evolution of search engine services, and the needs of our engineers to ensure the security of Google services. The period chosen, 18 to 24 months, represents a period lengthy enough to achieve these purposes without being excessive. We therefore believe that this is a proportionate period for the retention of log server data.

In addition to proportionality, data retention policies must also respect the principle of legality set forth in Article 6(1)(a) of the General Data Protection Directive. The Data Retention Directive requires all EU Member States to pass data retention laws by 2009 with retention for periods between 6 and 24 months. Google is therefore potentially subject (both inside and outside the EU) to legal requirements to retain data for a certain period. Since not many Member States have implemented the Directive thus far, it is too early to know the final retention time periods, the jurisdictional impact, and the scope of applicability. Because Google may be subject to the requirements of the Directive in some Member States, under the principle of legality, we have no choice but to be prepared to retain log server data for up to 24 months.

Problems With The EU Data Retention Law

As you can see, the EU data retention law comes up. Google then goes into some interesting depth about the problems of figuring out exactly how this law is applied, and to whom:

There are many unanswered questions regarding the EU Data Retention Directive. The Working Party has criticized its lack of clarity in many respects, particularly with regard to divergent implementations in each Member State. We would welcome a definitive debate across Europe to answer such basic questions as:

1) What is an “electronic communication service provider” subject to data retention obligations, and would it include Google services, such as Gmail, Google Talk, or Google Search, in light of different definitions in each Member State?

2) What is the binding retention period for a global Internet company doing business in each Member State, when retention periods range from 6 to 24 months?

3) Do data retention requirements apply to the storage of personal data outside the EU by service providers established in the EU?

4) Will EU Member States go beyond the Directive and implement more stringent retention requirements?

For example, the German Ministry of Justice has proposed that webmail providers should be required to verify the identity of their account holders. Would the German authorities attempt to apply that requirement to Google? Could we challenge its legality in court, either as an unconstitutional infringement of privacy, or as an example of jurisdictional over-reach?

In short, there is tremendous confusion in legal circles across Europe on these issues, and both individuals and companies would benefit from greater clarity from authorities responsible for the Data Retention Directive to answer these very fundamental questions. A public discussion is needed between officials working in data protection and law enforcement to resolve these issues.

Complying With Other Laws

Google also notes that it is subject to laws outside the EU and interestingly works in an argument that it might want to retain data to help law enforcement:

It is also important to remember that in the U.S., the Department of Justice and others have similarly called for a 24-month data retention period. Thus, there seems to be an emerging international consensus on 24 months as the outer limit for data retention. This period makes sense for a global company like Google that must comply with the laws of all countries where it does business. Regardless of data retention requirements, logs are an important tool for law enforcement to investigate and prosecute many serious crimes, such as child exploitation. While we have resisted excessive requests from governments in the past, we believe that it is our responsibility to respect law enforcement requests for logs information when law enforcement follows valid legal process. Once again, a reasonable balance needs to be struck between the goals of privacy and the legitimate goals of law enforcement.

In addition, data protection laws, such as Article 17 of the General Directive and Article 4 of the E-Privacy Directive, require companies to ensure that adequate security measures are taken to protect user data. As explained above, our systems engineers require a sufficient historical sample of log server data in order to analyze security threats. A period of 18 to 24 months provides our engineers with sufficient data to analyze these threats without being excessive.

Of course, other laws also impose obligations on companies to retain information. In the U.S., for example, the Sarbanes-Oxley law requires us to retain business records sufficient to establish adequate financial and other controls. The same is true of tax and accounting requirements, especially for paid services, such as clicks on sponsored links, where we have a contractual and accounting obligation to retain data, at a minimum until invoices are paid and the period for legal disputes has expired. These legal obligations must also be considered in connection with our server log retention policies.

Shortening To 18 Months & Reconsidering Cookie Expirations

As for cookies, Google writes:

We believe that cookies data management in a user’s browser is fundamentally a browser/client issue, not a service/server issue. Therefore, the lifetime of a cookie does not indicate or imply any enforcement of data retention. We also believe that cookie lifetimes should not be so short as to expire and force users to re-enter basic preferences (such as language preference). Nonetheless, we acknowledge that cookie lifetimes should be “proportionate” to the data processing being performed.
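Google’s point that a cookie’s lifetime is set by the server but enforced by the browser shows up in how a cookie is issued. One common way to shorten a lifetime without forcing users to re-enter preferences is a “sliding” expiration: the server re-sends the cookie on each visit, so it only lapses after a period of inactivity. The Python sketch below uses the standard library’s `http.cookies` module; the cookie name `lang` and the one-year Max-Age are hypothetical values, not anything Google has announced.

```python
from http.cookies import SimpleCookie

# Hypothetical lifetime: one year. Not a figure Google has announced.
PREF_MAX_AGE_SECONDS = 365 * 24 * 60 * 60


def preference_cookie_header(language: str,
                             max_age: int = PREF_MAX_AGE_SECONDS) -> str:
    """Build a Set-Cookie header that stores only a language preference.

    Sending this header again on every visit makes the expiration "slide":
    the browser keeps the cookie until max_age seconds after the latest
    visit, so regular users never re-enter the preference, while a browser
    that stops visiting drops the cookie after max_age.
    """
    cookie = SimpleCookie()
    cookie["lang"] = language            # hypothetical cookie name
    cookie["lang"]["max-age"] = max_age  # lifetime in seconds, enforced by the browser
    cookie["lang"]["path"] = "/"
    return cookie.output()


if __name__ == "__main__":
    print(preference_cookie_header("en"))
```

Note that the server only states the lifetime; whether the cookie is actually discarded on schedule is up to the browser, which is the distinction Google draws above.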

The real kicker, of course, is Google concluding that it will shorten the retention period and reconsider how long cookies last:

After considering the Working Party’s concerns, we are announcing a new policy: to anonymize our search server logs after 18 months, rather than the previously-established period of 18 to 24 months. We believe that we can still address our legitimate interests in security, innovation and anti-fraud efforts with this shorter period. However, we must point out that future data retention laws may obligate us to raise the retention period to 24 months. We also firmly reject any suggestions that we could meet our legitimate interests in security, innovation and anti-fraud efforts with any retention period shorter than 18 months. We are considering the Working Party’s concerns regarding cookie expiration periods, and we are exploring ways to redesign cookies and to reduce their expiration without artificially forcing users to re-enter basic preferences such as language preference. We plan to make an announcement about privacy improvements for our cookies in the coming months.
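Neither letter describes how Google actually anonymizes its logs. Purely to illustrate what an 18-month cutoff means in practice, the Python sketch below assumes that anonymizing a record means zeroing the last octet of an IPv4 address and dropping the cookie ID while keeping the query; the `LogEntry` fields and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional

RETENTION_MONTHS = 18  # the new cutoff Google announced


@dataclass(frozen=True)
class LogEntry:
    """Hypothetical server log record; Google's real format isn't described."""
    timestamp: datetime
    ip: str                    # IPv4 only in this sketch
    cookie_id: Optional[str]
    query: str


def months_elapsed(earlier: datetime, later: datetime) -> int:
    """Whole calendar months from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    # Don't count the final month until its day and time have been reached.
    if (later.day, later.time()) < (earlier.day, earlier.time()):
        months -= 1
    return months


def truncate_ipv4(ip: str) -> str:
    """Zero the last octet, e.g. 192.0.2.123 -> 192.0.2.0 (assumed method)."""
    return str(ipaddress.IPv4Network(f"{ip}/24", strict=False).network_address)


def anonymize(entry: LogEntry, now: datetime) -> LogEntry:
    """Return the entry unchanged if recent, else an anonymized copy."""
    if months_elapsed(entry.timestamp, now) < RETENTION_MONTHS:
        return entry
    return replace(entry, ip=truncate_ipv4(entry.ip), cookie_id=None)


if __name__ == "__main__":
    now = datetime(2007, 6, 12)
    old = LogEntry(datetime(2005, 11, 1), "192.0.2.123", "abc123", "privacy")
    recent = LogEntry(datetime(2007, 1, 1), "192.0.2.123", "abc123", "privacy")
    print(anonymize(old, now))
    print(anonymize(recent, now))
```

Here the 19-month-old record comes back with its address truncated and its cookie ID removed, while the 5-month-old record is returned untouched. Moving to a 24-month cutoff, should a retention law require it, would only mean changing `RETENTION_MONTHS`.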

Lest the EU or other privacy groups try to jump in and claim credit if the cookie lifetime gets reduced, here’s a reminder that the attention on cookie length was sparked all those years ago by Daniel Brandt, who, coincidentally and before this announcement from Google, remarked in a privacy discussion on Googler Matt Cutts’ blog:

How about drastically trimming back on that cookie that expires in 2038? That would impress me as a symbolic gesture of good will. It was that cookie that first alerted me to the fact, way back in year 2000, that Google was going to be a problem when it came to privacy. I was right.

Beyond The Obvious

Yesterday, in my Google Bad On Privacy? Maybe It’s Privacy International’s Report That Sucks article, I spent a considerable amount of time being upset with Privacy International for what I thought was a slipshod report on privacy. Today, I’m similarly critical of the EU move. As I said yesterday, it’s not that I’m a Google fanboy who thinks it can do no wrong. In fact, it’s the opposite: I think Google, as well as all the major search engines (and big companies, for that matter), need outside privacy groups and governmental bodies keeping them honest. What upsets me is that both Privacy International and the EU have seemed more concerned with style than substance.

As I pointed out in my article yesterday, Google has a variety of privacy policies that cover a range of services that it offers. These services can have data well beyond what’s in server logs, and it’s difficult for me — someone who regularly writes about Google — to know what happens with my data. Consider the accounts I have:

  • Google AdWords, with associated billing and ad campaign info
  • Google AdSense, with associated payments and information logged from traffic on my sites
  • Gmail, with mail going back four years
  • Google Web History, with my search data
  • Google Analytics, with site activity data stored
  • Google Calendar, with a list of my activities

That’s just some of my accounts. If I delete my web history, I know that data is destroyed, though, from what I was last told, copies kept in offline archives currently are not. If I go to the Gmail privacy FAQ (far more useful than the Gmail privacy policy, which fails to link directly to the FAQ), I’m told that deleting my mail really deletes it, even from backups, though that might take time. But then again, is mail deleted only from online backups? What about offline ones?

The Privacy Control Panel

Figuring out where all my data resides and how to kill it is a pain — at Google or Microsoft or Yahoo, for that matter. John Battelle had a good suggestion back in early 2006 for a sort of private data control panel that could show you exactly what was stored where and put the user in control:

I bet 95% of the public will never edit, or even view the data more than once. But the sense that the control panel is there, just in case, will be invaluable to establishing trust.

We could use that more than ever. Google especially could use that, if it wants to stop the privacy attacks or at least stem them. How about it? I asked Google’s global privacy counsel Peter Fleischer about this yesterday, when talking to him about the Privacy International survey.

“We’re thinking hard internally along the digital dashboard-type of approach. Is there a way to give users a dashboard and visibility to all these elements and give them control,” he said. “It would be hugely complicated to build, but in terms of that vision, I completely share it, and we’re having deep discussions about it.”

As for Privacy International, it has now come out with a call for a privacy summit to be held on July 23 in San Francisco:

Following the recent publication of its consultative privacy rankings, PI has called on the major Internet companies to meet with the organization in July in San Francisco. The meeting has been called to clarify a number of data handling practices and is seen by PI as the first step to achieving an accord that will provide customers with consistent and strengthened privacy protections, and to give companies a greater understanding of the key challenges.

The meeting has been called for the week of 23rd July in San Francisco. Privacy International will reach out to all major Internet organizations with invitations to the event. These will be sent by Tuesday 12th July, 12.00 EST. We will then publish the full list of invited organizations together with a status report on their responses to the invitation.

A wide-ranging summit is a good idea, but after the publicity stunt Privacy International effectively pulled yesterday, it seems wrong for them to be the ones setting the date and place, and holding out a status report on accepted invitations as a name-and-shame tactic. If they had been serious about this, they would never have published that inept report in the first place, which cost them credibility. Instead, they would have called for exactly this type of summit, perhaps with other organizations, without polarizing the situation even more.

Yes — let’s see that privacy summit happen — and soon. Yes — involve the group. But no, they aren’t the right group to be leading it now, nor setting the terms.

Postscript: I went back to Fleischer and asked if Google would take part in what PI suggests. He said:

Google is always open to a thoughtful dialogue with people who care about privacy. We have not received an invitation to this event yet, even though it has been reported publicly. So, at this stage, we cannot evaluate whether this would be a forum for a thoughtful exchange of views, or a publicity stunt.




About the author

Danny Sullivan
Contributor
Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land and MarTech, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site & blog. He can also be found on Facebook and Twitter.
