Microsoft To Anonymize Log Data; Calls For Industry Standards Along With Ask.com
First Google, then Ask.com and now Microsoft have jumped onto the privacy protection train. Late yesterday, Microsoft announced that it was going to follow Google’s lead and anonymize search log data after 18 months. In addition, it has partnered with Ask to call for an industry effort to develop privacy principles. There’s a tinge of PR stunt in this, but if looking for a PR edge will get the search engines moving, I suppose that’s a necessary evil. Below, a look at what Microsoft is doing, the industry call and lots of perspective (plus a timeline) on how we got here.
From Microsoft’s press release, it is promising to make:
search query data anonymous after 18 months by permanently removing cookie IDs, the entire IP address and other identifiers from search terms. Microsoft will also work to give customers more control over what information it uses to personalize their online search experience. In connection with its efforts to support a common industry approach to privacy issues, Microsoft also announced that it will join the Network Advertising Initiative (NAI) later this year when it begins to offer third-party ad serving broadly.
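As a rough illustration of what a retention rule like that might look like in practice, here is a minimal Python sketch. The field names (`cookie_id`, `ip_address`, `user_id`), the 548-day approximation of 18 months and the `anonymize` function are all assumptions made for illustration; nothing here reflects how Microsoft actually stores or processes its logs.

```python
from datetime import datetime, timedelta

# Assumption: "18 months" approximated as 548 days. The press release
# does not say how the cutoff is calculated.
RETENTION = timedelta(days=548)

# Hypothetical field names for the identifiers the release mentions
# (cookie IDs, the entire IP address, "other identifiers").
IDENTIFYING_FIELDS = ("cookie_id", "ip_address", "user_id")


def anonymize(record, now):
    """Return a copy of a log record, dropping identifiers once it is past retention."""
    if now - record["timestamp"] < RETENTION:
        # Still inside the retention window: keep the record unchanged.
        return dict(record)
    # Past the window: keep the search terms, remove every identifying field.
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


now = datetime(2007, 7, 23)

old = {
    "timestamp": datetime(2006, 1, 1),
    "query": "compare car prices",
    "cookie_id": "abc123",
    "ip_address": "192.0.2.10",
}
recent = {
    "timestamp": datetime(2007, 7, 1),
    "query": "compare car prices",
    "cookie_id": "abc123",
    "ip_address": "192.0.2.10",
}

old_result = anonymize(old, now)
recent_result = anonymize(recent, now)
```

In this sketch the older record keeps only its timestamp and query, while the recent one is left untouched. Note that it removes the whole IP address rather than masking part of it, which is what the release's "entire IP address" wording suggests.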
As said, Microsoft’s move follows in the footsteps of Google. And, as I’ll explain, it’s also likely designed to help Microsoft’s efforts to target searchers even more personally with ads. But let’s do the timeline to date, before getting into more analysis:
- March 14, 2007: Google announces it will make log data anonymous after 18-24 months. Google Anonymizing Search Records To Protect Privacy from me provides detail about the technicalities involved, including how search history records may be stored outside of log data.
- April 19, 2007: Google’s efforts to reduce data retention ironically attract negative attention from an EU privacy committee that wonders if it is cutting enough. Rumors that Google will get a letter over data retention circulate. EU Group May Serve Google With Letter Over Data Retention Policies covers this more.
- May 25, 2007: News that the EU letter was actually sent comes out. European Union Questions Google’s Data Retention Policy covers how I view it as more political theater than an actual privacy protection effort, especially since the letter demands information easily accessible from published sources.
- June 11, 2007: Privacy International jumped into the fray with a “report” finding Google to be the worst in privacy. My dissection of that report can be found in Google Bad On Privacy? Maybe It’s Privacy International’s Report That Sucks. As a PR stunt, it worked for Privacy International. As a meaningful, thoughtful look, well, it wasn’t.
- June 12, 2007: In response to the EU, Google announces log data will be reduced after 18 months. Google Responds To EU: Cutting Raw Log Retention Time; Reconsidering Cookie Expiration has more on this.
- June 21, 2007: Belatedly, the EU realized there are apparently search engines beyond Google that pose all the same privacy threats that it was so concerned about with the Big G. European Union To Question Data Retention Policies Of Other Search Engines covers this in more depth.
- July 16, 2007: Google announces cookies will expire after two years, rather than the previous 30-year period. Other search engines remain with cookies that last from 14 to 30 years. Google Shortens Cookie Expiration Date has more.
- July 19, 2007: Ask.com promises to anonymize data and also come up with a system to delete information at the time of searching, for those who opt-in to the AskEraser program. Ask.com To Launch AskEraser To Erase Search History & New Data Retention Policy explains that more.
- July 22, 2007: Microsoft announces its own anonymizing program plus a joint call with Ask for the search industry to establish standards.
- July 23, 2007: This is the week Privacy International called for an industry-wide meeting of major internet companies on privacy, promising a “status report” of those who accepted and declined (with the implication being that declining indicates wrongdoing or something to hide).
The last point is a key one to me. Ask.com and Microsoft are either attending this meeting (hard to tell, since Privacy International has failed to post any further news about it), or they aren’t attending and decided they needed to arm themselves with a good PR spin ahead of it. [NOTE: See postscript below, on how this meeting was cancelled].
What I dislike is that the joint call comes from only two of the major search engines, when it likely could have involved Yahoo and Google as well. First, that joint call. From the press release:
Microsoft Corp. and Ask.com, a wholly owned business of IAC (NASDAQ: IACI), today joined together in the commitment to call on the industry to develop global privacy principles for data collection, use and protection related to searching and online advertising. The companies will work with other technology leaders, consumer advocacy organizations and academics to come together and join them in working on the development of these principles, which could include developing and sharing best practices to provide more control for consumers….
Microsoft and Ask.com are proposing that leading search providers, online advertising companies and privacy advocates convene to engage in an active dialogue to discuss privacy considerations posed by the proliferation of online advertising and search. The goal of the dialogue is to determine ways that the industry can work cooperatively to define privacy principles that take these new considerations into account. The companies will provide an update on their progress in September.
Don’t get me wrong. I’ve long written about wanting an industry-wide approach to these issues, rather than the Google-centric attacks that have typically happened. From 2005:
Google’s balancing act from News.com revisits the well-trod path of Google as potential privacy threat. Personally, I would love to get beyond these “what Google might” do stories and more toward what the search engine industry itself ought to be doing in terms of protecting privacy, especially as everyone’s offering personalized search or search history features….
There are real concerns. I’m not dismissing these at all. There’s potential for both corporate and governmental abuse of search profiles. But what we need is less hype, less putting one player in a corner and more actual suggestions of things that everyone can implement.
And from earlier this year:
Meanwhile, I still firmly remain that the Google “problems” and “fears” could be seen and dealt with in an industry-wide way. As I wrote about personalized search, when I raised that warning flag about Google seeming to want too much:
So many companies today offline (banks, credit cards, loyalty cards, credit reporting agencies) hold much more information about me personally than Google does. And Google’s peers are doing much of what Google itself wants to do. Don’t solve the Google problem. Solve the problem, whatever it is, in a comprehensive way. If that means better privacy protection, then give it to me across the board.
So if Microsoft and Ask are making this happen, hallelujah. But here’s how these joint announcements typically go. The players look to exclude as many of the other players as possible until the last minute, or invite them after the fact, to keep the PR glory mostly to themselves. It could be that Google and Yahoo were asked to take part with plenty of time to review the planned press releases but decided not to. I sort of doubt that. Instead, they either weren’t asked or were asked so close to the release date that there wasn’t enough time to give it any proper review (Microsoft and Ask would have had plenty of time). [NOTE: See postscript below; Google says it wasn’t asked].
That kind of sucks. An industry effort really should start on better terms. Ask, in particular, shouldn’t be playing this game. After being left out of prediscussions on things like nofollow or sitemaps, excluding Google and Yahoo perhaps might feel like sweet revenge, but privacy is too important for PR games.
Still, the PR game is working. It got Google moving earlier this year, and the rest of the gang is following behind, with consumers likely to benefit. As Between The Lines puts it:
Personally, I hope the effort to form an industry standard stalls. If there’s no standard search engines will try to outdo each other on the privacy front. If we’re lucky search histories on file will get closer to zero.
Aside from the PR front, there’s a crucial reason Microsoft wants to have industry standards on privacy. That’s to provide cover for getting even more personalized with your searches.
Microsoft probably does the most extensive search profiling of any of the major search engines. How can it tell advertisers they can target people by sex or age or other demographics? Because it actively profiles people who have registered with its services, to track and monitor what they search for. Last December, the Wall Street Journal did a look at this:
If someone types in “compare car prices” on Live Search, Microsoft’s computers note that the person is probably considering buying a vehicle. The computers then check with the list of Hotmail accounts to see if they have any information on the person. If they do, and an auto maker has paid Microsoft to target this type of person, the computer will automatically send a car ad when she next looks at a Microsoft Web page. As a result, people should see more ads that are of interest to them. “We know what Web sites they have visited and what key words they used,” says Mr. Dobson. “We can deduce what their interests are.” Microsoft says that in testing in the U.S., behavioral targeting increased clicks on ads by as much as 76%.
Unlike Google, there’s no Web History control panel that shows the searcher all this.
Microsoft is looking to do more targeting off its own properties, so part of today’s push is to let consumers opt-out. That’s good.
Microsoft will continue to implement new privacy features and practices as it continues to develop its online services and offer new controls that help users manage the types of communications they receive from Microsoft. For example, once the company begins to offer advertising services to third-party Web sites, it will offer customers the ability to opt out of the behavioral ad targeting by Microsoft’s network-advertising service on those Web sites. Microsoft also will continue to develop new user controls that will enhance privacy, such as letting people search and surf its sites without being associated with a personal and unique identifier used for behavioral ad targeting, and allowing signed-in users to control the personalization of the services they receive.
As part of this, Microsoft is joining the Network Advertising Initiative, an industry organization I’d never heard of before, but it has some big names and a charter to develop industry standards on data collection. Microsoft has also posted new privacy principles (PDF format) for Live Search, which in addition to anonymizing data and offering opt-outs, also covers things like:
We will store our Live Search service search terms separately from account information that personally and directly identifies the user, such as name, email address, or phone numbers (“individually identifying account information”). We will maintain and continually improve protections to prevent unauthorized correlation of this data. Moreover, we will ensure that any services requiring the connection of search terms to individually identifying account information are offered in a transparent way with prominent notice and user consent.
Overall, I’m pleased and happy with Microsoft’s move. I’m somewhat amazed by some of the reaction that Microsoft has somehow done one-upmanship with Google in all this, however. Microsoft is behind on when it promises to expire cookies; it is matching what Google started months ago in terms of anonymizing logs. The opt-out is good, as is the industry call — I suppose that’s one-upmanship in making Google have to react. But as I said, it’s not a good way to start a cooperative industry effort.
Some remaining bits and pieces. Yahoo will likely want to join in, given that it already does extensive search profiling as well, which is to expand as part of its SmartAds program. Google, of course, is already taking in tons of data. But Google’s Web History system also puts the service the furthest along in providing real control for consumers over the data that really matters: not the fairly anonymous log info that’s linked to cookies but actual search profiles that are NOT covered as part of any of these announcements.
I’d really encourage you to read my Google Responds To EU: Cutting Raw Log Retention Time; Reconsidering Cookie Expiration article to understand this more, how it’s not the log data that most people need to worry about. I’m glad that’s getting addressed, but far more personal data is locked away in separate databases that consumers really don’t have much control over. As I wrote in that article:
Figuring out where all my data resides and how to kill it is a pain — at Google or Microsoft or Yahoo, for that matter. John Battelle had a good suggestion back in early 2006 for a sort of private data control panel that could show you exactly what was stored where and put the user in control….
We could use that more than ever. Google especially could use that, if it wants to stop the privacy attacks or at least stem them. How about it? I asked Google’s global privacy counsel Peter Fleischer about this yesterday, when talking to him about the Privacy International survey.
“We’re thinking hard internally along the digital dashboard-type of approach. Is there a way to give users a dashboard and visibility to all these elements and give them control,” he said. “It would be hugely complicated to build, but in terms of that vision, I completely share it, and we’re having deep discussions about it.”
Hopefully, we’ll see something like this happen as privacy has finally become a competitive feature. Meanwhile, some additional reading:
- Microsoft Offers Privacy Options for its Search Engine at the New York Times says Yahoo will now anonymize data and do this after 13 months. So far, Yahoo’s not put that out as a formal press release. But sending out that quick promise was enough to get Yahoo, along with Ask, declared as “running the cleanest data-collection shop.” This is despite the fact that NO ONE to date has anonymized anything. We have promises but not implementations. Nor is anonymizing log data alone enough to declare someone “clean.”
- Online Search Privacy Urged from PC World says that Microsoft and Ask plan to get in contact with “the other players” in this space, which I think answers the question about how the other players were ignored beforehand.
- Techmeme has lots of related comment and discussion on the news today.
Postscript: Peter Fleischer, Google’s global privacy counsel, emailed this:
I can also confirm that Google learned about this Microsoft/Ask initiative from reading about it in the press. We have publicly said that we’d support a process for further industry dialogue on online privacy issues.
Google also tells me the Privacy International summit was cancelled, with a view that a successful summit would require a sustained format for dialogue amongst industry and other stakeholders. Google told PI that it is happy to participate in a forum for a meaningful exchange of views on privacy issues, and that Peter Fleischer is open to participating in a steering group to explore ideas about how best to do that.
Postscript 2: I’m remiss in not mentioning that meta search engine Ixquick jumped on the “privacy as a search feature” idea early on, announcing in June 2006 that it would delete IP addresses and cookie information from its logs. It explains more here, including how information is deleted within 48 hours.