Live Blogging: Microsoft Searchification Day 2007
I’m sitting at the Microsoft conference center for Searchification Day. We’re within walking distance from the Googleplex, where I was for Google Searchology Day just a few short months ago. Searchology was all about Google’s unveiling of universal search, and interestingly, today could be seen as Live’s answer to that. As Nathan Buggia, Product Manager, puts it, the name “Searchification” is like the name “Searchology,” only better.
Brad Goldberg, Live Search General Manager, opened up the session by explaining that Microsoft is making a new push towards transparency, and today is an example of opening things up and showing what’s under the hood. Over 50 press, bloggers, SEMs, and Microsoft partners are here to see how Microsoft has been measuring results and what they’ve done about the issues they’ve found.
Below, live blogging about three major changes to Live Search. Also see the non-live blogging overview that my Search Engine Land colleague Greg Sterling has posted: Microsoft Introduces New Live Search Index, Adds Features In Effort To Close ‘Relevancy Gap’ And Improve User Experience. Now for those three major areas:
- Web search improvements
- Vertical enhancements
- Webmaster tools
Brad says they’ve looked into searcher satisfaction based on things like click-through rates, dwell time, how many queries searchers refined after clicking on no results, and that type of thing. They found that 40% of queries are unanswered and 50% require refinement. And I think those numbers are right, but I’m live blogging and I have Greg Boser and Todd Friesen on either side of me mocking my serious dedication to the live blogging craft, so I might be getting confused on things that involve math.
Microsoft feels that search moving forward will look very different from search today and that search is becoming more and more a navigational way of getting around the Internet. They have been crawling and indexing more content types than before and are announcing a fourfold increase in index size (to more than 20 billion), as well as relevance enhancements and improvements in key verticals (which we’ll hear more about later).
They find that three models are emerging in search:
- Targeted search (searchers looking for a specific, quick answer). These are high frequency, short sessions.
- Discovery to find something. These are searches such as for shopping, which lead searchers to various sites as they move in and out of search and browse modes. These searches tend to span a number of sessions, and searchers have a high emotional connection.
- Exploration-type search. These searchers are browsing with no specific goal. These tend to be long, repeated sessions for things like entertainment, gossip, sports, and hobbies.
Microsoft knows they need to step up in search. They note that while they have 69 million searchers in a month (vs. 104 million for Yahoo and 142 million for Google), they have only 11% of queries (vs. 23% for Yahoo and 56% for Google) (see Stats: Popularity here for some numbers). I think these might have been the latest comScore numbers, but the fine print was too hard for me to read without my glasses. I have since found and put on my glasses, so hopefully I’ll have more accurate data for the rest of the sessions.
Satya Nadella, Corporate VP of the Search and Advertising Platform Group, has come on to talk about the significant relevance improvements they’ve made in a short period of time. He says that they plan to do major updates once or twice a year and more minor updates monthly. Right now, they’re announcing a major update to both their index size and relevance. Notably, he doesn’t mention any major improvement in freshness. I get the feeling this may be the next big area they need to focus on. He says this is a watershed event.
This “fall update,” as he’s calling it, is based significantly on customer feedback. The biggest issue they hear about is relevance. They found that:
- 54% of searchers are fully or partially satisfied
- 46% of searchers are not satisfied
They also found that relevance issues accounted for 91% of the dissatisfaction. Ouch. No wonder they’re focusing on that first. This analysis was based primarily on feedback and information they collected over the web, as well as behavioral analysis. When they analyzed the relevance issues, they found that 25% of the problems were with query intent and refinement, 32% were about general core ranking issues, 28% were index coverage and quality issues, and there were 15% of sundry other problems.
With this update, they are concentrating on the following areas to address the problem:
- coverage (hence the fourfold increase)
- query intent enhancements (what are people actually searching for)
- query refinement
- structured information extraction
- rich answers
Now he’s talking about coverage. He says they’re proud of their core infrastructure and they’ve done tons behind the scenes so that now they can start working on visible things, like relevance and a dramatically improved index. They’re going to continue to work on their index size, quality, freshness, and filtering. They are looking at increasing the depth of the web that they crawl, deep crawling only high-quality sites that need comprehensive indexing. This is always something search engines have to consider. You have limited crawling resources. Do you crawl a subset of sites comprehensively or a subset of pages from all sites?
With these improvements, they say that the number of queries that return fewer than 10 results has been cut in half. They feel they have a relevance gain in long and obscure queries and are relying more on user-generated content.
He then shows us some searches for which they’ve significantly increased the number of results (for instance, EPRML now has over 14,000 results vs. 1,700). They’ve changed the user interface to help people get to what they need more quickly and have done all kinds of testing for things like font and layout.
They’ve also made improvements with things like stop words, punctuation, acronyms, spaces between words, and plurals. They feel this has improved 30% of the worst results and 10% of overall queries. They’ve so far focused on query alteration to optimize relevance. They point to a query like [the office]. Search engines historically would drop the “the,” but in this case, “the” helps signify that the searcher is looking for the TV show.
Improvements in query refinement (such as autospell correction and related queries) have improved 8% of queries. In some instances, they simply show results for the corrected query, because they have confidence in the searcher’s intent. In some instances, they may show related queries on the right.
He also showed new translation tools, such as a cool new UI that lets you see side by side translation and mouse hovers.
The improved ranking algorithms use neural network ranking, loosely modeled on biological neural networks, which can learn patterns that simple algorithms can’t. These algorithms can detect things like word pairs and come close to handling natural language queries (for instance, “what’s the hottest it’s ever been in AZ”). They note that for queries like this, Google returns pages that simply contain all of these words, but Live can now return better results because it can understand the relationship between words. [NOTE: This new Microsoft patent on ranking found by Bill Slawski is also interesting.]
Now they’re talking about structured information extraction and how they can take unstructured web data and structure it for things like people search. They can recognize classes of data and provide deep vertical experiences for things like shopping and images. They have improved coverage of instant answers (5% of queries by summer and 10% of queries by fall) and are investing in the top traffic verticals (health, shopping, celebrities, and local). They say they’ve increased the number of Encarta facts they display, have dramatically improved coverage of news queries, and are now showing videos online.
They have improved relevance for long tail queries, and they feel they are now equal to Google and slightly better than Yahoo in relevance. How can they tell? Well, that’s a good question. If you have a good way to measure, they’d like to hear from you. They use human judges, blind judging, positional judging (the first result is more important than the second), scaled judging (perfect results are worth more than excellent results), and ongoing measurements.
So, they’ve improved coverage and relevance. Now they’re going to show us how they’ve incorporated that into key verticals.
Key Vertical Growth
So now we’re learning about Microsoft’s version of universal search. They’ve initially looked at high interest (to searchers), high value (to advertisers) verticals.
Maps and local were doing well, but they felt they could improve. 32% of queries have local intent, 67% of local searches start at a major search engine, and 42% of searchers use maps and directions. (Yes, I did get some of those numbers from Todd, but they switched the slide too quickly!)
For maps, they looked to make things as simple and intuitive as possible. They’ve combined expert sources and user-generated content, since sometimes the locals know more than the experts. Sadly, much of the cool stuff they showed us (like awesome flyovers in Virtual Earth) won’t be available until next month. With the Virtual Earth changes, which involve lots of cool 3D stuff, they are betting on a longer-term paradigm shift in the industry. Will adding a third dimension fundamentally change how people interact online? Maybe. They think so, but it remains to be seen.
What about mobile? There are 2.8 billion mobile users today, and Microsoft is partnering with carriers so that they can provide location-aware information to the searcher. They also are working on voice-activated search using their TellMe partnership.
Some of the overall search improvements in local include better interpretation of user intent, stemming, and refinement. They’re definitely excited about their new driving directions. It’s pretty cool, actually. With one-click directions, you can instantly get directions from major points. Is it really that much harder to type in your starting address? Well, maybe yes, for the lazy among us. Not that I am one of those.
They’re also incorporating traffic and providing rerouting based on that, which is potentially pretty cool, particularly over a mobile device.
Another vertical they are focusing on is entertainment. People are super passionate about their Britney Spears, and up to 10% of queries are entertainment-related. There’s low satisfaction for these types of queries because people just want more!
They’re launching this whole new xRank celebrity ranking so you can see the movers and shakers. Is Paris Hilton more popular this very moment than Britney? Live Search will tell you. They’re doing all kinds of stuff with celebrities, and if it’s really fresh data, it may actually be pretty cool for those of us who are into celebrity gossip. Not that I know anyone like that.
What about video? They’ve got a whole new proprietary video search engine that searches over all the video on the web. A “smart preview” feature lets you play the most relevant parts of the video inline. This is actually pretty cool if it works. The video will fade in and out at the crucial parts so you can know before clicking if it’s what you want to see.
And now we get to shopping. 86% of searchers use a search engine for product or shopping information. 70% of product-related queries are category queries, such as [digital camera]. They made key investments in product answers and user reviews. They have over 37 million products and have things like guides, images, specs, and reviews broken down by component.
Health is the other big vertical they’ve been focusing on. They found that health searches are different than other types of searches. For instance, privacy becomes more important, so they are only storing log and cookie data for these types of searches for 90 days and do no behavioral targeting. They are using trusted sources, such as the Mayo Clinic, and are working to make queries actionable with items like articles and calculators.
They showed us the top three health-related searches.
They declined to show the web results for [sex]. I just don’t know why. But they showed the health results, and those did indeed seem health-related, but the web results would have spiced up our morning that has been devoid of caffeine (why can’t we bring drinks into the conference room? Why do I have to smuggle in a Diet Coke? Not that I would). In any case, these health results do seem pretty cool, if indeed the coverage is comprehensive (which I’ll have to check out later). They say they crawl health journals nightly to get the latest information, and I’m assuming this must be with partner sites.
In conclusion, they note that there are 180 million Internet searchers, and 70 million of those have searched on Live Search in the last month. But these searchers don’t do many searches on Live and, in fact, Live has the largest gap of any search engine between share of searches and share of voice. 500 million users are on MSN, but only a small percentage use Live Search. If they can get the 70 million searchers to search just one more time a month on Live Search and can get more of the MSN users to try Live, they feel that’s all they need to really gain momentum.
You can check out the changes here. The core changes are coming soon (some are live even now) and the rest is coming within the month. The true test will be comprehensiveness. It looks great for the queries we saw, but how does it do for other queries? If these changes are truly comprehensive, they may indeed have something to build on.
Next we learned about the new webmaster tools they are launching in November. I have a lot I’d like to say about this beyond live blogging, so look for more thoughts from me on this. Mostly, I think it’s awesome that all the engines are stepping up with such great support for webmasters. Nathan Buggia announced both the tools and a whole new team to support it. The webmaster portal includes not only the tools, but a blog, discussion forum, and help center as well. Todd, in his live blogging from the event, has the rollout schedule: private beta launches next Monday, with the public beta to happen on November 15, launched at SMX London.
The presentation included a slide with Buffy, so as you can imagine, it was amazingly kick ass. The initial version of the tools includes a robots.txt validator, information on what pages of the site are most important to Live Search, what pages link to and from the site (ranked in order of importance), and when the site was last crawled. They say they’ll tell you if they find you’re spamming, although we didn’t get any details on this. You can also submit a sitemap or ping Microsoft when the sitemap has changed. I asked if this means that the crawling and indexing infrastructure now consumes all sitemaps, and they said that while they are now consuming more, it’s still not 100%. They’re getting there, though.
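For context on what that sitemap ping involves: under the sitemaps.org convention that the major engines adopted, a ping is just an HTTP GET to the engine’s ping endpoint with the sitemap’s location passed as a URL-encoded query parameter. Here’s a minimal sketch of building that URL; the `search.example.com` endpoint and the sitemap address are placeholders, not Live Search’s actual URLs.

```python
from urllib.parse import quote

def build_sitemap_ping_url(ping_endpoint: str, sitemap_url: str) -> str:
    """Build a sitemap ping URL per the sitemaps.org convention:
    GET <ping_endpoint>?sitemap=<url-encoded sitemap location>."""
    # safe='' forces ':' and '/' in the sitemap URL to be percent-encoded
    return f"{ping_endpoint}?sitemap={quote(sitemap_url, safe='')}"

# Hypothetical endpoint for illustration only.
url = build_sitemap_ping_url(
    "http://search.example.com/ping",
    "http://www.example.com/sitemap.xml",
)
print(url)
# → http://search.example.com/ping?sitemap=http%3A%2F%2Fwww.example.com%2Fsitemap.xml
```

In practice you’d fetch that URL (e.g. with `urllib.request.urlopen`) after regenerating your sitemap, which is presumably what “ping Microsoft when the sitemap has changed” boils down to.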
They have a meta tag or HTML file verification that is similar to Google and Yahoo. They are hoping to initially launch in English, Chinese, and Japanese. They kept reiterating that this was just version one and their new team has lots of plans for the future. It brings a happy tear to my eye that the industry continues to move in this direction.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.