I’ve said a number of times that the holy grail of search is to disambiguate intent. A two or three word query, given the complexity of the English language, is just not enough to confidently provide results that will always be relevant and useful. But up to now, it’s been all we’ve had. Today, however, the search arms race is to add another layer of disambiguation on top of the keyword query. And there are a number of ways to do this.
Google is experimenting with past click stream data and task context, believing what you’re doing now and what you’ve done in the past should be an accurate indicator of what you might be looking for in the future. Others, including Ask and Yahoo, are taking a less granular approach, looking at the ebbs and flows of the social graph of the Net. The bet here is that by looking at what clusters of people do, you can deliver the best result to an individual that hopefully shares their interest. Finally, you have those that believe something as finely nuanced as human intent and language can’t be trusted to an algorithm. It takes a human to understand a human, and so you have engines like ChaCha and Mahalo.
Over the next two columns, I’d like to look at two different approaches to disambiguation. The first is Baynote, a recommendation engine that is similar to what’s under the hood at Amazon. In the next column, I’ll look at VortexDNA, a unique user plug-in that uses our personal values as a guide to online content.
I had the chance to talk to Mike Svatek, the Marketing Director at Baynote. I started by asking what exactly Baynote does.
Mike: Our goal is to improve the effectiveness of websites through either search, or navigation, or content discovery. When we talk about search, I just would expand that a little bit to talk about content consumption and content discovery in general. I think probably most of your readers will know about Amazon, and they are generating 30% or 35% of their revenue through recommendations. I think people are pretty familiar with the collaborative filtering paradigm.
Baynote’s philosophy is that every person has a myriad of different roles they play in life: you can be a father, you can be a son, you can be a golfer, a teacher, you could be a brother, you can be a sports car enthusiast; there are all these roles and interests you have in life.
Mike talks about the problem with some of the approaches to disambiguation that try to understand you as an individual based on past behaviors.
Mike: What a system seeks to do is try to profile you as an individual and figure out hey, as an individual I know that you will like X, Y, and Z. We didn’t find that to be a very effective approach, instead we said people are animals of context and it’s their mental context at the time that drives what they should be seeing. We actually prefer to not look at an extensive set of prior behaviors and in fact look at the current attitude, the current mindset of the person and then show them content and products and search results that really are mapped to that intent.
This gets around the problem that Amazon had in some of their earlier implementations, where people who did their shopping based on different roles (personal vs workplace) were given recommendations not appropriate to their intent in their current role. Mike gives an example:
In my case I do all of my shopping on Amazon for Christmas. And so, at any given time I bought a set of golf clubs or golf accessories for my dad, I bought a purse for my niece, something nice for my wife, maybe something for me; and so I’m actually in four or five different roles all at once. Trying to target me will be a challenge. If it is the end of April and you are still showing me Hello Kitty purses, that’s not exactly top of mind for me right now.
Which makes sense. But my next question for Mike had to do with the more subtle aspects of disambiguation. For example, right now I’ve been doing a lot of reading about cognitive psychology, but I prefer books that deal with the subject at a layman’s level. Would Baynote be smart enough to pick that up through its approach?
Mike: What I want to find is a book on cognitive psychology, and I’m going to rely on the dozens or hundreds of people that have come before me, who expressed the same intent, and rely on the places that they gravitated toward, the books that they compared or shopped, and then maybe even potentially books that they added to their cart and purchased. And, as long as they are still in the context of cognitive psychology, even if they had moved away from the search results etc., they are still demonstrating behaviors and intent that are similar to other people who have expressed that intent before. Then, even if the term “cognitive psychology” doesn’t even appear within a book, I have enough evidence to suggest that when people are looking for that or expressing that intent, they are buying these books, and that you would also like them.
So, although Baynote doesn’t know anything about me as an individual, it believes that the “wisdom of crowds” will prevail and will point me in the right direction.
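To make the idea concrete, here is a minimal sketch of intent-keyed recommendation: behavior is pooled by the intent people expressed, not by who they are. It is an illustration only, not Baynote’s implementation. The class name `IntentRecommender`, the use of a normalized query string as the “intent”, and the weights for viewing, carting, and purchasing are all assumptions made for the example.

```python
from collections import Counter, defaultdict


class IntentRecommender:
    """Toy recommender keyed on expressed intent, not on user identity.

    Hypothetical sketch: the intent is approximated by a normalized
    query string, and no per-user profile is stored anywhere.
    """

    def __init__(self):
        # intent -> Counter mapping item -> accumulated evidence weight
        self._scores = defaultdict(Counter)

    @staticmethod
    def _normalize(intent):
        return intent.strip().lower()

    def record(self, intent, item, weight=1.0):
        """Log that someone expressing `intent` engaged with `item`."""
        self._scores[self._normalize(intent)][item] += weight

    def recommend(self, intent, k=3):
        """Return up to k items most associated with this intent."""
        scores = self._scores.get(self._normalize(intent), Counter())
        return [item for item, _ in scores.most_common(k)]


# Assumed weights: a purchase is stronger evidence than a view.
VIEWED, CARTED, PURCHASED = 1.0, 2.0, 3.0

rec = IntentRecommender()
rec.record("cognitive psychology", "Popular intro book", PURCHASED)
rec.record("Cognitive Psychology", "Popular intro book", VIEWED)
rec.record("cognitive psychology", "Graduate textbook", CARTED)
rec.record("golf clubs", "Putter", PURCHASED)

print(rec.recommend("cognitive psychology"))
```

Note that the recommended titles never need to contain the query terms; the association comes entirely from what earlier visitors with the same intent did.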
Another question for Mike was whether the recommendations that Baynote provides update in real time. For example, if I go to a category page, then do some more browsing in other parts of the site, and then return to the category page, would the recommendations change based on the actions I just took? Mike pointed out that navigational behavior is sometimes not that valuable in determining intent.
Mike: If I was to go and navigate all around the site and then come back to the same page, through testing we’ve found that if we provide the same set of recommendations within that same session, it’s actually more effective. One might think that hey, offering different suggestions after people navigate around might actually provide more tailored recommendations in some way. But the problem that we see with that is that navigation through sites is a larger function of site design and site structure than it is of the actual intent of the user. That’s not true in all cases, but in the cases where our customers use us and through our research, that’s what we find. So you can imagine someone that has gravitated to a particular area, they see a set of recommendations; and now because of the site structure, because of the way that they are querying in the engine, they are led to another part of the site. Now, if you take that as a strong indicator that they actually want a different set of information and then come back to the original book, and the whole reason they went out there was because they were misguided, then you have got that information in the system.
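The behavior Mike describes, showing the same recommendations when a visitor returns to a page within one session, can be sketched as a small per-session cache. This is an assumption-laden illustration rather than Baynote’s design: the class `SessionStableRecommendations`, the `compute` callable, and the session and page identifiers are all hypothetical.

```python
class SessionStableRecommendations:
    """Freeze each page's recommendations for the life of a session.

    Hypothetical sketch: `compute` is any callable that maps a page id
    to a list of recommended items. It is called at most once per
    (session, page) pair, so wandering around the site in between does
    not reshuffle what the visitor sees on return.
    """

    def __init__(self, compute):
        self._compute = compute
        self._cache = {}

    def get(self, session_id, page_id):
        key = (session_id, page_id)
        if key not in self._cache:
            self._cache[key] = list(self._compute(page_id))
        # Return a copy so callers cannot mutate the frozen list.
        return list(self._cache[key])

    def end_session(self, session_id):
        """Drop everything frozen for this session."""
        for key in [k for k in self._cache if k[0] == session_id]:
            del self._cache[key]


def crowd_recommendations(page_id):
    # Stand-in for a real model; assumed for illustration only.
    return [f"top item for {page_id}", f"runner-up for {page_id}"]


recs = SessionStableRecommendations(crowd_recommendations)
first_visit = recs.get("session-1", "books/psychology")
recs.get("session-1", "sports/golf")          # visitor wanders off
second_visit = recs.get("session-1", "books/psychology")
print(first_visit == second_visit)
```

The design choice here is simply to ignore in-session navigation as a signal, which matches Mike’s point that navigation paths reflect site structure more than user intent.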
Of course, the problem that Baynote attempts to solve is the dismal state of most site search. It just doesn’t function very well, either as a navigation option or a discovery tool.
Mike: Site search is really just broken. And, if you look at why it’s broken, I think it’s the fundamental capabilities of search engines, or the fundamental ways that search engines work. I have hundreds of documents in the database, let me crawl those, index those; and when you search for a term, a pharmaceutical term, for example, I mean to bring back every single document that matches that term, so my recall is very high.
The challenge there is that we are asking a different question these days. We are not asking, “Bring me back every document that matches this term.” Instead we are saying, “This is what I’m interested in, answer my question.” I think it’s fundamentally different, and so when you are in that space, it no longer really is that important how well a particular term matches within a document. It’s more important to understand: if I had an infinite number of experts by my side, and I could ask all those experts to answer my question, let’s have them agree on the right answer and tell me. There is no collaborative sort of socially reinforced model that site search engines have really introduced over the past many years. But the example of that would probably be explicit feedback mechanisms.
But if you only use that as your feedback mechanism, you get several different types of people that will participate: people that have too much time on their hands, or people that have a very skewed, non-representative viewpoint; either they are very sour or they are very excited. They’re not the majority, or they have an ulterior motive; there is a reason they are trying to game the system.
We said okay, the current approach is broken, we are not going to try to improve those methodologies. Instead we ask how do people find information now. It’s the same way we do in the real life. If I need to find a doctor, I go to talk to someone that I trust. And so, how do we start to build trust implicitly within the web community? First of all, not by asking anybody what their opinions are. But instead, just understanding where people go to get value on the web and understanding the context under which they are getting that value.
So, Baynote is using community patterns and the social graph to act as their foundation for recommendation. The problem with that is that you need enough traffic to define the patterns. Once you move into the Long Tail, there’s not enough data to provide the strong signals you need to confidently make recommendations. I asked Mike about this point.
Mike: You need sufficient feedback to know that the model is working correctly. I couldn’t agree more with your point about the long tail. Certainly in the head you are going to get plenty of feedback and have those results validated very quickly. But in the tail, the challenge is what you look at; I actually think it comes down more to the fidelity of what you are listening to. What I think Baynote has done is pretty unique. We have basically got an approach that we call full spectrum fingerprinting. Let’s take a super long tail query, where you only get maybe two or three of these a month. If you query that, you get a set of results; and if I click through the first result, and the second result, and the third result, instead of counting those as successes, we have a much higher bar. There are a significant number of behavioral characteristics you have to satisfy once you are on that actual page to tell us that you are getting value from it. And that knocks out a lot of the noise and it allows us very quickly, with a very, very small number of queries in that long tail, to be able to appropriately surface that long tail content.
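The “higher bar” Mike describes can be illustrated with a small filter that refuses to treat a click as a success unless the on-page behavior also suggests the visitor got value. Mike does not say which behaviors Baynote measures, so the signals below (dwell time, scroll depth, a follow-up action) and their thresholds are purely assumed for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageVisit:
    """One click-through from a search result (hypothetical signals)."""
    url: str
    dwell_seconds: float
    scroll_fraction: float  # 0.0 (top only) to 1.0 (scrolled to the end)
    took_action: bool       # e.g. printed, bookmarked, added to cart


def shows_value(visit, min_dwell=30.0, min_scroll=0.5):
    """A click alone is not enough; the visit must clear a behavioral bar."""
    engaged = (visit.dwell_seconds >= min_dwell
               and visit.scroll_fraction >= min_scroll)
    return engaged or visit.took_action


def validated_results(visits):
    """Keep only the URLs whose visits showed evidence of value."""
    return [v.url for v in visits if shows_value(v)]


# One rare query, three click-throughs, only some of them meaningful.
visits = [
    PageVisit("/result-1", dwell_seconds=4, scroll_fraction=0.1, took_action=False),
    PageVisit("/result-2", dwell_seconds=95, scroll_fraction=0.8, took_action=False),
    PageVisit("/result-3", dwell_seconds=10, scroll_fraction=0.2, took_action=True),
]
print(validated_results(visits))
```

Because each surviving visit carries more evidence than a raw click, a handful of queries a month can still produce a usable signal for long tail content.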
I asked Mike what was the biggest challenge Baynote was looking to solve in the future:
Mike: I think we are just starting to scratch the surface in terms of what we can do for public facing websites to make them more usable and more effective, and ultimately give the managers of those sites a tool that, I think with fairly low effort, can actually make a huge change in their metrics, their KPIs, whether it is fewer inbound calls for support, higher revenues on ecommerce, or more page views for media. And so, I think from our perspective, we are pretty excited by the fact that there is so much new technology available on websites, and through some of the stuff we are doing as well, to be able to capture a very, very rich behavioral and contextual fingerprint of that site, and then be able to take learnings from public sites or consumer sites and start to mix findings; not mix data or behaviors, but distill the core elements that made a consumer facing site so effective and bring that into the enterprise, and take that on to a public enterprise website, and maybe move that over to retail if media becomes accessible to a certain model.
It was notable that Baynote has purposely avoided the individual history click stream tracking that seems to be integral to Google’s approach to personalization. I had to ask Mike’s opinion on Google’s take.
Mike: I’d be leery of any approach that uses personalization, the idea of trying to profile a web searcher, a web visitor, and tweak or tune the results based on this person’s past query behavior and managing that profile. That is because of the nature of people; we are in different contexts and we are constantly in different roles. We know that the profiling approach is very tenuous in that situation and frequently doesn’t perform like you might expect. I wouldn’t bet on that approach at all. I would bet on an approach maybe more like a Wikia approach that is not so much personalized, but based on the user’s intent, a generic intent that someone expresses.
Finally, Mike leaves us with a suggestion for our reading list, Emergence by Steven Johnson.
Mike: It basically talks a lot about how optimal solutions form, both in human society as well as in nature’s own systems like bees and ants and so forth. How each individual actor acts in their own self interest, given the environment that they are operating in; either that system will die immediately or it will survive and optimize itself like a beehive, or like an ant colony or the formation of cities. Why do these cities form the way they do, why do certain districts form the way they do? And so, that’s a concept that we believe in: really understanding and letting the system, or letting the collective community, work the way that they naturally work, and not influencing that explicitly, but letting the implicit behavior just emerge, and watching that and seeing; and then sort of floating that up to make it more visible to other users. It’s really exciting for me to see the core concepts in those books, and the wisdom of crowds, ultimately get surfaced in websites.
That covers Baynote’s approach to disambiguation. Next column I’ll look at how VortexDNA uses personal values to better match you with relevant content.
Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.