Reddit AMA Reveals Graph Search Has Been In The Works Since Early 2011 & Specs On How It Works
Last Friday, Lars Rasmussen, Facebook’s Director of Engineering, did an AMA (ask me anything) on Reddit to discuss the ins and outs of Graph Search. Lars answered 20 questions from fellow Redditors, shedding some light on the history, importance and technology behind Graph Search.
According to Lars, the project was conceived in spring 2011 on a walk with Mark Zuckerberg, and an initial version of Graph Search launched internally that summer. That first version was bare-bones, and the internal beta for Facebook employees wasn’t released until fall of last year.
The technology stack used in Graph Search is an inverted-index system called “Unicorn.” According to Rasmussen, nodes are selected within Unicorn based on their edge relationships to other nodes. Search indices are built on Hadoop and span only as far as friends, though “friends-of-friends” searching is possible. Here are his detailed answers on the technology stack and query languages in Graph Search.
In answering a question about beta length and slow rollouts, Rasmussen said that privacy was one of the reasons Graph Search was rolling out so slowly. He stated:
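To make the idea of selecting nodes by their edges a little more concrete, here is a minimal Python sketch of an inverted index keyed on edges. It is not Unicorn, and Lars did not share any code: the `EdgeIndex` class, its method names, the `"friend"` and `"likes"` edge types and the sample people are all assumptions made up for illustration.

```python
from collections import defaultdict


class EdgeIndex:
    """Toy inverted index (hypothetical, not Unicorn's real design).

    Each term is an (edge_type, target) pair, and its posting list is the
    set of nodes that have that edge pointing at the target.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add_edge(self, source, edge_type, target):
        self.postings[(edge_type, target)].add(source)

    def add_friendship(self, a, b):
        # Friendship is symmetric, so store the edge in both directions.
        self.add_edge(a, "friend", b)
        self.add_edge(b, "friend", a)

    def lookup(self, edge_type, target):
        # Return a copy so callers cannot mutate the posting list.
        return set(self.postings.get((edge_type, target), set()))

    def lookup_all(self, *terms):
        """Nodes matching every (edge_type, target) term (an AND query)."""
        result = None
        for edge_type, target in terms:
            hits = self.lookup(edge_type, target)
            result = hits if result is None else result & hits
        return result or set()

    def friends_of_friends(self, user):
        """The index only stores direct friends, so this takes two hops."""
        friends = self.lookup("friend", user)
        reached = set()
        for friend in friends:
            reached |= self.lookup("friend", friend)
        return reached - friends - {user}


index = EdgeIndex()
index.add_friendship("alice", "bob")
index.add_friendship("bob", "carol")
index.add_edge("bob", "likes", "hiking")
index.add_edge("carol", "likes", "hiking")

# "My friends who like hiking", asked by alice.
friends_who_hike = index.lookup_all(("friend", "alice"), ("likes", "hiking"))
# People two hops away from alice.
alice_fof = index.friends_of_friends("alice")
```

In this sketch a query such as "my friends who like hiking" is just an intersection of two posting lists, and a friends-of-friends query is a second round of lookups over the first result, which is one plausible reading of an index that spans only friends yet still supports friends-of-friends searches.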
This is one of the reasons we are rolling Graph Search out slowly. Back in December we launched a series of tools for everyone to more easily audit their own content, and content others have shared about them. We want to give everyone a chance to use those tools before rolling out Graph Search more broadly.
Getting privacy right for Graph Search was an enormous effort in itself and we take people’s concern and feedback on the matter very seriously.
The biggest challenges? Rasmussen stated that scaling the search index, building the natural language parser and finding the optimal order of results were among the most troubling tasks. He also conceded that Graph Search was still “a bit imprecise with tenses” and hinted that more precise interpretation may be coming soon.
One Redditor asked Lars for the ELI5 version (Explain It Like I’m 5). Well, Lars was a smarter five-year-old than I was. His simple version of how Graph Search works is as follows:
At the core of the system sits a Context-Free Grammar (http://en.wikipedia.org/wiki/Context-free_grammar) describing all the queries the system can understand. The grammar in general contains many different ways of expressing the same question.
As a user types in the search field, a Parser (http://en.wikipedia.org/wiki/Parsing) attempts to find the queries from the grammar that most closely matches what the user has typed, and displays those as suggestions in a drop-down below the search field.
Part of the parsing involves searching for people and entities. For example, if I search for ‘photos of jane doe’ the parser needs to figure out which Jane Doe I am looking for. When in doubt, we tend of course to pick the Jane who has the most friends in common with me, went to my school, works for my employer, etc. This part of the parser is essentially Facebook’s existing ‘typeahead’ search system.
When the user clicks one of the suggested queries, we proceed to resolve the corresponding semantic (see my answer). There are three steps to this part: 1) we retrive (sic) candidate answers from an inverted index (http://en.wikipedia.org/wiki/Inverted_index), then we 2) filter out anything the searcher does not have access to, and finally we 3) order them according to a great many criteria in the way we think is most interesting to the searcher.
Lastly, we display the results. You’ll note that we take great care to list by each result why we think they are a good result. For example, if you ask for ‘Friends of Facebook employees’ we might place a snippet of text like ‘Friends with Mark Zuckerberg and other Facebook employees’ next to a result. ‘Other Facebook Employees’ is typically a link that we issue a new query for all the Facebook employees who are friends with that given result. (I looooove these snippets and consider them the unsung heroes of Graph Search :)
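For readers who think in code, the three resolution steps Lars describes (retrieve candidates, filter by access, order the rest) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the `Photo` record, the `visible_to` privacy model and ranking by like count are all invented here, whereas Lars says only that results are ordered "according to a great many criteria."

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    # Hypothetical record; these field names are not from the AMA.
    photo_id: str
    tagged: frozenset      # people tagged in the photo
    visible_to: frozenset  # people allowed to see it (toy privacy model)
    likes: int


def build_index(photos):
    """Inverted index: person -> list of photos that person is tagged in."""
    index = {}
    for photo in photos:
        for person in photo.tagged:
            index.setdefault(person, []).append(photo)
    return index


def resolve(index, person, searcher):
    """Answer 'photos of <person>' for a given searcher."""
    # 1) Retrieve candidate answers from the inverted index.
    candidates = index.get(person, [])
    # 2) Filter out anything the searcher does not have access to.
    allowed = [p for p in candidates if searcher in p.visible_to]
    # 3) Order the rest. Like count is a stand-in for the real criteria.
    return sorted(allowed, key=lambda p: p.likes, reverse=True)


photos = [
    Photo("p1", frozenset({"jane"}), frozenset({"me", "jane"}), 3),
    Photo("p2", frozenset({"jane"}), frozenset({"jane"}), 50),
    Photo("p3", frozenset({"jane", "me"}), frozenset({"me", "jane"}), 10),
]

results = resolve(build_index(photos), "jane", "me")
result_ids = [p.photo_id for p in results]
```

Note that the most-liked photo, `p2`, never reaches the ranking step because the searcher cannot see it. Filtering before ordering keeps private content from influencing what the searcher is shown.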
Though Lars did share quite a bit about the process, technology and importance of Graph Search, he left unanswered some questions that marketers were watching closely. Some of the notable unanswered questions with numerous upvotes included:
- What advertising opportunities will be available around Graph Search?
- When can we expect our off-Facebook likes to be included in Graph Search?
- Which parts of Facebook’s EdgeRank algorithm were most important when building Graph Search in order to get relevant search results?
- How long will we have to wait for an API? How are the results ranked? By affinity, by clicks? Will Graph Search ever support searching for posts?
Head over to Reddit to view the entire AMA if you’re looking for more.