How Search & Social Engines Are Using Semantic Search
The term “Semantic Search” is certainly not new. However, it has taken on new dimensions and implications in both search and social engines today. In addition, it has had a strong impact on targeted semantic advertising.
This special series of forthcoming articles on semantic search will take a look at the history behind the development of semantic technology and why it has now become so commercially viable and topical. It will also take a look at how the technology enables “answer engines,” rather than simple search engines, to improve the user experience.
For example, look at the direct answer Google returns for the query [Barack Obama birthday].
According to announcements on Google’s Inside Search blog, this is only the beginning of building an “artificial intelligence” engine, or its “Star Trek computer.”
Of note was Amit Singhal’s comment at the end of the Google blog post announcing the Knowledge Graph:
“We’re proud of our first baby step—the Knowledge Graph—which will enable us to make search more intelligent, moving us closer to the “Star Trek computer” that I’ve always dreamt of building.”
Many artificial intelligence, NLP (natural language processing) and machine learning technologies may be cited as “semantic technologies.” “Semantic” relates to meaning, so semantic technologies in general encompass far more than semantic search.
However, many of them can be leveraged in search and related applications, such as semantically targeted advertising, automated topic recognition and more. Many semantic technologies map to underlying ontologies, which can be thought of simply as vocabularies or lexicons.
For the purpose of “Semantic SEO,” we will refer primarily to the concepts associated with the Semantic Web and their adoption by Google and the other major search and social engines. By this I mean the ontologies or vocabularies being used (which encompass the semantics, or concepts) and the syntax used to express them as metadata in on-page structured markup.
Semantic Search, as the term is used in current parlance, is essentially the notion of using metadata to improve search over documents. In the case of search engines, it refers more specifically to embedding metadata in HTML5 pages with semantic markup, in one of the syntaxes the search engines currently support: RDFa Lite or microdata.
How Is Metadata Exploited By Search Engines?
One example is enhanced displays in the SERPs, such as Google’s Rich Snippets, Bing Tiles or Yahoo SearchMonkey. These enhanced displays are more visually engaging and bring a corresponding increase in CTR.
Another way search engines exploit this information is to search directly on the consumed metadata. Examples include Sindice.com, as well as Google with the Knowledge Graph and the Knowledge Carousel.
This is a large part of the evolution of search engines from producing a series of probabilistic results, or “blue links,” to becoming “answer engines.” Users find it tiresome to run multiple queries only to obtain (or not) an answer. Relevance to the query is everything, and semantic technologies can be leveraged in multiple ways to attain that goal.
Machine learning techniques can also be used to improve topic validation. A timeline of Semantic Web adoption is shown below:
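To make “searching directly on consumed metadata” concrete, here is a minimal sketch that treats extracted markup as subject-predicate-object triples and answers a pattern query against them. The `Triple` type, the `STORE` contents and the `match` function are hypothetical illustrations, not how Sindice or Google actually store data; the plain `type` predicate is a simplification of an item’s schema.org type.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical store of facts, as if consumed from schema.org markup on crawled pages.
# "type" is a simplified stand-in for the item's schema.org type.
STORE: List[Triple] = [
    Triple("Barack Obama", "type", "Person"),
    Triple("Barack Obama", "birthDate", "1961-08-04"),
    Triple("Coldplay", "type", "MusicGroup"),
    Triple("Tom Cruise", "type", "Person"),
]


def match(subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return every stored triple matching the pattern; None acts as a wildcard."""
    return [
        t for t in STORE
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]
```

With this store, `match("Barack Obama", "birthDate")` returns the single birth-date triple, which an answer engine could display directly instead of a list of links, while `match(predicate="type", obj="Person")` returns every person in the store.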
- Yahoo! opens SearchMonkey: February 2008
- Microsoft acquires Powerset: July 2008
- Google introduces reviews and aggregate reviews using rich snippets: May 2009
- Google introduces support for specifying an image’s license using RDFa: August 2009
- Google introduces RDFa support for videos: September 2009
- Google encourages webmasters to “help us make the web better” by using rich snippets: October 2009
- Google announces use of structured data to describe an organization: March 2010
- Google announces rich snippets for recipes: April 2010
- Google announces rich snippets go international: April 2010
- Facebook announces the Open Graph protocol, based on RDFa: April 2010
- Google acquires Metaweb: July 2010
- Google Refine is announced: November 2010
- Google announces rich snippets for shopping sites: November 2010
- Google, Yahoo, and Bing announce Schema.org: June 2011
Semantic Technology Helps Provide Relevant Answers
It was at this point that the three major search engines jointly announced support for schema.org. In doing so, they diverged from the previous markup standard (namely RDFa), initially announcing support for microdata only (a decision subsequently changed after an uproar in the Semantic Web community). The move also reflected the search engines’ logical preference for consuming information that conforms to high standards of data quality and to standards-body definitions.
It has long been the intent of every search engine to provide not just a series of links, but actual relevant answers. These answers can be derived by leveraging the mechanisms mentioned above.
Determining user intent is yet another means of exploiting semantic technology. It can be done by:
- Correctly interpreting a portion of the query, or the query in its entirety
- Providing a “best guess” at an answer by reasoning directly with previously validated information from highly trusted sources
- Aggregating and growing this knowledge base (a Web of data, or graph database) by adding consumed information and/or reasoning about it.
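The three steps above can be sketched in a few lines, assuming a tiny hand-built knowledge base and a hypothetical table that maps query words to schema.org property names. `interpret` handles the first step (recognizing the entity and the attribute in the query), `answer` returns a “best guess” from the stored facts, and `learn` grows the knowledge base. Real engines use far richer entity recognition; this only illustrates the flow.

```python
from typing import Dict, Optional, Tuple

# Hypothetical knowledge base: lowercase entity name -> {schema.org property: value}.
KNOWLEDGE_BASE: Dict[str, Dict[str, str]] = {
    "barack obama": {"birthDate": "1961-08-04"},
}

# Hypothetical mapping from words users type to schema.org property names.
PROPERTY_ALIASES: Dict[str, str] = {
    "birthday": "birthDate",
    "born": "birthDate",
}


def interpret(query: str) -> Optional[Tuple[str, str]]:
    """Split a query into (entity, property) if both parts are recognized."""
    text = query.lower()
    entity = next((e for e in KNOWLEDGE_BASE if e in text), None)
    if entity is None:
        return None
    # Look for a property keyword among the words left after removing the entity name.
    remainder = text.replace(entity, " ").split()
    prop = next((PROPERTY_ALIASES[w] for w in remainder if w in PROPERTY_ALIASES), None)
    if prop is None:
        return None
    return entity, prop


def answer(query: str) -> Optional[str]:
    """Return a direct answer from the knowledge base, or None to fall back to links."""
    parsed = interpret(query)
    if parsed is None:
        return None
    entity, prop = parsed
    return KNOWLEDGE_BASE[entity].get(prop)


def learn(entity: str, prop: str, value: str) -> None:
    """Grow the knowledge base with a newly consumed (and presumably validated) fact."""
    KNOWLEDGE_BASE.setdefault(entity.lower(), {})[prop] = value
```

Here `answer("Barack obama birthday")` returns `"1961-08-04"`, while a query with no recognized entity returns `None`, which in this sketch means falling back to ordinary blue links.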
Producing or publishing this information as embedded metadata in, say, HTML5 can be accomplished by adding microdata or RDFa Lite, as described on the Google blogs and elsewhere. However, these are merely HTML5-compatible syntaxes that the search engines can consume and understand.
The other issue is that of vocabularies (or ontologies, or taxonomies). Because standards are an advantage in many arenas, the three search engines (Google, Bing and Yahoo) agreed to mandate a standard vocabulary, or ontology: schema.org, announced June 2, 2011. Search engines have such a large user base that they effectively have the power to mandate which ontologies or vocabularies are used.
The Semantic Web community has defined many other ontologies and vocabularies and makes them openly available, including GoodRelations for e-commerce, FOAF, SIOC, WordNet and DBpedia (derived from Wikipedia).
Schema.rdfs.org has a great set of resources for those of you wanting to get started, including tutorials, software and tools that generate structured markup automatically.
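To show that microdata and RDFa Lite are just two syntaxes for the same information, the sketch below uses Python’s standard `html.parser` module to pull schema.org property names and values out of both forms. `PropertyExtractor` and `extract` are hypothetical helpers written for this example; they ignore nesting, `content` attributes and other details a production parser would have to handle.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

MICRODATA = """
<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Barack Obama</span>
  born <span itemprop="birthDate">1961-08-04</span>
</div>
"""

RDFA_LITE = """
<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Barack Obama</span>
  born <span property="birthDate">1961-08-04</span>
</div>
"""


class PropertyExtractor(HTMLParser):
    """Collect (property, text) pairs from microdata itemprop= or RDFa Lite property=.

    Deliberately simplified: nested items, content= attributes and prefixes are
    ignored, and only the first text run inside each tagged element is recorded.
    """

    def __init__(self) -> None:
        super().__init__()
        self.pairs: List[Tuple[str, str]] = []
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Microdata names the property with itemprop=, RDFa Lite with property=.
        name = attributes.get("itemprop") or attributes.get("property")
        if name:
            self._current = name

    def handle_data(self, data):
        if self._current and data.strip():
            self.pairs.append((self._current, data.strip()))
            self._current = None  # keep only the first text run per property

    def handle_endtag(self, tag):
        self._current = None


def extract(html: str) -> List[Tuple[str, str]]:
    parser = PropertyExtractor()
    parser.feed(html)
    parser.close()
    return parser.pairs
```

Both snippets yield `[("name", "Barack Obama"), ("birthDate", "1961-08-04")]`, which is the point: the consumer cares about the vocabulary terms, not which syntax carried them.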
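A shared vocabulary also means markup can be checked mechanically. The sketch below flags property names that are not part of a hand-copied subset of the schema.org `Person` properties; `PERSON_PROPERTIES` is deliberately incomplete and `unknown_properties` is a hypothetical helper, not part of any schema.org tool.

```python
from typing import Iterable, List, Tuple

# A hand-copied subset of properties schema.org defines for Person (not the full list).
PERSON_PROPERTIES = {"name", "birthDate", "jobTitle", "url", "address", "affiliation"}


def unknown_properties(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return property names missing from the (partial) schema.org Person vocabulary."""
    return sorted({name for name, _ in pairs if name not in PERSON_PROPERTIES})


if __name__ == "__main__":
    # "birthday" is not a schema.org term; the shared vocabulary uses "birthDate".
    print(unknown_properties([("name", "Barack Obama"), ("birthday", "1961-08-04")]))
```

For example, `unknown_properties([("name", "Barack Obama"), ("birthday", "1961-08-04")])` returns `["birthday"]`, because schema.org names that property `birthDate`.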
Let’s continue the timeline of Semantic Web adoption since the schema.org announcement.
- More music formats: August 2011
- Microdata and sports stats from the NFL: August 22, 2011
- Upcoming concerts: February 2012
Search Engines Are Becoming Answer Engines
This timeline culminates, at the time of this writing, with the introduction of the Knowledge Graph and the Knowledge Carousel in May 2012.
The Knowledge Graph builds directly on Freebase and is extended with other information consumed from structured markup as defined in schema.org, or as deemed relevant by Google.
The Knowledge Graph panel itself appears on the right-hand side of the results for a query and is a further example of Google moving from returning links toward acting as an “answer engine” built on fact-based or aggregated information.
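One way to picture a graph that starts from a seed source and is extended with consumed markup is sketched below, assuming facts arrive as `(source, entity, property, value)` tuples and that only an allow-list of trusted sources is accepted. The `TRUSTED_SOURCES` set, the `.example` domains and the `merge` function are hypothetical; this is not a description of how Google actually builds the Knowledge Graph.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

Graph = DefaultDict[str, DefaultDict[str, Set[str]]]

# Hypothetical allow-list of sources treated as trusted (reserved .example domains).
TRUSTED_SOURCES = {"label.example"}


def new_graph() -> Graph:
    """Empty entity -> property -> set-of-values graph."""
    return defaultdict(lambda: defaultdict(set))


def merge(graph: Graph, facts: Iterable[Tuple[str, str, str, str]]) -> int:
    """Add (source, entity, property, value) facts from trusted sources; return the count added."""
    added = 0
    for source, entity, prop, value in facts:
        if source not in TRUSTED_SOURCES:
            continue  # markup from unvalidated sources is ignored
        values = graph[entity][prop]
        if value not in values:
            values.add(value)
            added += 1
    return added
```

Merging into a graph that already holds Coldplay’s genre adds only new facts from trusted sources (an album from `label.example`, say) and ignores anything from a source outside `TRUSTED_SOURCES`.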
In June 2012, Twitter announced its “Twitter Cards,” a way to “attach media experiences to Tweets that link to your content.” You can read more about this in a post on semanticweb.com.
In July 2012, Google Webmaster Central introduced the Structured Data Dashboard, which lets webmasters see the structured data Google has consumed from their sites. The announcement also included comments indicating Google’s commitment to this course of action:
“Structured data is becoming an increasingly important part of the web ecosystem. Google makes use of structured data in a number of ways including rich snippets which allow websites to highlight specific types of content in search results. Websites participate by marking up their content using industry-standard formats and schemas.”
Shown below is a site-level view of about 2 million annotations for schema.org books:
Note the depiction of results for the band “Coldplay.” Rich snippets markup for schema.org (music, etc.) is clearly integrated into this display.
Below is a depiction of the Knowledge Graph combined with features of the Knowledge Carousel, namely the scroll bar at the top. The Knowledge Graph extends beyond Freebase and other linked data sources by drawing on validated, verified pages and trusted sources that contain structured markup based on Semantic Web techniques.
The query entered for the display below was “Tom Cruise Movies.” The Knowledge Carousel is globally available in English as of September 2012.
The example above clearly shows how much SERP real estate these enhanced displays consume.
CTR Increase With Semantic Markup
Another crucial aspect of Semantic SEO, or schema, is the increase in CTR for marked-up items, along with the substantial screen real estate taken up by the Knowledge Graph/Carousel, Rich Snippets and other information aggregated by Google (such as places and events on the right-hand side of the results page, where Knowledge Graph results typically appear).
Example shown below:
As an indication of where this is going, it is worth looking at Google Insights for Search and the resulting graph. I entered the term [Semantic Search], and the results can be seen below.
Semantic Technology Adoption Expands
It is interesting to see which terms emerge as the most searched. Also, the peak in the graph below illustrates the impact of Walmart’s semantic search engine, which resulted in a 10 to 15 percent boost in business.
Another point of relevance is the mapping of higher-order terms in schema.org to the verticals in Google, Bing and other engines. When a user selects a specific vertical, as in the image below, it is far easier to determine the intent of a query.
Feel free to compare these with the full set on the official schema.org site. As an experiment, I loaded the schema (OWL version) from the official schema.org site.
Viewed in a tool called Protégé, the resulting hierarchical display gives a great graphical depiction of this. I focused on expanding “Place,” but you can expand any class you choose.
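The same kind of hierarchy can be walked in code. The sketch below hard-codes a small, partial slice of the schema.org class tree, renders the subtree under `Place` as indented lines (roughly what a class browser such as Protégé displays), and maps any type to its top-level ancestor and then to a vertical. The `VERTICALS` mapping is a hypothetical illustration of the higher-order-term-to-vertical idea, not the engines’ actual configuration, and the real schema.org tree is much larger (some types, such as `LocalBusiness`, also have more than one parent).

```python
from typing import Dict, List, Optional

# Partial schema.org class tree (parent -> children), hand-copied for illustration only.
CHILDREN: Dict[str, List[str]] = {
    "Thing": ["CreativeWork", "Event", "Place", "Product"],
    "CreativeWork": ["Book", "Movie"],
    "Place": ["AdministrativeArea", "Landform", "LocalBusiness"],
    "AdministrativeArea": ["City", "Country", "State"],
    "Landform": ["BodyOfWater", "Mountain"],
    "LocalBusiness": ["FoodEstablishment"],
    "FoodEstablishment": ["Bakery", "Restaurant"],
}

# Hypothetical mapping from top-level types to search verticals.
VERTICALS: Dict[str, str] = {
    "Place": "local",
    "CreativeWork": "media",
    "Event": "events",
    "Product": "shopping",
}

# Invert the tree once so each type can look up its (single, simplified) parent.
PARENT: Dict[str, str] = {
    child: parent for parent, children in CHILDREN.items() for child in children
}


def render(root: str, depth: int = 0) -> List[str]:
    """Return the subtree under root as indented lines, depth-first."""
    lines = ["  " * depth + root]
    for child in CHILDREN.get(root, []):
        lines.extend(render(child, depth + 1))
    return lines


def top_level_type(schema_type: str) -> Optional[str]:
    """Walk up to the ancestor directly below Thing; None if the type is unknown."""
    if schema_type not in PARENT:
        return None
    current = schema_type
    while PARENT[current] != "Thing":
        current = PARENT[current]
    return current


def vertical_for(schema_type: str) -> Optional[str]:
    """Map a type to a (hypothetical) vertical via its top-level ancestor."""
    top = top_level_type(schema_type)
    return VERTICALS.get(top) if top else None


if __name__ == "__main__":
    print("\n".join(render("Place")))
    print(vertical_for("Restaurant"))
```

Running it prints the `Place` subtree and then `local`, since `Restaurant` rolls up through `FoodEstablishment` and `LocalBusiness` to `Place`.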
Semantic SEO Benefits
In summary, Semantic SEO and Semantic Technologies bring many strong benefits to the search engines.
- Enhancement of visual displays (Rich Snippets) in SERPs
- Direct search on that data, allowing search engines to return more relevant answers
- Verification of topic information on pages using classifiers and other machine learning mechanisms
- Help in determining user intent (context improves recall and relevance)
Future articles in this series will dive into specific verticals in greater depth, clarifying how vertical search improves relevance and defines user intent, and will take a look at semantic technologies used in recommendation engines, semantic advertising and more!
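As an illustration of the classifier point above, the sketch below trains a toy multinomial naive Bayes classifier (one common text-classification technique, chosen here for brevity) on a few hypothetical snippets labeled with schema.org types, then checks whether a page’s text agrees with the type its markup declares. The training data and the `markup_is_consistent` helper are made up for this example; nothing here reflects how any search engine actually verifies topics.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set = set()

    def train(self, text: str, label: str) -> None:
        words = tokenize(text)
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text: str) -> Optional[str]:
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Log prior plus the sum of smoothed log likelihoods for each word.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in tokenize(text):
                score += math.log((counts[word] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def markup_is_consistent(model: NaiveBayes, page_text: str, declared_type: str) -> bool:
    """True if the classifier's guess agrees with the schema.org type declared in markup."""
    return model.predict(page_text) == declared_type


# Toy, hypothetical training snippets labeled with schema.org types.
model = NaiveBayes()
model.train("preheat oven bake flour sugar minutes", "Recipe")
model.train("stir butter eggs bake serve", "Recipe")
model.train("band album tour concert tickets", "MusicGroup")
model.train("new album single concert live", "MusicGroup")
```

With this toy model, `markup_is_consistent(model, "bake the flour and sugar", "Recipe")` is `True`, while the same `Recipe` declaration on text about concert tickets and a band is flagged as inconsistent.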
Opinions expressed in this article are those of the guest author and not necessarily those of Search Engine Land.