Future SEO: Understanding Entity Search

Last month, I asked you to imagine the future of SEO with a focus on Entity Optimization as I interviewed veteran semantic strategist Barbara Starr. We discussed an “answer engine” that uses relevant, machine-recognizable “entities” on Web pages to answer specific, well-refined queries.

The Hummingbird Update

On September 26, Google took another step toward becoming that answer engine with its Hummingbird update. In Danny Sullivan’s live blog about the Hummingbird algorithm, he explains how Google is rapidly adopting semantic Web technology while still retaining parts of its old algorithm. This is Google’s solution for evolving from text links to answers. Such a system will display more precise results faster, as it’s based on semantic technology focused on user intent rather than on search terms.

To review Google’s progress in this direction: first came the Knowledge Graph, then Voice Search and Google Now — all providing answers, and sometimes even anticipating the questions. To serve these answers, Google relies on entities rather than keywords.

What Is An “Entity”?

For the purposes of this article, entities are people, places, or things. One way of introducing entities is to recognize that Google’s Knowledge Graph is an entity graph and represents Google’s first step toward semantic search (or entity search).

What is “entity search”? Let’s keep it simple: it’s a more accurate way for bots to understand user intent and to draw on additional verified sources when answering a search query.


Unstructured Vs. Structured Data

Over the past two decades, the Internet, search engines, and Web users have had to deal with unstructured data, which is essentially any data that has not been organized or classified according to a predefined data model. As a result, search engines could identify patterns within Web pages (keywords) but could not really attach meaning to those pages.

Semantic search provides a method for classifying data by labeling each piece of information as an entity; data organized this way is referred to as structured data. Consider retail product data, which contains enormous amounts of unstructured information. Structured data enables retailers and manufacturers to provide extremely granular and accurate product data for search engines (machines/bots) to consume, understand, classify, and link together as a string of verified information.
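To make this concrete, here is a minimal sketch of product data expressed as structured data. It assumes JSON-LD, which is one of several syntaxes for schema.org markup, and a hypothetical product: the name, SKU, brand, price, and rating values are all invented for illustration. The schema.org types used (Product, Brand, Offer, AggregateRating) are real; the helper `to_script_tag` is a hypothetical name.

```python
import json

# Hypothetical product record; every value below is invented for illustration.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Trailhead 2 Hiking Boot",
    "sku": "TH2-BRN-10",
    "brand": {"@type": "Brand", "name": "ExampleOutdoors"},
    "offers": {
        "@type": "Offer",
        "price": "129.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "212",
    },
}


def to_script_tag(data: dict) -> str:
    """Wrap JSON-LD in the <script> element a page would embed it in."""
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")


print(to_script_tag(product))
```

Each nested object carries its own `@type`, so a machine reading the page knows that "129.99" is the price of an offer for this product rather than just a number somewhere in the text.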

Semantic or entity search will apply to much more than retail product data. Take a look at Schema.org’s schema types: these schemas represent the technical vocabulary required to create a structured Web of data (entities with unique identifiers) that is machine-readable. Machine-readable structured data is disambiguated and more reliable, because it can be cross-verified against other sources of linked entity data (unique identifiers) on the Web.
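Why unique identifiers matter is easiest to see with an ambiguous name. The sketch below is illustrative only: the two "sources" and their example.org URIs are hypothetical, and `group_by` is a hypothetical helper. Grouping records by name lumps two different cities called Paris together, while grouping by identifier keeps them apart and lets facts from separate sources be cross-checked against the same entity.

```python
from collections import defaultdict

# Two hypothetical sources describing places. The URIs are invented
# identifiers in the example.org domain, which is reserved for documentation.
source_a = [
    {"id": "http://example.org/place/paris-france", "name": "Paris", "country": "France"},
    {"id": "http://example.org/place/paris-texas", "name": "Paris", "country": "United States"},
]
source_b = [
    {"id": "http://example.org/place/paris-france", "name": "Paris", "country": "France"},
]


def group_by(key, *sources):
    """Collect every record from every source under the value of `key`."""
    groups = defaultdict(list)
    for source in sources:
        for record in source:
            groups[record[key]].append(record)
    return groups


by_name = group_by("name", source_a, source_b)
by_id = group_by("id", source_a, source_b)

# Keyed by name, records about two different cities collapse into one bucket.
print(len(by_name["Paris"]))  # 3

# Keyed by identifier, each entity gets its own bucket, and we can check
# whether the sources that mention an entity agree about it.
for uri, records in by_id.items():
    countries = {r["country"] for r in records}
    print(uri, "records:", len(records), "consistent:", len(countries) == 1)
```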

Structured Data, Triples & Triplestores

Semantic search uses a vocabulary like Facebook’s Open Graph protocol or a syntax like RDFa or microdata to create structured data. Structured data can be imported into and exported from triplestores. Hang in there and bear with me for a minute…

A triplestore is a database purpose-built for storing and retrieving triples; large triplestores can hold billions of them.

What’s a triple? To simplify, let’s break down a sentence. A simple sentence can be split into three parts: a subject, a predicate, and an object. Semantic strategists call this combination a triple. Triples are essentially two entities (the subject and the object) linked by a relationship (the predicate). The subject is the person or thing that carries out the action of the verb. The predicate is the action the subject takes. The object is the person or thing upon which the action is carried out.

Simple example of a triple: Mrs. Keller is teaching Algebra.

Mrs. Keller → subject → an entity

Algebra → object → an entity

is teaching → predicate or relationship → links the entities
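In code, a triple is simply an ordered group of three values. This minimal sketch stores the example sentence as a subject, predicate, and object; the `Triple` type and its field names are hypothetical choices for illustration, not part of any standard.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str    # the entity carrying out the action
    predicate: str  # the relationship that links the two entities
    obj: str        # the entity the action is carried out upon


fact = Triple(subject="Mrs. Keller", predicate="is teaching", obj="Algebra")

# Reading the three fields back in order reconstructs the original sentence.
print(f"{fact.subject} {fact.predicate} {fact.obj}.")  # Mrs. Keller is teaching Algebra.

# Because it is a tuple, a triple can also be unpacked positionally.
subject, predicate, obj = fact
```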

In RDF, the standard data model behind these triples, each part of a triple is identified by a Uniform Resource Identifier (URI): the subject and predicate are typically URIs, and the object is either a URI or a literal value such as a name or a number. Answer engines will retrieve very specific data from large triplestores holding billions of triples that link billions of subjects, objects, and predicates into relationships. The result is more accurate answers to our queries, because the engine can verify data and relationships internally against trusted documents (structured data).
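Here is one way to picture the Mrs. Keller example once URIs are involved. The sketch writes it in N-Triples, a W3C line-based RDF format in which each line holds one subject, predicate, and object followed by a period. Every URI under example.org is hypothetical; a real deployment would use identifiers from a shared vocabulary or knowledge base. The second triple shows an object that is a literal (a plain string) rather than a URI.

```python
# Hypothetical namespace; example.org is reserved for documentation.
EX = "http://example.org/"


def uri(term: str) -> str:
    """Render a URI reference the way N-Triples expects: in angle brackets."""
    return f"<{term}>"


def literal(text: str) -> str:
    """Render a plain string literal, escaping the characters N-Triples requires."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """One N-Triples statement: subject, predicate, object, then ' .'"""
    return f"{subject} {predicate} {obj} ."


lines = [
    # Mrs. Keller (entity) --isTeaching--> Algebra (entity)
    ntriple(uri(EX + "person/mrs-keller"), uri(EX + "vocab/isTeaching"),
            uri(EX + "subject/algebra")),
    # An object can also be a literal value instead of another entity.
    ntriple(uri(EX + "person/mrs-keller"), uri(EX + "vocab/name"),
            literal("Mrs. Keller")),
]
print("\n".join(lines))
```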

From Links To Answers

When we expand this logic and technology into a structured Web of data using schema types that search engines such as Google, Bing, and Yahoo! can understand, we get something like IBM’s Watson computer: an answer engine that answers our questions without relying on keywords or anchor-text links.

Structured data creates the ability to provide detailed information about the meaning of your page content to search engines in a way that is easily processed and presented to users.

Understanding Vs. Indexing Data

Let’s circle back quickly to the question: what is an entity? Entities in Google’s Knowledge Graph are semantic data objects (schema types), each with a unique identifier. Each entity is a collection of properties based on the attributes of the real-world topic it represents, along with links that express that topic’s relationships to other entities.
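That description (a unique identifier, a set of properties, and links to other entities) maps naturally onto a small data structure. This sketch is an illustration only and is not Google’s internal format; the `Entity` class, the "ex:" identifiers, and the property and relation names are hypothetical. It also shows how such an entity flattens into subject-predicate-object triples.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Entity:
    uid: str          # unique identifier (hypothetical "ex:" prefix below)
    schema_type: str  # schema type, e.g. "Person"
    properties: Dict[str, str] = field(default_factory=dict)   # attributes
    links: Dict[str, List[str]] = field(default_factory=dict)  # relation -> target uids

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Flatten the entity into (subject, predicate, object) triples."""
        triples = [(self.uid, "type", self.schema_type)]
        triples += [(self.uid, name, value) for name, value in self.properties.items()]
        for relation, targets in self.links.items():
            triples += [(self.uid, relation, target) for target in targets]
        return triples


# Hypothetical entity built around the article's Mrs. Keller example.
keller = Entity(
    uid="ex:mrs-keller",
    schema_type="Person",
    properties={"name": "Mrs. Keller", "jobTitle": "Teacher"},
    links={"teaches": ["ex:algebra"], "worksFor": ["ex:lincoln-high"]},
)
for triple in keller.to_triples():
    print(triple)
```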

When Google purchased Metaweb in July 2010, the Freebase database had 12 million entities. As of June 2012, Google’s Knowledge Graph was tracking 500 million entities and over 3.5 billion relationships between those entities. I imagine this has grown significantly over the last 16 months.

Adding machine-readable structured data to the Web will significantly improve a search engine’s ability to “understand” data rather than merely “index” it, and it will provide two big breakthroughs for getting accurate answers to our questions when using an answer engine (or search engine):

  1. Machines will have a much better method for understanding user intent
  2. Machines will be able to draw from very large databases of structured data to match up the most reliable and accurate answer for the user (i.e., verified structured data)
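At its simplest, drawing an answer from a store of structured data is pattern matching over triples. The toy in-memory triplestore below is a sketch under stated assumptions: the `TinyTripleStore` class and every fact except the article’s Mrs. Keller example are hypothetical, and production triplestores persist data, maintain indexes, and are typically queried with SPARQL rather than a Python method. In a query pattern, `None` acts as a wildcard.

```python
from typing import Iterator, Optional, Set, Tuple

Triple = Tuple[str, str, str]


class TinyTripleStore:
    """Toy in-memory triplestore: a set of triples plus wildcard matching."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None) -> Iterator[Triple]:
        """Yield stored triples, in sorted order, that agree with every non-None position."""
        pattern = (subject, predicate, obj)
        for triple in sorted(self._triples):
            if all(want is None or want == have for want, have in zip(pattern, triple)):
                yield triple


store = TinyTripleStore()
store.add("Mrs. Keller", "is teaching", "Algebra")            # the article's example
store.add("Mrs. Keller", "works at", "Lincoln High School")   # hypothetical
store.add("Mr. Ortiz", "is teaching", "Algebra")              # hypothetical

# "Who is teaching Algebra?" Fix predicate and object, leave the subject open.
teachers = [s for s, _, _ in store.match(predicate="is teaching", obj="Algebra")]
print(teachers)  # ['Mr. Ortiz', 'Mrs. Keller']

# "What do we know about Mrs. Keller?" Fix only the subject.
for _, p, o in store.match(subject="Mrs. Keller"):
    print(p, "->", o)
```

The same `match` call answers both questions; only the positions left open change, which is the basic idea behind querying a graph of triples instead of searching for keywords.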

Indexing Keywords Vs. Natural Language Understanding

SEO professionals, semantic strategists, and search engines are all in a transitional phase: from “assisting websites to get their unstructured data indexed” to “assisting websites by providing machine-readable structured data on the Web.”

Entity Extraction

To provide an example and dig a little deeper, the image below gives you a limited view of the schema type hierarchy for a “Place” and its variations, e.g., Courthouse vs. Embassy vs. Apartment Complex or Canal. Exploring it, you will quickly discover that entity extraction is essentially what powers semantic search. That is why entities represent the future of search visibility, including authority, trust, findability, ranking, and so forth.
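A type hierarchy like this can be modeled as a parent map and walked upward to check whether a type is a kind of Place. The parent links below are a small, hand-copied excerpt of schema.org’s hierarchy covering the four types named above; treat them as an assumption and check schema.org for the current definitions. The `PARENT`, `ancestors`, and `is_a` names are hypothetical.

```python
from typing import Dict, List

# Hand-copied excerpt of schema.org's type hierarchy (child -> parent).
# Treat as an assumption; schema.org is the authoritative source.
PARENT: Dict[str, str] = {
    "Place": "Thing",
    "CivicStructure": "Place",
    "GovernmentBuilding": "CivicStructure",
    "Courthouse": "GovernmentBuilding",
    "Embassy": "GovernmentBuilding",
    "Residence": "Place",
    "ApartmentComplex": "Residence",
    "Landform": "Place",
    "BodyOfWater": "Landform",
    "Canal": "BodyOfWater",
}


def ancestors(type_name: str) -> List[str]:
    """Return the chain from type_name up through its supertypes to Thing."""
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def is_a(type_name: str, super_type: str) -> bool:
    """True if super_type appears anywhere in type_name's ancestor chain."""
    return super_type in ancestors(type_name)


print(" > ".join(reversed(ancestors("Courthouse"))))
# Thing > Place > CivicStructure > GovernmentBuilding > Courthouse
print(is_a("Canal", "Place"), is_a("Embassy", "Residence"))  # True False
```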

Schema.org “Place” hierarchy from Protégé

The semantic community, academic community, W3C, information scientists, Google, Bing, Yahoo!, astute enterprise websites, SEO professionals, Web developers, Web designers, interactive agencies, and many others have already begun to improve semantic search by building tools using semantic technology and implementing semantic markup on the Web.

Making Your Business Data & Content Visible With Semantic Markup

I’ve said this dozens of times over the last 3 years: by using semantic markup, ALL your business data and digital content become easily accessible to search engines.

Business data consists of rich media video content, product reviews and ratings, location and contact information, business specialty details, special offers, product information, and the list goes on. Again, I recommend taking a look at Schema.org’s schema types.

Implementing semantic markup on your site will make your business data machine-readable to search engines, Web applications, in-car navigation systems, tablets, mobile devices, Apple Maps, Siri, Yelp maps, Linked Open Data, etc.
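As a sketch of what such markup can look like, the function below emits schema.org LocalBusiness microdata (one of the syntaxes mentioned earlier) for a business name, a postal address, and a phone number. The function names and all business details are hypothetical, and a real page would usually mark up more properties, such as opening hours, geo-coordinates, or ratings.

```python
from html import escape


def span(prop: str, value: str) -> str:
    """One microdata property rendered as an HTML-escaped <span>."""
    return f'<span itemprop="{prop}">{escape(value)}</span>'


def local_business_microdata(name: str, street: str, city: str,
                             region: str, postal_code: str, phone: str) -> str:
    """Mark up basic business details with schema.org LocalBusiness microdata."""
    address = "\n".join([
        '  <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">',
        "    " + span("streetAddress", street),
        "    " + span("addressLocality", city),
        "    " + span("addressRegion", region),
        "    " + span("postalCode", postal_code),
        "  </div>",
    ])
    return "\n".join([
        '<div itemscope itemtype="https://schema.org/LocalBusiness">',
        "  " + span("name", name),
        address,
        "  " + span("telephone", phone),
        "</div>",
    ])


# Hypothetical business; every detail is invented for illustration.
html_out = local_business_microdata(
    "Keller & Sons Hardware", "123 Main St", "Springfield", "IL", "62701", "+1-555-0100")
print(html_out)
```

Escaping matters here: the ampersand in the business name is emitted as `&amp;`, so the markup stays valid HTML while the visible text and the machine-readable name remain the same.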

Semantic markup presents your business data as chocolate to the search engines: they love it and eat it up! Search engines can parse it reliably and know how to aggregate the data for a better user experience in their SERPs. While search engines use structured data to display more relevant search results, you benefit too, because the rich snippets it enables are widely reported to boost click-through rate (CTR).

Final Thoughts

As semantic search becomes more widely adopted, semantic markup lets you give Google the entity data it needs for its Knowledge Graph, which in turn provides better answers to user queries on various devices. In the meantime, you can keep focusing on keywords while semantic markup adoption grows. However, prepare for future SERP visibility by understanding and embracing semantic search as you become proficient at using correct structured markup.

The handwriting is on the wall: search engines want machine-readable content to provide more precise answers to user queries. Users want personalized answers at their fingertips as they favor smartphones/tablets over desktops/laptops (Monetate Q1 2013 Ecommerce Quarterly).

This makes it imperative for SEOs to understand semantic technology and entity search concepts. To get started, see Barbara Starr’s “10 Reasons Why Search Is In Vogue” for a list of 10 things you can do now.

Opinions expressed in the article are those of the guest author and not necessarily those of Search Engine Land.



About The Author: is Managing Partner at PB Communications LLC, which specializes in SaaS solutions for enterprise store locators/finders and in semantic, organic, local, and mobile SEO diagnostic audits for increasing online and in-store foot traffic.

  • Pat Grady

    Given the complexity and pace of change, pity the newbies. More logic and semantics courses are needed.

  • http://www.paulbruemmer.com/ Paul Bruemmer

    Qualified training courses in Semantic SEO may evolve from the Semantic Computer Consortium (SCC). UC-Irvine, UC-Los Angeles, and UC-San Diego will be developing semantic technologies to facilitate the transition of the Internet into Web 3.0.

  • http://www.baratilla.com/ Jay Baratilla

    Does this mean that training your content manager to apply simple subject-predicate-object semantic structures to webpage text content could help your website survive as semantic search evolves? Also, should we be applying structured data to all types of content on our website?

  • Rajesh_magar

    I haven’t read such an interesting and informative post in a long time. The concept you describe is truly revolutionary, and Internet marketers will need to adopt it to keep their marketing strategies viable in the future.

    Thanks again and would love to hear more on this topic in future.

  • http://www.cygnet-infotech.com/ Boni Satani

    awesome. Thanks for the insights

  • http://www.paulbruemmer.com/ Paul Bruemmer

    We don’t think it will be simple. The purpose of this article was to introduce and simplify very complex changes to search and answer engines that are ongoing now and will continue in the future. Training will certainly help practitioners make the transition as the semantic web and Internet evolve. We believe that using structured data for all schema types (all digital assets) will become standard operating procedure. Therefore, we recommend getting started now rather than later.

  • http://www.otriadmarketing.com/ Christopher Skyi

    This is clearly the future of search for information that has semantic structure defined within a limited set of structured data templates (i.e., schema.org) — but

    . . . not all human knowledge, and therefore not all possible content, can be represented by semantic markup, i.e., we’ll never be able to break down all possible content into explicit semantic objects because human knowledge is too vast. And even if we could enlarge something like schema.org to cover all the possible things that we “know,” there’s a far larger problem: human brains create new knowledge, new semantic structures (i.e., relationships), on the fly.

    This is why existing AI programs designed to “understand” some aspect of the world have never been as smart or knowledgeable about the world as we are, in large part because we keep inventing new knowledge about the world.

    Semantic objects along with “page rank” will make Google and Bing incrementally more useful to searchers but the enlarged future of search will include something like IBM’s “Watson” which can deal with vastly more complicated knowledge structures than (at least existing) structured data templates can.

    For more, see this Wired article “Google in Jeopardy: What If IBM’s Watson Dethroned the King of Search?” http://www.wired.com/opinion/2013/10/google-in-jeopardy-what-if-watson-beat-the-search-giant/

  • http://www.paulbruemmer.com/ Paul Bruemmer

    Good point Christopher! AI is another related topic and you are absolutely right, “not all human knowledge can be represented by semantic mark up.”

  • Vipin Kumar

    I have to say it Paul, the post you have written is very informative. Great work by you. Thank you for giving this whole insight on entity search. It is very useful.


  • Peter Hatherley

    I agree. Well structured data is absolutely vital. It helps the searcher and those who want to index better on the search engines, and we need to be doing it sooner rather than later.

    I have done an in-depth study of semantics and have developed a unique tool that outputs semantic content. This will be released commercially within 9-12 months.

  • Jenna Schultz

    Very informative! Thanks for sharing.

  • CUbRIK Project

    Hi, in the CUbRIK FP7 European research project, we are investigating and conducting experiments directly on the subjects you have so amazingly described! Research focuses on joining automatic machine intelligence to collective knowledge, from crowdsourcing, social networking, and serious gaming, to increase semantic understanding of multimedia content, starting from a knowledge base built on entities.
    Prototypes of practical search-based applications that inherit semantic enrichment are coming next.
    What we have consolidated so far is documented in scientific publications, journal articles, and formal project deliverables. Have a look at http://www.cubrikproject.eu/index.php/downloads/publications. More will be published as we start the 3rd year of our funded research agenda.
    On behalf of the project team, I welcome your opinions and advice!

    Tonina Scuderi


  • http://livingwilladvancedirective.com/ David Lemberg

    Thanks, Paul, for a wonderfully informative article. I’m reminded of early discussions of the semantic web, such as Tim Berners-Lee’s article in Scientific American – http://www.scientificamerican.com/article.cfm?id=the-semantic-web. Users have needed semantic search since 1996, so a 20-year development horizon is not too shabby. As other posters have noted, semantic search or AIs such as Watson are not really intelligent. Yet. They can answer questions by conducting near-instantaneous search. They can link semantic objects to answer challenging Jeopardy questions. But AIs cannot do thinking as such. AIs are not generating new ideas. They may define a fitness peak, for example, or identify a strange attractor. But, again, such “thinking” is stochastic. Anyway, so ends my rant. I highly recommend “Software” and “Wetware” by the mathematician/novelist Rudy Rucker.

  • http://www.ezsolutionspk.net/ ezsolutionspk

    Nice and fresh thoughts … I really enjoyed reading it a lot.

