Future SEO: Linked Open Data (LOD)
As mentioned in my column on string entity optimization, the use of structured data allows search engines like Google to understand your page content so it can display better search results, or answers, to user queries.
This month, I’ll focus on Linked Open Data (LOD), which lets you publish structured data so that it can be interlinked to establish relationships. This matters because the relationships between words help search bots clearly understand site content.
In my column about understanding entity search, I explained how semantic search uses a vocabulary (expressed in markup such as microdata or RDFa) to break a sentence down into its subject, predicate and object, showing the relationship between the words in your content.
Linked Open Data builds on standard Web technologies such as HTTP, RDF and URIs, extending them so they can be read automatically by computers. That’s why it’s important for SEOs to understand and use LOD when applying structured data to content: it makes that content easier for machines to read.
Sentences More Important Than Keywords
LOD lets us use “sentences” in the digital realm just as we do in everyday life. Optimizing for semantic search with LOD means using a digital rendition of natural-language sentence structure as the basis for describing things (content). SEOs need to think in sentences rather than keywords to enhance content published on the Web or on intranets.
It appears “future SEO” will require a more technical background. Most SEOs, myself included, will have to collaborate with the semantic Web community to iron out the details. This isn’t some new optimization tactic for SEOs to cut and paste into client pages; it’s the very fabric of the Web, and working through it will require your time, energy, study and perseverance.
To explain Linked Data in simple terms, Tim Berners-Lee sets out four principles in his Design Issues: Linked Data note (paraphrased below):
- Use URIs (Uniform Resource Identifiers) as names for things
- Use HTTP URIs so that people, or software acting on their behalf, can look up those things
- When someone looks up a URI (thing), provide useful information using standards such as RDF (Resource Description Framework) or SPARQL (an RDF query language)
- Include links to other related things (URIs) when publishing data on the Web, so that people and software can discover more things
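Principles 2 and 3 boil down to an ordinary HTTP request. The minimal Python sketch below (standard library only) prepares, but does not send, a lookup for an HTTP URI that asks for RDF through the `Accept` header. The choice of Turtle and RDF/XML media types and the DBpedia URI in the usage line are illustrative assumptions; whether a given server honors this content negotiation is up to that server.

```python
from urllib.request import Request

# Ask for Turtle first, with RDF/XML as a lower-priority fallback (standard HTTP q-values).
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.9"

def build_lookup_request(uri: str) -> Request:
    """Prepare a GET request asking the server for an RDF description of `uri`."""
    # Principles 1 and 2: the thing must be named by an HTTP(S) URI.
    if not uri.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP URI: {uri}")
    # Principle 3: request useful, standards-based data (RDF) via content negotiation.
    return Request(uri, headers={"Accept": RDF_ACCEPT})

if __name__ == "__main__":
    req = build_lookup_request("http://dbpedia.org/resource/Paris")
    print(req.get_method(), req.full_url, req.get_header("Accept"))
    # Sending it requires network access: urllib.request.urlopen(req)
```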
To help explain more about what LOD is and how you can use it, I’d like to share a recent interview with Kingsley Idehen, founder & CEO of OpenLink Software. Kingsley is an industry-acclaimed technology innovator and provider of technology that exploits LOD across the enterprise and World Wide Web.
What Is Linked Open Data (LOD)?
Paul: Kingsley, can you give us an idea of what LOD is?
Kingsley: Linked Open Data is structured data representation enhanced through the use of HTTP URIs (links). Basically, it’s about structured data representation based on the entity-relationship model, where entities, attributes, and attribute values are denoted (“referred to”) by links.
HTTP URIs are implicitly open in that resolving what they denote is a function of the HTTP protocol, as opposed to proprietary protocols scoped to a specific application or platform.
Can you give us an example?
Take the following statement:
Paris is the capital of France.
It expresses a relationship in natural-language notation, where all participants are denoted literally by words:
“Paris” “capital” “France”
Each plays a specific role: “Paris” is the subject, “capital” the predicate and “France” the object.
Courtesy of Linked Data, the statement above can be enhanced by using reference (as opposed to literal) identifiers to denote the entities in the subject, predicate and object roles:
<#Paris> <#capital> <#France> .
If I copy the statement above into a document and make that document available to users on an HTTP network, the document automatically demonstrates Linked Data: my browser presents a collection of links that enable me to explore the entity relationship represented by the link-enhanced statement. Semantically, my single-statement document implies:
<> <#type> <#Document> .
<> <#mentions> <#Paris> .
<> <#mentions> <#Capital> .
<> <#mentions> <#France> .
<#Paris> <#capital> <#France> .
Note: “<>” is shorthand meaning that the document’s own HTTP URL serves as the HTTP URI denoting the subject of the statements above. Basically, you have a description of a document that includes descriptions of other things, not unlike this interview.
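To make the “<>” shorthand concrete, here is a minimal Python sketch that resolves the relative references in the statements above against a document URL and writes them out as N-Triples, a format in which every URI is absolute. The document URL `http://example.com/paris.html` is a hypothetical placeholder, and `#type` and `#mentions` are the interview’s local shorthand predicates, not standard vocabulary terms such as `rdf:type`.

```python
from urllib.parse import urljoin

# Hypothetical URL where the single-statement document is published.
DOC_URL = "http://example.com/paris.html"

# The statements as written above: "" stands for "<>", "#x" for "<#x>".
STATEMENTS = [
    ("", "#type", "#Document"),
    ("", "#mentions", "#Paris"),
    ("", "#mentions", "#Capital"),
    ("", "#mentions", "#France"),
    ("#Paris", "#capital", "#France"),
]

def to_ntriples(statements, base):
    """Resolve each relative reference against `base` and emit N-Triples lines."""
    lines = []
    for s, p, o in statements:
        # urljoin(base, "") returns base itself, which is what "<>" means;
        # urljoin(base, "#Paris") keeps base and adds the fragment (a hash URI).
        terms = (urljoin(base, t) for t in (s, p, o))
        lines.append(" ".join(f"<{t}>" for t in terms) + " .")
    return "\n".join(lines)

if __name__ == "__main__":
    print(to_ntriples(STATEMENTS, DOC_URL))
```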
The terms HTTP URI and HTTP URL can be confusing, so it helps to look at how each applies to entity denotation:
- HTTP URIs denote (“refer to” or name) anything
- HTTP URLs (a kind of HTTP URI) denote Web documents
- WebIDs (a kind of HTTP URI) denote agents (people, organizations, software, machines, and anything else capable of mechanized operation)
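The hash-URI pattern is one common way to keep a thing and the Web document describing it apart. The Python sketch below (standard library only) splits such a URI into the document URL you would fetch and the fragment naming the thing. The example URIs are hypothetical, and the sketch does not cover “slash” URIs (such as DBpedia’s), which rely on server-side redirects instead.

```python
from urllib.parse import urldefrag

def split_hash_uri(uri: str) -> tuple[str, str]:
    """Split a hash URI into (document URL, fragment).

    The full URI (e.g. .../paris.html#Paris) names the thing itself; the part
    before '#' is the HTTP URL of the Web document that describes it. HTTP
    clients do not send the fragment to the server, so looking up the thing
    means fetching the document.
    """
    result = urldefrag(uri)
    return result.url, result.fragment

if __name__ == "__main__":
    # WebIDs often take this form: "#me" names the agent, the rest is its profile document.
    print(split_hash_uri("https://example.com/profile#me"))
```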
What Is The Linked Open Data (LOD) Cloud?
I’ve heard the LOD Cloud is a massive big-data collective composed of datasets from a variety of domains, such as general knowledge (Wikipedia), life sciences (Bio2RDF), media (BBC), government (Data.Gov and Data.Gov.UK) and many others. Can you explain the LOD Cloud in a little more detail for us?
This massive collection of data is an enclave on the Web where all the structured data in the published datasets is represented and published in line with Linked Data principles, i.e., HTTP URIs are used to denote things, because doing so makes the structured data webby (or web-like). In a nutshell, data becomes as navigable and discoverable as anything else on an HTTP network (e.g., the World Wide Web).
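Because LOD Cloud datasets are published with HTTP URIs, many of them can also be queried directly. A minimal Python sketch, assuming DBpedia’s public SPARQL endpoint at `https://dbpedia.org/sparql` and the DBpedia ontology property `dbo:capital`; it only builds a standard SPARQL Protocol GET request asking for JSON results, and actually sending it requires network access.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://dbpedia.org/sparql"  # DBpedia's public SPARQL endpoint

# Ask which resource DBpedia records as the capital of France.
QUERY = """PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?capital WHERE { dbr:France dbo:capital ?capital }"""

def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    # The query travels URL-encoded in the "query" parameter.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

if __name__ == "__main__":
    req = build_sparql_request(ENDPOINT, QUERY)
    print(req.full_url)
    # Sending it requires network access: urllib.request.urlopen(req)
```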
Using my earlier example, I can leverage the massive LOD Cloud as a powerful source of identifiers that denote a broad range of things. For instance, I can cross-reference the entity identifiers in my basic example with identifiers from the LOD Cloud, as follows:
<> <#type> <#Document> .
<> <#mentions> <#Paris> .
<> <#mentions> <#Capital> .
<> <#mentions> <#France> .
<#Paris> <#capital> <#France> .
<#Paris> <#sameAs> <http://dbpedia.org/resource/Paris> .
<#France> <#sameAs> <http://dbpedia.org/resource/France> .
Placing the statements above in a document published to an HTTP network expands on the basic demonstration of what Linked Open Data affords. As you can see, my link traversal is no longer confined to my document; I’ve made a reference to data within DBpedia, which, as a major junction box in the LOD Cloud, could send me (or software agents) anywhere.
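That traversal can be sketched in a few lines of Python: resolve the statements against a hypothetical document URL (`http://example.com/paris.html`), then follow subject-to-object links breadth-first. The sketch stays within this one document’s statements; a real agent would go on to dereference the DBpedia URIs it reaches and keep crawling from there.

```python
from collections import deque
from urllib.parse import urljoin, urlsplit

DOC_URL = "http://example.com/paris.html"  # hypothetical document URL

# The statements above; "" stands for "<>", "#x" for "<#x>".
TRIPLES = [
    ("", "#type", "#Document"),
    ("", "#mentions", "#Paris"),
    ("", "#mentions", "#Capital"),
    ("", "#mentions", "#France"),
    ("#Paris", "#capital", "#France"),
    ("#Paris", "#sameAs", "http://dbpedia.org/resource/Paris"),
    ("#France", "#sameAs", "http://dbpedia.org/resource/France"),
]

def reachable(triples, start, base):
    """Breadth-first walk over subject->object links, using absolute URIs."""
    edges = {}
    for s, _p, o in triples:
        # urljoin leaves already-absolute URIs (like DBpedia's) unchanged.
        edges.setdefault(urljoin(base, s), []).append(urljoin(base, o))
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

if __name__ == "__main__":
    nodes = reachable(TRIPLES, DOC_URL, DOC_URL)
    # Hosts reached: the document's own host plus dbpedia.org.
    print(sorted({urlsplit(n).netloc for n in nodes}))
```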
What’s The Difference Between Linked Data And Linked Open Data?
Are Linked Data and Linked Open Data the same thing?
Not really. The linkage comes from the structure of a statement based on the entity-relationship model (a kind of sentence). The openness comes from the use of a standard for entity denotation in the form of HTTP URIs. Note that it is possible to make collections of entity-relationship-model statements that provide a structured representation of data using many kinds of identifiers; the magic of HTTP URIs as denotation mechanisms lies in the underlying openness of URIs and the HTTP protocol.
You can have Linked Data that isn’t “Open” through the use of proprietary identifiers for entity denotation. In short, this is how we’ve all worked with computer programs for years, prior to the emergence of URIs and the HTTP protocol. Even RDF (which mandates the use of URIs and is often conflated with Linked Data) can be used to produce Linked Data that isn’t actually “Linked Open Data.”
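The distinction can be sketched as a simple check in Python: here a statement counts as “open” only if every identifier in it is an absolute HTTP(S) URI that anyone can look up. This is a simplification for illustration; it assumes statements contain identifiers only (no literal values), and identifiers such as `urn:acme:product:42` and `SKU-42` are hypothetical examples of a non-HTTP URI and a proprietary key.

```python
from urllib.parse import urlsplit

def is_open_identifier(identifier: str) -> bool:
    """True if `identifier` is an absolute HTTP(S) URI that anyone can look up."""
    parts = urlsplit(identifier)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def classify(statements):
    """Label each (subject, predicate, object) statement 'open' or 'not open'."""
    return ["open" if all(is_open_identifier(t) for t in st) else "not open"
            for st in statements]

if __name__ == "__main__":
    print(classify([
        # HTTP URIs throughout (hypothetical vocabulary): resolvable on the Web.
        ("http://example.com/product/42", "http://example.com/vocab/madeIn",
         "http://dbpedia.org/resource/Paris"),
        # Valid URIs, still linked, but not resolvable over HTTP (hypothetical URNs).
        ("urn:acme:product:42", "urn:acme:madeIn", "urn:acme:city:paris"),
        # Proprietary database keys (hypothetical).
        ("SKU-42", "madeIn", "CITY-7"),
    ]))
```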
The diagram that follows goes a long way toward dispelling some of the confusion that swirls around Linked Data and RDF, by reminding everyone of the *fact* that Linked Data was at the very core of the Web’s original design. Recently, I tweaked Tim Berners-Lee’s original proposal document, using HTTP URIs instead of strings to denote the nodes (subjects or objects) and connectors (predicates) in the network diagram (or graph) that depicts his original World Wide Web proposal.
How Does LOD Benefit A Publisher (E-Commerce Provider)?
With Search Engines in mind and using e-commerce as an example, can you explain the benefits to us?
It increases the Serendipitous Discovery Quotient (SDQ) of content. By that I mean it increases the degree to which content is discovered in a manner that’s “pleasantly surprising” to users with regard to relevance.
What is Serendipitous Discovery Quotient (SDQ)?
SDQ is a metric for understanding the effects of enhancing structured data representation via HTTP URIs. Golliher wrote a good article a while back (in reaction to his first encounter with the acronym) titled “Serendipitous Discovery Quotient (SDQ): The Future of SEO? Or an Abstract Concept?”
IQ is a metric associated with human intelligence. SDQ is a metric associated with Web intelligence.
How can e-commerce benefit?
E-commerce vendors can focus on what actually comes naturally to them, i.e., producing fine-grained descriptions of their products and services, knowing that description clarity is always, ultimately, the critical factor for the discoverability that leads to customer growth and retention.
This fundamentally implies that descriptions of entities such as offers, products, pricing, availability, and opening and closing hours become the focal point of Web content strategy, much more so than site aesthetics and old-school, keyword-based SEO hacks.
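For e-commerce, one concrete form of such a fine-grained description is Schema.org markup embedded in a page as JSON-LD. A minimal Python sketch with a hypothetical product and price: `Product`, `Offer`, `sku`, `price`, `priceCurrency` and `availability` are Schema.org terms, but which properties a given search engine actually uses is up to that engine.

```python
import json

def product_jsonld(name, sku, price, currency, in_stock):
    """Describe a product and its offer with Schema.org terms, as JSON-LD."""
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "sku": sku,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",  # formatted as text with two decimals
            "priceCurrency": currency,  # ISO 4217 code, e.g. "EUR"
            "availability": ("https://schema.org/InStock" if in_stock
                             else "https://schema.org/OutOfStock"),
        },
    }

if __name__ == "__main__":
    # Hypothetical product; the output would go inside
    # <script type="application/ld+json"> ... </script> on the product page.
    doc = product_jsonld("Paris Street Map", "MAP-PAR-01", 9.5, "EUR", True)
    print(json.dumps(doc, indent=2))
```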
What About Schema.org Types?
Do Schema.org semantic markup, entities and LOD relate to each other in some way?
Very much so! In schema.org, you have a powerful bridge for publishing structured data that simplifies integration with the LOD cloud. From the LOD cloud side of things, you already have schema.org cross-references in datasets such as DBpedia and many others. It’s all happened in a very natural way, rather than through any kind of brute force.
Today, many online retailers are already publishing structured data based on terms from Schema.org, and in doing so they are enhancing discoverability across three critical frontiers:
- Search Engines
- Social Media
- LOD Cloud
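As a sketch of that bridge, the Schema.org description below (built in Python and serialized as JSON-LD) uses the `sameAs` property to point a store’s service area at DBpedia’s identifier for Paris, tying the retailer’s markup into the LOD Cloud. The store name, URL and opening hours are hypothetical; the helper simply collects every `sameAs` link found anywhere in the description.

```python
import json

store = {
    "@context": "https://schema.org",
    "@type": "Store",
    "name": "Example Map Shop",           # hypothetical retailer
    "url": "https://shop.example.com/",   # hypothetical URL
    "openingHours": "Mo-Sa 09:00-19:00",  # hypothetical hours
    "areaServed": {
        "@type": "City",
        "name": "Paris",
        # Cross-reference into the LOD Cloud: same entity as DBpedia's Paris.
        "sameAs": "http://dbpedia.org/resource/Paris",
    },
}

def lod_links(node):
    """Collect every sameAs value in a JSON-LD tree of dicts and lists."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "sameAs":
                found.extend(value if isinstance(value, list) else [value])
            else:
                found.extend(lod_links(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(lod_links(item))
    return found

if __name__ == "__main__":
    print(json.dumps(store, indent=2))
    print(lod_links(store))
```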
How Are Hashtags & Linked Data Related To SEO?
Barbara Starr talks about the relationship between Hashtags, Linked Data and SEO; can you elaborate a little more on this?
Hashtags solve a problem that’s long challenged HTTP URIs, i.e., the unwieldy appearance of long URIs. Through the use of hashtags, the Web user community has drawn on folksonomy-oriented patterns to devise a shorthand for HTTP URIs.
Thus, courtesy of *hashtag* adoption across social media services, you can perform HTTP URI-based denotation simply by hashtagging. Just like that! Everyone is annotating the Web in a manner that adds more semantics to the connections between the entities these tags denote.
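A rough Python sketch of the idea: extract the hashtags from a post and turn each into a statement linking the post to an HTTP URI for the tag. The tag namespace `https://social.example.com/hashtag/`, the `#mentions` predicate (borrowed from the interview’s shorthand), and the case-insensitive treatment of tags are all assumptions for illustration.

```python
import re

# Hypothetical namespace standing in for a social service's hashtag pages.
TAG_BASE = "https://social.example.com/hashtag/"

# A '#' not preceded by a word character, followed by one or more word characters.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")

def hashtag_statements(post_uri: str, text: str) -> list[tuple[str, str, str]]:
    """Return one (post, #mentions, tag URI) statement per distinct hashtag."""
    # Lower-case tags (assumed case-insensitive) and drop duplicates, keeping order.
    tags = dict.fromkeys(m.group(1).lower() for m in HASHTAG.finditer(text))
    return [(post_uri, "#mentions", TAG_BASE + tag) for tag in tags]

if __name__ == "__main__":
    post = "https://blog.example.com/post/1"  # hypothetical post URL
    for s, p, o in hashtag_statements(post, "Learn #LinkedData and #SEO"):
        print(s, p, o)
```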
Action Items & Takeaways
What action items do you recommend for SEOs to be a part of LOD?
Understand that LOD isn’t some scary, mysterious specialty reserved for a chosen few. Instead, look at how tagging (using hashtags) is altering the way people post or track content across social media. Just click on a hashtag on G+ or Twitter, for instance, and you will immediately see that each hashtag is really a super key that resolves to a “Topic Web” composed of contextualized links to related posts, images, music, videos, etc.
All you have to do is describe things using simple subject->predicate->object statements or simply tag posts using hashtags.
Thanks for sharing your insight with us, Kingsley. As always, change is on the horizon, and understanding semantic markup and linked open data is becoming more and more a best practice for SEOs.
Linked Open Data Advantages
Below are some key reasons why SEO practitioners should take note of the information above on Linked Open Data:
- Openness: This means moving away from optimizing for each search engine and its periodic ranking-algorithm changes; this is optimization for the Web as a whole.
- Cost-effectiveness: The longevity of SEO rests on entity-description-oriented documents, which are inherently search-engine agnostic.
- Discoverability: This increases serendipitous discovery by focusing on entity description granularity.
For more elaboration on the LOD Cloud, I may collaborate with Kingsley in a future column. In the meantime, see the LOD resources below for more information.
Semantic SEO Resources
- Semantic Web Technology and Linked Data: SemanticWeb.com
- Linked Data: LOD Cloud
- Kingsley Idehen: Personal Blog and G+ Posts Collection
- Aaron Bradley’s Blog: Basic Vocabulary for schema.org and Structured Data
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.