
Using word vectors and applying them in SEO

Contributor JR Oakes takes a look at technology from the natural language processing and machine-learning community to see if it's useful for SEO.

JR Oakes on September 19, 2016 at 9:54 am

Today, the SEO world is abuzz with the term “relevancy.” Google has moved well past keywords and their frequency to look at the meaning the words impart and how they relate to the query at hand.

In fact, for years, the common term for working with text and language has been natural language processing (NLP). The new focus, though, is natural language understanding (NLU). In the following paragraphs, we want to introduce you to a machine-learning tool that has been very helpful in quantifying and enhancing the relevancy of content.

Earlier this year, we started training models based on a code base called char-rnn from Andrej Karpathy. The really interesting thing about this code base was that, after training, you ended up with a model that would generate content based on what it had learned from the training documents. It wouldn’t just repeat the content; it would generate new, readable (although quite nonsensical) content.

It operates by using a neural network to learn which character to guess next. If you have the time, Karpathy’s write-up is a fascinating read that will help you understand a bit more about how this works.

In testing out various code bases, we came across one that, instead of predicting characters, attempted to predict which words would come next. The most interesting part of this was that it used something called GloVe embeddings that were basically words turned into numbers in such a way that the plot of the number coordinates imparted semantic relationships between the words. I know, that was a mouthful.

What is GloVe?

GloVe stands for “global vectors for word representation.” The vectors are built from very large text corpora and use co-occurrence statistics of words to define relationships between those words. From their site:

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

Here is an example of the term “SEO” converted into a word vector:

[Image: word vector representation for the term “seo”]

To work with GloVe embeddings, you need familiarity with Python and Word2Vec, as well as a server with enough memory to hold the pre-trained vectors (the commonly used sets are trained on corpora of 6+ billion tokens). You have been warned.
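
To make the idea of “SEO as a vector” concrete, here is a minimal sketch (not the exact code behind the image above) that reads a pre-trained GloVe file and prints the vector for a single term. It assumes you have downloaded the publicly available glove.6B.300d.txt file from the Stanford NLP site.

    import numpy as np

    # Minimal sketch: read a pre-trained GloVe text file into a dict of
    # word -> 300-dimensional numpy vector. Assumes "glove.6B.300d.txt"
    # has been downloaded and sits in the working directory.
    def load_glove(path):
        vectors = {}
        with open(path, encoding="utf8") as handle:
            for line in handle:
                parts = line.rstrip().split(" ")
                vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
        return vectors

    glove = load_glove("glove.6B.300d.txt")
    if "seo" in glove:
        print(glove["seo"][:10])  # the first 10 of the 300 dimensions for "seo"

Loading the full file takes several gigabytes of RAM, which is why the server-size warning above is worth taking seriously.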

Why are GloVe vectors important?

GloVe vectors are important because they can help us understand and measure relevancy. Using Word2Vec, you can do things like measure the similarity between words or documents, find the words most similar to a word or phrase, add and subtract words from each other to find interesting results, and visualize the relationships between words in a document.

Similarity

If you have an understanding of Python, Gensim is an excellent tool for running similarity analysis on words and documents. We updated a converter on Github that makes it easier to convert GloVe vectors to a format Gensim can use.
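
As a rough sketch of that workflow, Gensim also ships its own glove2word2vec helper, which adds the header line the word2vec text format expects. The file names below are placeholders for wherever you keep the vectors.

    from gensim.models import KeyedVectors
    from gensim.scripts.glove2word2vec import glove2word2vec

    # Add the header line that the word2vec text format expects,
    # then load the converted file as a Gensim KeyedVectors object.
    glove2word2vec("glove.6B.300d.txt", "glove.6B.300d.w2v.txt")
    vectors = KeyedVectors.load_word2vec_format("glove.6B.300d.w2v.txt")

    # Recent Gensim releases can load the GloVe file directly instead:
    # vectors = KeyedVectors.load_word2vec_format("glove.6B.300d.txt",
    #                                             binary=False, no_header=True)

    print(vectors.similarity("seo", "marketing"))  # cosine similarity of two terms

The vectors object loaded here is what the remaining examples assume.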

To show the power of GloVe vectors to produce semantically similar words to a seed word or phrase, take a look at the following image. This was the result of finding the most similar words to “dui lawyer” using the Gensim library and GloVe vectors (geographical terms were removed).

[Image: most similar terms for “dui lawyer”]

Note how these are not word variations or synonyms, but rather concepts that you would expect to encounter when dealing with an attorney in this practice area.
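
To reproduce something like the list above yourself, the query can be sketched in a couple of lines. GloVe has no entry for the multi-word phrase, so the individual words are passed as positive examples and Gensim averages their vectors; the geographic filtering mentioned above is not shown, and the results will vary with the vector set you load.

    # Terms the embedding space places closest to the seed words "dui" and "lawyer".
    for word, score in vectors.most_similar(positive=["dui", "lawyer"], topn=15):
        print(f"{word:20s} {score:.3f}")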

Adding and subtracting vectors

One of the most frequently used examples of the power of these vectors is shown below. Since the words are converted into numerical vectors, and there are semantic relationships in the position of the vectors, this means that you can use simple arithmetic on the vectors to find additional meaning. In this example, the words “King,” “Man” and “Woman” are turned into GloVe vectors prior to addition and subtraction, and “Queen” is very close to the resulting vector.

[Image: adding and subtracting word vectors: “king” − “man” + “woman” ≈ “queen”]
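
With Gensim, that arithmetic is a one-liner; here is a quick sketch using the vectors object loaded earlier.

    # "king" - "man" + "woman": "queen" should appear near the top of the results.
    print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))

    # The same arithmetic done by hand on the raw vectors (the nearest
    # neighbors of the raw result usually include the input words themselves).
    target = vectors["king"] - vectors["man"] + vectors["woman"]
    print(vectors.similar_by_vector(target, topn=5))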

Visualization

Once we are able to turn a document of text into its resulting vectors, we are able to plot those words using a dimensionality-reduction technique called t-SNE, along with d3.js. We have put together a simple demo that will allow you to enter a keyword phrase and two ranking URLs to view the difference in vector space using GloVe vectors.

Demo is here.
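
The demo itself is built with d3.js, but the underlying idea can be sketched in Python with scikit-learn and matplotlib. The tokenization below is deliberately naive, and vectors is again the Gensim object loaded earlier.

    import re
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    def plot_document(text, vectors):
        # Naive tokenization; keep only unique words the embedding knows about.
        words = [w for w in set(re.findall(r"[a-z]+", text.lower())) if w in vectors]
        # t-SNE projects the 300-dimensional vectors down to 2-D for plotting.
        # (perplexity must be smaller than the number of words plotted)
        coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(
            np.array([vectors[w] for w in words]))
        plt.figure(figsize=(10, 8))
        plt.scatter(coords[:, 0], coords[:, 1], s=5)
        for (x, y), word in zip(coords, words):
            plt.annotate(word, (x, y), fontsize=8)
        plt.show()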

It’s important to point out a few things to look for when using the demo.

Look at the relationships between close words

Notice how groupings of words are not simply close variations or synonyms, but rather unique words that just belong together.

[Image: keyword vector-space grouping]

Use pages with a good amount of content

The tool works by extracting the content on the page, so if there is not much to work with, the result will not be great. Be careful with home pages, pages that are mostly listings of excerpts, and pages whose content is mostly image-based.

Small words don’t mean small value

The size of the resulting words is based on the frequency with which the word was encountered, not the importance of the word. If you enter a comparison URL that is ranking higher than you for the same term, take note of the color differences to see topics or topic areas that you may be missing on your page.
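
One rough way to turn that comparison into a checklist is to flag words on the competitor page that have no semantically close counterpart on your own page. The sketch below assumes vectors is the Gensim object from earlier, that both pages’ text has already been extracted and tokenized into word lists, and that the 0.4 similarity threshold is simply a starting point to tune.

    def possible_gaps(your_words, competitor_words, vectors, threshold=0.4):
        """Competitor-page words with no close neighbor among your page's words."""
        yours = [w for w in set(your_words) if w in vectors]
        if not yours:
            return []
        gaps = []
        for word in set(competitor_words):
            if word not in vectors or word in yours:
                continue
            # Best similarity between this word and anything already on your page.
            best = max(vectors.similarity(word, w) for w in yours)
            if best < threshold:
                gaps.append((word, round(float(best), 3)))
        return sorted(gaps, key=lambda item: item[1])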

Wrapping it up

Obviously, from an SEO perspective, it is beneficial to create content that covers a topic as thoroughly as possible and ensures a good experience for your visitors. While we don’t expect all SEOs to run out and learn Python, we think it is important to know that this kind of power is available to be put to work toward that end. GloVe vectors are one of many tools that can give you an edge on the competition.

Finally, for those who are fans of latent Dirichlet allocation (LDA), Chris Moody released a project this year called lda2vec that uses LDA’s topic modeling, along with word vectors, to create an interesting way to assign and understand the various topics within a corpus of text.


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.



About The Author

JR Oakes
JR Oakes is the senior director of technical SEO research at Locomotive. He was formerly director of technical SEO at the Adapt Partners agency. He works with clients on a wide range of fronts including technical issues, performance, CTR, crawl-ability, content, and data analysis. JR loves testing, coding, and prototyping solutions to difficult search marketing problems. When he isn’t working, he enjoys reading about emerging technologies, playing bass guitar, watching college basketball, cooking and spending time with his friends and family. He is also one of the co-organizers of the Raleigh SEO Meetup, Raleigh SEO Conference, and RTP SEO Meetup.

Related Topics

All Things SEO Column, Channel: SEO, SEO - Search Engine Optimization, SEO: General
