Machine learning for large-scale SEM accounts

Can machine learning be applied to your PPC accounts to make them more efficient? Columnist David Fothergill describes how he utilized machine learning to find new keywords for his campaigns.

David Fothergill on July 1, 2016 at 11:00 am | Reading time: 5 minutes

A key challenge when working on what we could term “large-scale” PPC accounts is efficiency. There will always be more that you could do if given an unlimited amount of time to build out and optimize an AdWords campaign; therefore, the trick is managing priorities and being efficient with your time.

In this post, I will talk about how concepts from machine learning could potentially be applied to help with the efficiency part. I’ll use keyword categorization as an example.

To paraphrase Steve Jobs, a computer is like “a bicycle for the mind.” The generally understood meaning of this statement is that, in the same way a bike can increase the efficiency of human-powered locomotion, computers can increase human mental productivity and output.

With the existentialism out of the way, let’s get to something tangible: we’ll explore how relevant and valuable it could be to automate the process of placing new key phrases into an existing campaign.

What do we mean by machine learning?

As a definition of sorts that ties into our objectives, let’s consider the following to be true:

Machine learning is a method used to devise models/algorithms that allow prediction. These models allow users to produce reliable, repeatable decisions and results by learning from historical relationships and trends in data.

The benefit of “reliable, repeatable decisions” is the exact value that we’re interested in achieving in this case.

At a very high level, the objective of a machine learning algorithm is to output a prediction formula and related coefficients which it has found to minimize the “error” (i.e., the formula that has been rigorously found to have the greatest predictive power).

The two main types of problems solved by machine learning applications are classification and regression. Classification relates to predicting which label should be applied to data, whereas regression predicts a continuous variable (the simplest example being taking a line-of-best-fit).

Categorization of keywords as a “classification” problem

With this in mind, my goal is to show how text classification could be used to programmatically decide where newly surfaced key phrases (e.g., from regular search query reports) can be placed. This is a simple exercise when done by hand, but an important one that can be time-consuming to keep on top of when you have an account of any scale.

A primary prerequisite for solving a classification problem is some already classified data. Given that an existing paid search account has keywords “classified” by the campaign they are in, this is a good place to begin.

The next requirement is some “features” that can be used to try and predict what the classification of new data should be. “Features” are essentially the elements on which a model is built — the predictor variables.

The simplest way to transform text data into a feature which is useful to an algorithm is to create a “bag of words” vector. This is simply a vector which contains a count of the number of times a word exists in a given document. In our case, we’re treating a keyword as a very, very short document.

Note: In practice, because our “documents” (i.e., keywords) are short, we could end up with a set of vectors that is not meaningful enough due to lack of diversity, but it’s out of the scope of this article to delve further into this.

Selecting a suitable algorithm

There is a wide range of different algorithmic approaches to solving a wide range of problem types. The below image illustrates this and also shows that there is a certain underlying logic which can help direct you toward a suitable choice.

[Image from Scikit-learn.org]

As we’re in the area of text classification, let’s implement a Naive Bayes model to see if there is potential in this approach. It’s a pretty simple model (hence the “naive” part), but the fact that our feature set is pretty simple means it could actually pair up quite well with that.

I won’t go into any detail of how to apply this model, other than sharing how I would implement it in Python using the scikit-learn package. The reason is that I want to illustrate that it’s possible to leverage machine learning’s predictive capabilities with only a few lines of code.*

Below is a fairly exciting screen shot of my Jupyter Notebook going through the key steps of:

loading the data with which to build the model (~20,000 keyphrases, pre-classified);
splitting my data into training and testing subsets (this is necessary so we can “test” that our model will actually predict on future data and not just describe historical data);
creating a basic pipeline which a) creates the features as discussed (CountVectorizer) and b) applies the selected method (MultinomialNB); and
predicting values for the “test” set and gauging how accurate the labeling is vs. the “true” values.
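
The four steps above can be sketched roughly as follows. This is not the original notebook: the small generated data set, the campaign names and the variable names are all hypothetical stand-ins, and only the use of CountVectorizer and MultinomialNB comes from the steps listed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# 1. Load pre-classified key phrases. This toy data set is hypothetical and
#    stands in for the ~20,000 real key phrases; labels are campaign names.
products_by_campaign = {
    "Shoes": ["running shoes", "trail shoes"],
    "Boots": ["hiking boots", "winter boots"],
    "Jackets": ["rain jackets", "down jackets"],
}
modifiers = ["cheap", "buy", "best", "discount", "mens", "womens"]

keywords, campaigns = [], []
for campaign, products in products_by_campaign.items():
    for product in products:
        for modifier in modifiers:
            keywords.append(f"{modifier} {product}")
            campaigns.append(campaign)

# 2. Hold back a quarter of the data so the model is scored on key phrases
#    it has not seen during training.
keywords_train, keywords_test, labels_train, labels_test = train_test_split(
    keywords, campaigns, test_size=0.25, random_state=0, stratify=campaigns
)

# 3. Pipeline: a) bag-of-words features, b) Naive Bayes classifier.
model = Pipeline([
    ("features", CountVectorizer()),
    ("classifier", MultinomialNB()),
])
model.fit(keywords_train, labels_train)

# 4. Predict campaigns for the held-out key phrases and measure accuracy.
predicted = model.predict(keywords_test)
accuracy = accuracy_score(labels_test, predicted)
print(f"Accuracy on held-out key phrases: {accuracy:.0%}")

# Suggest a campaign for a newly surfaced (hypothetical) key phrase.
suggested = model.predict(["waterproof hiking boots"])[0]
print("Suggested campaign:", suggested)
```

The toy data is far cleaner than a real account, so the accuracy it prints says nothing about real-world performance. The point is only the shape of the workflow: split, fit the pipeline, predict, compare.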

*The caveat here is that I’m not a developer, but a mathematician who programs as a means to a mathematical end.

Conclusion

So, how effective is this? Using a simple measure of accuracy, this method correctly labeled/categorized 91 percent of “new” key phrases (4,431 out of 4,869).
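
For reference, “accuracy” here is simply the share of test key phrases that received the correct label. A quick check of the arithmetic, using only the two counts quoted above:

```python
# Counts quoted in the article: correctly labeled key phrases and test-set size.
correct = 4431
total = 4869

accuracy = correct / total
print(f"{accuracy:.1%}")  # prints 91.0%
```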

Whilst this could be considered a decent result, we’d do a lot more tuning and testing before putting a model like this into practice.

However, I do believe it provides sufficient evidence that this is a relevant approach that can be taken forward to improve and automate processes, thus achieving the objective of gaining efficiency at scale via reliable, repeatable decisions.
