A Foolproof Approach To Writing Complex Excel Formulas
There comes a time in a marketer’s life when making pretty charts with a predefined dataset just doesn’t cut it. And finding the sum and average of a column of data just doesn’t satisfy you anymore. Eventually — and it’s really inevitable — you will actually have to dive deep into the data and cull […]
There comes a time in a marketer’s life when making pretty charts with a predefined dataset just doesn’t cut it. And finding the sum and average of a column of data just doesn’t satisfy you anymore.
Eventually — and it’s really inevitable — you will actually have to dive deep into the data and cull out a smaller dataset or manipulate it in some way to make it cough up what you need. And, as freaking intimidating as they can be, formulas become your lifeline in these moments.
Admittedly, the learning curve with formulas can be pretty steep, but the only way they’ll become intuitive is exposure to them and practice.
Excel Formulas Still Scare Me Yet I Did This (You Can Too!)
However, when I was first learning them, I’d see a formula like the one below, which I wrote to find the last directory in a URL (just to see if I could, not for any particular use), and think that I’d never ever be able to do anything like that. Turns out I was dead wrong.
Before you throw in the towel, let me tell you a trick I first saw Bill Jelen (AKA Mr. Excel) do that makes writing formulas — even advanced formulas like this one — much simpler.
If you put this hideous monstrosity in front of me with no context and asked me to explain step by step what each part of that formula means, I probably wouldn’t be able to do it. Not because I copied it off the Internet. I can honestly say I didn’t get any help with it at all. And I purposely chose something kind of esoteric so I wouldn’t be tempted to look it up and to test myself to see if this approach would work for a really advanced formula. And it did. Like a charm.
So, enough pontificating. Let me show it to you in action. But first let’s cover some terminology. And don’t worry; we’ll start with a much simpler task than that data voodoo.
Formula: Formulas provide some kind of instructions for Excel to calculate something. They always start with an equal sign. It can be as simple as =2+2 or ridiculously complex like the one above.
Function: In the simplest terms, functions are formulas that come prepackaged in Excel. I mean, you could create your own functions, but most of the time you’ll be using one of the 80 bajillion offered natively in Excel (unless, of course, you want percent difference, which Excel doesn’t offer … go figure). With the more advanced formulas you’ll write, you will use multiple functions in one formula.
Argument: Each function starts with the function name, followed by a set of parentheses, e.g., =SUM(A3:A67) or =CONCATENATE(B2,C2,D2). The individual elements inside those surrounding parentheses, separated by commas, are arguments.
Square brackets: Sometimes you’ll see arguments listed in square brackets in the tool tip or on the Microsoft site. The brackets indicate that the argument is optional.
Boolean: Returns a TRUE or FALSE value.
Helper cells: Okay, this isn’t an actual term; it’s what I call the cells you use to break down a complicated formula into easy-to-understand steps.
If you’d like to download the Excel workbook I worked from in the following examples, you can access it here. I even included the crazy formula you see above. You’ll see on one tab I break the process down into simple steps (which is exactly what I did first thing) and then step by step with all the helper cells.
Then in the last tab I replace all of the helper cells with the formulas they contain and just kept sweeping right until I got to the final formula. But,for whatever reason, I kept coming up with one extra character that I didn’t need. I checked my formulas several times and tried retracing my steps.
Finally, I just subtracted 1 on the end, and it all worked as planned. At the end of the day that’s what matters, so I went with it.
Task 1: Extract Domain From URLs
Let’s say you have a list of URLs (perhaps backlinks), and you want to extract just the domain. I did this recently so that I could use that column in a pivot table to group all target URLs that were linked to from a particular domain.
For example, let’s say www.linkingsite.com linked to www.mysite.com/landing-page-02 and www.mysite.com/landing-page-45. The two landing pages would show up under the www.linkingsite.com domain in my pivot table, the way I had it organized.
So, how’d I extract those? Easy. But first some background.
Important Stuff You Need To Understand
When you need to extract a string of text from a longer string in Excel, the three primary functions you want to use are LEFT, RIGHT, and MID. You can get tricky with REPLACE, which works a lot like Find and Replace in Excel or Word, but we won’t get into that here.
So basically what each of these functions does is asks you:
- What cell you want to extract the text from
- How many characters you want to extract
With the MID function you also specify the starting place because, well, you’re pulling from the middle of the string.
That’s all well and good, but with our column of URLs the number of characters we need to extract changes from URL to URL, so we can’t use a static number. That’s where the SEARCH function comes in.
Quick aside: If you’ve dabbled in formulas much, you might be asking why I don’t use the FIND function. Well, the SEARCH and FIND functions are very similar, only the FIND function is more limiting in that it’s case sensitive and it doesn’t support wildcard characters. Lame. So the only time I use FIND is when I want to specify case, which the last time I did that was last … Lessee here … Yeah, never.
So what we’re going to do is use the SEARCH function to specify the number of characters we need because it returns the position of whatever you ask Excel to find. If it doesn’t find what you’ve searched for it returns a nasty #N/A error, which we’ll actually USE in the next example. However, for the task at hand, what we need is the position of the first forward slash after the domain.
Buuut, there’s a problem.
See those two forward slashes before the domain (http://)? Yeah, those are going to trip us up. But no worries. The SEARCH function has an optional argument for the starting number (start_num), which we’ll use to our benefit.
Okay, so let’s get poppin here …
So, the secret sauce is we’re going to split the formula into two steps. Then when we’re finished, we’re going to combine them into one formula.
Excel helps you each step of the way by highlighting the argument you’re working on at the time. Here are what we’re going to use for each:
This says we’re looking for a forward slash.
This is the first cell we’re going to extract the domain from. When we finish, we’ll hover over the bottom-right corner of the cell to drag the formula down the column.
We need to just choose a number that’s after the last / before the domain. We could have used 8 here, but if any of the URLs are secure, you would need one more character. Check it out for yourself.
https://* <– The * is in the 9th position, which is where we want to start looking for the / that comes after the domain.
So, the final formula looks like this:
This tells us what position each of the forward slashes directly following the domain is in. And we’ll use that to provide the number of characters to extract in the LEFT function.
Here are the arguments we’ll use for the LEFT function:
So, the final formula looks like this:
Now, all we need to do is combine the two so that instead of referencing cell C3, we’re embedding the SEARCH function into the final formula. To do that, just hit the Esc key to exit out of the cell you’re in, then copy the SEARCH function to your clipboard (not the = sign, though).
Next, hit the Esc key again to back out of that cell, and go back to the cell that houses your final formula and replace the C3 reference with the formula you just copied from C3. Your formula should now look like this:
Task 2: Compare Two Datasets
In my audit Google Doc, one of the tasks I include is a check for orphan pages on the site (on the Links tab). One way to check for those is to compare your site’s sitemap against a site crawl, which can be performed with a tool like Screaming Frog. (Disclosure: I’m not affiliated with Screaming Frog in any way.)
If you find that there are URLs in your sitemap that weren’t found in the Screaming Frog crawl (or whatever tool you use), there’s a good chance that these are orphaned or blocked pages. Either way, they warrant investigation and probably shouldn’t be in your sitemap.
For the formula, we’ll use the following functions:
IF: IF(logical_test, [value_if_true], [value_if_false])
MATCH: MATCH(lookup_value, lookup_array, [match_type])
For the sake of time, I won’t go into all the details about what these mean. Each of the links takes you to the page on the Microsoft site that explains all about that particular function.
Since this formula requires a couple more steps, I’ll break them out into steps.
Step 1: Look to see if the URL in C2 is anywhere in Column B.
The MATCH function is just another lookup function Excel offers, and it’s especially versatile. But, what it returns is the row number of whatever you’re looking up. If it can’t find it, it returns an #N/A error (which we’ll use).
Step 2: ISNA simply returns a Boolean value that just lets you know if the value in the cell you’re referencing is an #N/A error.
Step 3: Write an IF statement that assigns the value “Orphan” if the URL isn’t in the Screaming Frog column (in other words, is returning an #N/A error) and “Found” if it is found.
Step 4: Replace every helper cell reference with the formula inside that cell (everything but the =). If you spread your helper cells from left to right across your spreadsheet, like I do, start to the far-left and work your way to the right.
If you spread them out vertically, start at the top and work your way down to the last formula you wrote. That way, you won’t accidentally miss any helper cells.
The definitive guide to Excel for marketers is Distilled’s Microsoft Excel for SEOs guide. It’s a great jumping-off point if you want to start your journey by focusing on functions marketers use most.
If you really want to go hardcore, I highly recommend Bill Jelen’s YouTube channel. His playlists are especially helpful if you’re looking to concentrate on a particular, weak area. Another podcast I really like is Mike Girvin’s ExcelIsFun YouTube channel.
I also regularly publish Excel video tutorials on my blog.
If you didn’t understand all of the nuances of the two formulas we worked with, don’t worry about it. Just grasp the principle of using helper cells to test your functions along the way and to break your process down into easily, digestible steps. I used to try to jam everything into one cell and would find myself all tangled up in the formula, even with the tool tips.
Now, I’m much more comfortable with navigating my way through somewhat complex formulas, but when they get too tortuous, I start parsing pieces into their own cells.
Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.
New on Search Engine Land