How To Prune The Enterprise Link Tree

As Barry Schwartz pointed out earlier this month, Google is warning sites about spammy link practices. And it's no April Fool's joke. While most of the attention has focused on affiliates, link networks and the like, enterprise sites need to take a careful look at their own link profiles.

[Photo: pruning a tree. If only it were this easy.]

But that's not easy. Instead of hundreds of links, you may be looking at thousands, tens of thousands, or more. In my experience, a moderately popular enterprise client can have 30,000-40,000 links from 2,000-3,000 domains.

You need a process, and a few tools, if you’re going to complete this task and maintain your sanity. You have to automate what you can and reduce the steps necessary for any required hand-filtering.

Do We Have To…?

The first question I usually hear from this type of client is: “Why do we even have to check our link profile? We’re a big company. We’ve accumulated lots of links over the years. We’re fine, right?”

Maybe. Maybe not. I’m not just spreading FUD here. Google has made it crystal-clear that they’re cracking down on all manner of ‘over-optimization’, both on- and offsite. Unless you know every SEO tactic that’s ever been used on your site, you need to audit your link profile.

The Tools

To run an enterprise-scale link profile audit, you’re going to need a few tools:

  1. A link database. SEOmoz's Open Site Explorer or MajesticSEO's database will work; using both works even better. Ahrefs has a new tool that's worth a look, too.
  2. Microsoft Excel. Say what you want about Microsquish. Excel is still the most kickass toolset an SEO can have. Google Spreadsheets is awesome, but Excel still has the edge. If you somehow don’t already have it, get it.
  3. WHOIS data. You'll want access to the WHOIS database, either via scripting (see the next item) or through a paid service. The ability to perform bulk WHOIS lookups will save you a lot of time, so paying a bit extra for a bulk lookup service could make sense. It's cheaper than therapy.
  4. A Web crawler of some kind. Screaming Frog or Xenu will do the trick.
  5. A scripting language. Yes, I said it again: you need to know a programming language. If you don't, OK, but this would really be a good time to learn.

The 19-Step Process

Here’s how I go about it. Of course, this is not the only way. It’s probably not even the best. I tend to find these shortcuts and design this kind of stuff on the fly.

On the other hand, this process lets me sift through 30,000+ links in less than 3 hours. Which means more Skyrim time – a win-win.

  1. Create a ‘whitelist’. That’s a list of domain names that are 100% (cough OK 90%) legitimate link sources.
  2. Grab the basic link data from Open Site Explorer and Majestic. Import both into Excel.
  3. Combine the two URL lists, including SEOMOZ Domain Authority and/or Majestic ACRank so that you have a single list of all linking URLs. Filter out any duplicates.
  4. Pull a list of unique domain names from that list. I use Python to do this. You can use Excel's Text to Columns feature, too: split the text at each "/", keep the hostname chunk, and discard the folders and query strings. That leaves you with a list of domain names.
  5. Remove any whitelisted domains.
  6. Run a WHOIS query on each domain name. Be sure to get the hostname, registrant name and status, at a minimum. Store that in Excel, too. I use Python to perform the bulk lookup. You can also send a list of domains to a paid service and they’ll do it for you.
  7. Grab the IP address of each domain. You can use NSLOOKUP to do this, if you want to get all geeky about it. There are a few tools you can add to Excel, or you can script it in Google Spreadsheets. None of this is trivial, I know. It’s the price of success – you wanted your terrifying in-house SEO job for a Fortune 100. Time to pay up!
  8. Use VLOOKUP to combine the domains, WHOIS results and Majestic/SEOmoz/ahrefs data. It's important that you have all of this in one place.
  9. Now, look for sites that share common registrants. Ignore the private domain registration companies. Yes, that’s a lot of them. But you’ll be amazed how many link networks still operate ‘in the clear’.
  10. If you find groups of sites owned by a single person or company, flag them. Why? Because multiple sites under a single owner may be part of a link network.
  11. Compare IP addresses, the same way you did registrants. If you have collections of sites under the same IP address, flag those, too.
  12. Now you should have a list of flagged domains.
  13. Grab those domains and run your Web crawler, fetching the home page of each domain. I use Python for this, saving the HTML for each page for the next few steps.
  14. Check the results for phrases that are a dead giveaway for spam: “High pagerank,” “Link building,” “Upgrade your link” and “Free link” are some of my favorites.
  15. Get a word and link count for each page. Compute the ratio of words to links. I use Python and BeautifulSoup (an HTML parser for Python) to do this.
  16. Pull all this data into your domains list.
  17. Score your domains. I use a holistic 1-10 scale: the more 'spam factors' in evidence, the higher the score. So a page that's part of a 10-domain portfolio, has spammy-sounding phrases on it and has a low ratio of words to links will get a really high score.
  18. Sort your spreadsheet by score. Then do a quick check of the worst offenders. If they’re spam, get those links removed.
  19. Repeat this process as necessary.
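
Much of the middle of this process boils down to set operations that Python's standard library handles well. The sketch below is an illustration, not my exact scripts: it assumes you've already loaded WHOIS results into a plain dict per domain (the `registrant` and `ip` keys are just an assumed shape), and it flags any domain that shares a registrant or IP address with another.

```python
# Sketch of steps 3-5 and 9-11: dedupe linking URLs into unique
# hostnames, then flag domains that share a registrant or an IP.
from collections import defaultdict
from urllib.parse import urlparse

def unique_domains(urls, whitelist=frozenset()):
    """Reduce a combined OSE/Majestic URL list to unique hostnames,
    skipping anything on the whitelist (steps 3-5)."""
    domains = set()
    for url in urls:
        host = urlparse(url).hostname
        if host and host not in whitelist:
            domains.add(host)
    return sorted(domains)

def flag_shared_owners(domain_info):
    """domain_info maps domain -> {'registrant': ..., 'ip': ...}.
    Returns the set of domains sharing a registrant or an IP with
    at least one other domain (steps 9-11). In practice you'd also
    skip the private-registration companies before grouping."""
    groups = defaultdict(set)
    for dom, info in domain_info.items():
        if info.get('registrant'):
            groups[('registrant', info['registrant'])].add(dom)
        if info.get('ip'):
            groups[('ip', info['ip'])].add(dom)
    flagged = set()
    for members in groups.values():
        if len(members) > 1:  # several sites, one owner or one box
            flagged |= members
    return flagged
```

Feed `unique_domains` the merged URL column from your spreadsheet, run your WHOIS and IP lookups on the result, and `flag_shared_owners` gives you the flagged list from step 12.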

Getting Fancy

A few additional, easily-automated steps you can try:

  1. Use natural language processing to compare 5 blog posts on any given blog. If they have little or no relation to each other—one’s about pharmaceuticals, and the next is about vacationing in Miami, for example—that could be a spam blog.
  2. Check the writing grade level. Super-low or super-high may mean badly written, spun content.
  3. Use an automated grammar checker like Queequeg and get an error count. More errors mean a higher likelihood of spun content.
  4. Check for blog sites using default templates. That’s a sure sign of a spam blog.
  5. Check for big collections of footer links. Then look for sites that are interlinked into ‘wheels’ or whatever the link sellers are calling them now.
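
The page-level checks, from the spam-phrase scan and words-to-links ratio in the main process to the fancier content signals above, all reduce to scoring fetched HTML. Here's a standard-library-only approximation (using `html.parser` where I'd normally reach for BeautifulSoup); the phrase list is illustrative, not exhaustive.

```python
# Rough sketch of steps 14-15: scan a fetched home page's HTML for
# giveaway spam phrases and compute a words-to-links ratio.
from html.parser import HTMLParser

SPAM_PHRASES = ('high pagerank', 'link building',
                'upgrade your link', 'free link')

class LinkTextCounter(HTMLParser):
    """Count anchor tags and visible words in one pass."""
    def __init__(self):
        super().__init__()
        self.links = 0
        self.words = 0
    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.links += 1
    def handle_data(self, data):
        self.words += len(data.split())

def spam_signals(html):
    lower = html.lower()
    phrase_hits = sum(p in lower for p in SPAM_PHRASES)
    counter = LinkTextCounter()
    counter.feed(html)
    ratio = counter.words / counter.links if counter.links else float('inf')
    return {'phrase_hits': phrase_hits,
            'links': counter.links,
            'words_per_link': ratio}
```

Run this over each saved home page, pull the numbers into your domains list, and they become inputs to the 1-10 score.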

Accumulate Knowledge

As you do this process, save your data. Keep a list of the best and worst domains, site owners and IP blocks. It’ll make future audits far easier.
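
A tiny SQLite table is enough to make that knowledge base persist between audits. The schema below is only a suggestion; store whatever fields your scoring uses.

```python
# Persist audit results between runs (schema is a sketch, not gospel).
import sqlite3

def save_audit(db_path, rows):
    """rows: iterable of (domain, registrant, ip, score) tuples.
    Returns the number of domains now stored."""
    con = sqlite3.connect(db_path)
    con.execute("""CREATE TABLE IF NOT EXISTS link_audit (
        domain TEXT PRIMARY KEY,
        registrant TEXT,
        ip TEXT,
        score INTEGER)""")
    con.executemany(
        "INSERT OR REPLACE INTO link_audit VALUES (?, ?, ?, ?)", rows)
    con.commit()
    count = con.execute("SELECT COUNT(*) FROM link_audit").fetchone()[0]
    con.close()
    return count
```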

Prune, But Also Plant

Of course, don’t neglect authority-building. Your content strategy, social media strategy and branding will help you grow your authority profile even as you prune back your low-quality links.

None of this is easy. But almost all of it can be automated. Put in the time now and you can stay ahead of future Google warnings, improve SEO and build a lasting information asset for your company.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.



About The Author: is Chief Marketing Curmudgeon and President at Portent, Inc, a firm he started in 1995. Portent is a full-service internet marketing company whose services include SEO, SEM and strategic consulting.



  • FScharnell

    This is a great article, super great process thanks for the tips. What I really enjoyed though was:

    “On the other hand, this process lets me sift through 30,000+ links in less than 3 hours. Which means more Skyrim time – a win-win.”

    Gotta have our priorities straight. lol

    Thanks again!
    Frank Scharnell

  • Scott Krager

    Really great post to distill links down to a 1-10 score. I would say though that while much of that process can be automated, the crux of getting a link removed often remains at the hands of the site where the link was posted. So you have to ultimately rely on someone else and their timeline to get the link removed. 

    It’s funny, I think the better the link, the easier it probably is to get removed. But the crappier the link (think comment spam or something like that) the more challenging it can be to get removed.

  • Notify Me Now

    What methods do you use to get your link removed?  What if the spammer simply says no..?

  • Adam Machado

    Very nice techniques, but a waste of time.  Any link that Google thinks is artificial/spammy would just be devalued anyways.  There is no need/benefit to go about trying to remove them.  Getting the unnatural links detected message means nothing.  Google has been finding and devaluing untrusted links for a very long time.  All the notification is meant to do is scare webmasters.  Fear is the only tool Google has against “unnatural link building”

    The argument, that some are making, that it is a good preventative measure to remove spammy looking links is poor advice at best.  All you are doing is hurting yourself. 

    Google doesn’t penalize a website purely for having some “spammy” looking links.  Any ranking losses that are experienced would be from previously valued links being devalued.  A short 30-90 day trust loss can occur if there are a ton of “unnatural links” detected but that almost never happens to sites with other “Natural” looking links, social signals, quality content etc…

    If you are an affiliate site with fluff content, zero natural links, no social signals, etc… Than yes you will probably be “penalized” if a ton of “Unnatural” links are found.  But again, this has been the way of the world for many many years. 

  • Shahzad Hassan Butt

    What a great post. Kind of research approach. A wonderful bunch of tools for better analysis. A complete SOP for Pruning the links. And a great warning, ”
     improve SEO and build a lasting information asset for your company. ” 

    thumbs up and hats off

  • Dannish Abby

    Excellent analysis and words .you have nicely covered  the warning part which is a must now a days to be saved from getting spammed or earn negativity . great work keep it up

  • SEM_Blog

     I totally agree. I recently created a post on how google is causing unneccessary panic among seos. What do you think about it:

  • Kit Pierce

    SEO blogging par excellence! Literally one of the best articles I’ve read in months. Thanks for the advice Ian; I hope I don’t need it any time soon. ^_^
    Oh and…. Skyrim ftw.

  • Ian hanson

     This paranoia is hurting the SEO community. I do not understand why somebody (Matt Cutts for example) comes out with a proper explanation of ‘over-optimization’.

  • Vengat Owen

    Nice post and useful information to make up. 

