Removing URLs From The Index In Bulk

Need to remove URLs from the index? For a small number of URLs, this can be a relatively straightforward process. When you are dealing with thousands or tens of thousands, it can be a lot more complex, especially if the matter is urgent.

This is what I am going to focus on today: situations where the amount of content is massive and the urgency is high. It turns out that you have several options — so let’s explore what they are!

Why Remove Content In The First Place?

There are many possible reasons, but here are some of the most common:

  1. You published content by mistake. This is certainly one of those that might cause you to feel a great deal of urgency. Accidentally published your company’s strategic business plan? Probably want to get that out of the index, fast.
  2. You discovered massive duplicate content. We all know that duplicate content is bad. Large quantities of it can definitely impact your traffic and rankings, so getting rid of it is important.
  3. Your site was hit by Panda for poor quality pages. Let’s say you think that set of 12,386 pages on your site is way too low in quality, and you are guessing that it led to your site being Pandalyzed. You want to dump them as quickly as you can!
  4. You received a notice in Google Webmaster Tools about pages violating Google’s guidelines. Worse still, it was accompanied by a drop in traffic. Ouch. You want to fix that one fast, too.

One quick follow-up on my Panda example: we have helped sites recover from Panda solely through noindexing perceived poor quality pages. The first site we helped had in fact been hit by Panda 1.0. Mind you, this only works if you can afford to cut off that limb to save the patient.

So, this is not always practical, but it can work if you recognize a large block of pages as the likely culprit, you know they are going to be hard to fix, and you can afford to live without them.

When you have large quantities of pages to remove, and there is reason to believe that Google and Bing will crawl them at a slow rate, you may need to move to more aggressive solutions. Google and Bing may crawl pages infrequently if your site has low PageRank, or if they simply believe the pages are poor quality (such as pages that are duplicates, or that caused a penalty or an algorithmic demotion of your rankings).

The Basics

There are three basic tools in the toolkit. By themselves, these tools do not provide for rapid removal of content, but they represent an important part of the overall puzzle. The three basic tools are:

1. NoIndex Metatag: This is pretty straightforward. Place code like this on the page you want removed:

[Image: example noindex meta tag code]

The next time a search engine crawls the page, it will come across the noindex tag, which directs it to not include the page in its index (or to remove the page from the index if it’s already there). The noindex tag is considered a directive, which means it will prevent the page from being indexed even if you screw up and put it on the wrong page – so, be careful! The page itself will still get crawled, though the search engines may visit it less often over time.

The other nice thing about this tag is that pages with the noindex tag can still pass PageRank to other pages both on and off your site. For this reason, do not specify “nofollow” with this tag. This chart summarizes the impact of the noindex tag:

[Chart: impact of the noindex tag]
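To verify that the tag actually made it onto the pages you meant to tag (and only those), you can parse the page source and look for the directive. The following is a minimal sketch using only Python's standard library; the function and class names are hypothetical, and it assumes the standard `<meta name="robots" content="noindex">` form of the tag rather than engine-specific variants.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").strip().lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of robots meta directives present in the HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_noindexed(html):
    """True if the page carries a noindex directive in a robots meta tag."""
    return "noindex" in robots_directives(html)
```

Checking `"nofollow" in robots_directives(html)` with the same helper is a quick way to catch pages where nofollow was added alongside noindex by mistake.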

2. Rel=Canonical Tag: This tag is implemented with code like this:

[Image: example rel=canonical tag code]

The tag suggests to the search engine that the page with the tag on it is either a duplicate or a subset of the page that the tag points to (the canonical page). The search engines may choose to obey this suggestion. Unlike with the noindex tag, they may choose to ignore it if they believe you have made a mistake in implementing the rel=canonical tag.

You should still use this tag with care. There are known cases of sites that implemented a rel=canonical tag on every page of their site pointing back to their home page. If the search engines obeyed that tag, it would essentially be a one-page site.

The rel=canonical tag has one big advantage over the noindex tag. While the noindex tag can pass some PageRank through links, the rel=canonical tag passes all its PageRank back to the target page listed in the tag. This is probably closer to what you want. Here is a chart to help you visualize the impact of this tag:

[Chart: impact of the rel=canonical tag]
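Because a misapplied rel=canonical tag can do real damage, it is worth auditing where your tags point before relying on them. Here is a rough sketch, again using only Python's standard library, that extracts the canonical target from a page and flags inner pages whose tag points at the home page. The helper names and the `pages` dictionary (URL mapped to HTML source) are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalParser(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = {name: (value or "") for name, value in attrs}
        rels = attr_map.get("rel", "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.href = attr_map["href"]


def canonical_target(page_url, html):
    """Return the absolute canonical URL declared on the page, or None."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.href is None:
        return None
    # Resolve relative hrefs such as "/" against the page's own URL.
    return urljoin(page_url, parser.href)


def pages_canonicalized_to_home(pages):
    """Return inner-page URLs whose canonical tag points at the site root."""
    flagged = []
    for url, html in pages.items():
        target = canonical_target(url, html)
        if target is None:
            continue
        points_home = urlsplit(target).path in ("", "/")
        is_home = urlsplit(url).path in ("", "/")
        if points_home and not is_home:
            flagged.append(url)
    return flagged
```

Any URL this returns deserves a manual look, since an inner page is rarely a true duplicate or subset of the home page.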

3. Robots.txt: To be clear, adding a page or a folder to the “Disallow” list in your Robots.txt file does not remove it from the index — it just tells the search engines not to crawl that page or folder. Thus, this is more of a preventative measure than anything else. However, while you can’t use your Robots.txt file to remove already-indexed pages, there are scenarios where adding pages to the “Disallow” list can be effective when used in conjunction with the URL Removal Tools, and I will discuss this below.

The main benefit of asking robots not to crawl certain areas of your site is that it saves on crawl budget. If you have a large and complex site, and search engines are not able to crawl all the pages, this could be important. We can summarize the impact of Robots.txt as follows:

[Chart: impact of a Robots.txt Disallow]

Important Note: Don’t bother putting the noindex tag on pages that are disallowed in Robots.txt! Since the search engines won’t load the page, they will never see the tag!
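One way to catch that conflict is to test your noindexed URLs against your Robots.txt rules. The sketch below uses `urllib.robotparser` from Python's standard library; the sample rules, the helper names, and the `example.com` URLs are all made up for illustration, and the standard-library parser follows the basic prefix-matching rules, so it may not mirror every search engine's handling of wildcards.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block internal search results and one private report.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /private/report.html
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def blocked_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given user agent is not allowed to crawl."""
    rules = load_rules(robots_txt)
    return [url for url in urls if not rules.can_fetch(user_agent, url)]


def hidden_noindex_urls(robots_txt, noindexed_urls, user_agent="Googlebot"):
    """Return noindexed URLs whose tag can never be seen because crawling is blocked."""
    return blocked_urls(robots_txt, noindexed_urls, user_agent)
```

Any URL returned by `hidden_noindex_urls` needs one of the two settings changed: either lift the Disallow so the noindex tag can be read, or accept that the tag on that page has no effect.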

Use With Great Care: The URL Removal Tool

The URL removal tools provided by the search engines themselves are the way to get things done fast. However, use them with extreme care. Using them incorrectly can take your site (or part of your site) out of the index for extended periods of time. If you have any discomfort at all about how to use a tool, then do not use it. Simply rely on the other tools outlined above and wait out Google and Bing’s crawl times.

If you have survived the test of the disclaimers, here is what the URL removal tool does for you: It is fast. You don’t need to wait for Google or Bing to crawl the pages. You can get the pages out quickly. The tool will allow you to remove:

  1. A page
  2. A folder
  3. Your entire site

You can find the tool within Google Webmaster Tools here:

[Screenshot: the URL removal tool in Google Webmaster Tools]

Use of the Google and Bing URL removal tools is straightforward, and is documented in detail on these pages:

  1. Google’s URL Removal Tool Help page
  2. Google: When not to use this tool
  3. Bing’s URL Removal Tool Help page

When you are dealing with a large block of pages, the key is the ability to delete a folder. Note that this is only useful if all the pages you want to remove are in the same folder(s), and there is no content that you want to keep in those specific folders.
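Before requesting a folder removal, it helps to confirm that the folder really contains nothing you want to keep. The following sketch works from two lists you would have to supply yourself (URLs to remove and URLs to keep) and reports which folders look safe to remove wholesale. The function names are hypothetical, and the check is only as good as the completeness of your keep list.

```python
from urllib.parse import urlsplit


def parent_folder(url):
    """Return the folder portion of a URL path, e.g. '/tags/' for '/tags/red.html'."""
    path = urlsplit(url).path or "/"
    return path[: path.rfind("/") + 1]


def plan_folder_removals(remove_urls, keep_urls):
    """Split removals into folders safe to remove whole and URLs needing individual handling.

    A folder counts as safe only if no URL on the keep list lives under it.
    The site root is never treated as safe, since that would remove the entire site.
    """
    keep_paths = [urlsplit(url).path or "/" for url in keep_urls]
    safe_folders = set()
    individual_urls = set()
    for url in remove_urls:
        folder = parent_folder(url)
        shared = any(path.startswith(folder) for path in keep_paths)
        if folder != "/" and not shared:
            safe_folders.add(folder)
        else:
            individual_urls.add(url)
    return sorted(safe_folders), sorted(individual_urls)
```

URLs that land in the second list sit in folders shared with content you want to keep (or at the site root), so they would have to be submitted one page at a time or handled with the basic tools alone.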

Combining The URL Removal Tool With The Basic Tools

Google’s URL Removal Tool only removes the content from its index for 90 days, so it is not permanent. It is important, therefore, that you take additional steps to make sure that content does not come back into the index. You need to combine its use with one of the Basic Tools discussed above. Here is a table that represents how I look at the choices:

| Tactic | When to Use |
| --- | --- |
| URL Removal Tool, Plus Deleting Pages and All Links to Them, Plus 301s to Best Fit Pages | Always the best choice if there is no need for the pages to exist and if you are able to eliminate the pages. |
| URL Removal Tool Plus Rel=Canonical Tagging | The best remaining choice if preserving PageRank is a priority; however, you can only use this when your pages are a true duplicate or a strict subset of the pages that the tags point to. |
| URL Removal Tool Plus NoIndex Tag | Use when preserving PageRank is a priority, but the Rel=Canonical tag is not appropriate. |
| URL Removal Tool Plus Disallow in Robots.txt | Use when reducing the number of pages that the search engines have to crawl is the priority. |
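The choices above can also be read as an ordered set of checks. The sketch below encodes one reading of that ordering; the function name and its boolean parameters are invented for illustration, and the function returns `None` when none of the four rows applies rather than guessing.

```python
def choose_follow_up(can_delete_pages, is_duplicate_or_subset,
                     preserve_pagerank, reduce_crawl_load):
    """Pick the basic tool to pair with the URL removal tool, checked in table order."""
    if can_delete_pages:
        # No need for the pages to exist, and you are able to eliminate them.
        return "Delete pages and all links to them, plus 301s to best fit pages"
    if preserve_pagerank and is_duplicate_or_subset:
        # Only valid for true duplicates or strict subsets of the target page.
        return "Rel=Canonical tagging"
    if preserve_pagerank:
        return "NoIndex tag"
    if reduce_crawl_load:
        return "Disallow in Robots.txt"
    # The table does not cover this combination.
    return None
```

For example, a block of thin pages that must stay live for users, is not a duplicate of anything, and still has links worth preserving would come out as the NoIndex tag.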

Summary

The first and most important step is to be paranoid when using the URL Removal Tools. You can seriously hurt your business if you use them incorrectly. Whoever does this for you needs to be very deliberate and precise: someone to whom you would trust the fate of your business.

Again, the URL removal tools only help with large blocks of pages if those pages all exist within one folder or a small set of folders that do not also contain content you want to keep in the index.

But, if you have a very large number of pages that need to be removed, and it needs to happen fast, a URL removal tool can potentially be quite helpful. Just combine it with the right Basic Tool to ensure that the pages you are trying to remove stay out of the index for good.

Opinions expressed in the article are those of the guest author and not necessarily Search Engine Land.

About The Author: Eric is the president of Stone Temple Consulting, an SEO consultancy outside of Boston. Eric publishes a highly respected interview series and can be followed on Twitter at @stonetemple.

  • victorpan

    Thanks, this post was very helpful in understanding the big picture of indexing. Fortunately I haven’t had to use the URL removal tool, but it’s nice to put things into perspective and scope.

  • http://www.richamorindonesia.com/ Rich Amor Indonesia

    Thank you very much. It helped me understand what I should do with my website after a slight change in my traffic.

  • Eve

    I have a question: What about internal searches that are already indexed? I have used robots.txt with disallow, but as you’ve stated, “To be clear, adding a page or a folder to the ‘Disallow’ list in your Robots.txt file does not remove it from the index”, so should I use GWT for removal?

  • Pano Kondoyiannis

    How can I remove articles that I don’t have access to?

  • http://www.brickmarketing.com/ Nick Stamoulis

    “The first and most important step is to be paranoid when using the URL Removal Tools”

    Be triple extra careful before you start trying to de-index various pages on your site. I like to map out every single page, whether I am doing something to them or not, so I know exactly what my plan of attack is. That way, if something does get messed up, I can go back and look and see what I meant to do.

  • http://in.linkedin.com/pub/ashvini-vyas/37/474/451/ Ash Vyas

    using GWT tool’s disavow link feature.

  • http://in.linkedin.com/pub/ashvini-vyas/37/474/451/ Ash Vyas

    Has your website been crawled since you put it in robots.txt? If not, wait until it has been. If yes, include a noindex directive on that page.

  • http://www.archology.com/ Jenny Halasz

    Nice article Eric! One other thought for your readers… DON’T use rel=canonical on pages that you want Google to ignore. For example, if you have bad inbound links and you rel canonical them, you could really mess yourself up because you’re taking bad links and moving them to another location on the site. In that case, a robots noindex and a 301 redirect are probably your best bet.

 
