Google Releases Improved Content Removal Tools


Google has rolled out new tools to help people quickly get content removed from
its search engine. Those targeted at site owners allow for speedy removal of
pages and cached copies of pages. Other tools allow people to request the
removal of images or links to pages with personal information about themselves,
in the right circumstances. The tools and their various options are covered
below.

Site Owner Removal Options

For site owners, the best way to keep content out of Google is by using the
robots.txt or meta robots tag options. Either option can prevent pages from
getting into Google or get them removed once included. However, getting pages
removed once they are in can take time. You have to wait for Google to revisit
the pages you’ve flagged for removal, a process that can take days or longer.
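
As a rough illustration of the robots.txt option, the sketch below uses Python's
standard urllib.robotparser module to test whether a set of rules would block a
crawler from a URL. The robots.txt content, the example.com URLs and the
is_blocked helper are all made up for this example; only the parser itself is
part of Python.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that blocks one directory for all crawlers.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_blocked(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt rules disallow crawling of url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


print(is_blocked(ROBOTS_TXT, "http://example.com/private/page.html"))
print(is_blocked(ROBOTS_TXT, "http://example.com/about/"))
```

The first URL sits under the disallowed directory, so it is reported as blocked.
The second is not.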

The new site owner tools can be found within Google Webmaster Central, for
those with verified accounts (that’s explained more here, and it’s free and
easy to do). Once logged in, select the site you want to remove pages from via
the "Dashboard" screen. When that site loads, choose the "Diagnostics" tab,
then select the "URL Removals" link you’ll see in the left-hand navigation.

That will load a screen with four options, allowing you to remove:

  • Individual URLs: (a particular page, image or anything with a
    specific URL that’s listed in Google)
  • Directories: (all pages within particular sections of your site,
    such as within the /about/ area)
  • Entire Site: (want to wipe out your entire site? Go ahead!)
  • Cached Copy: (want a page to be listed, but not have a copy of it
    cached anymore?)

Removing URLs

To remove individual URLs, directories or your entire site with the new tools,
you must block crawling of these using either the robots.txt or meta robots tag
options. Alternatively, if the page, pages or entire site are physically gone
from the internet (returning 404 "not found" or 410 "gone" error codes), then
the tools can also process the request.
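
The eligibility rule above boils down to a simple either/or test. Here is a
minimal sketch of it. The function name and its parameters are my own
invention, and this is not how Google actually implements the check.

```python
# Status codes the article names as meaning the content is gone.
GONE_STATUS_CODES = {404, 410}


def eligible_for_removal(status_code: int,
                         blocked_by_robots_txt: bool = False,
                         blocked_by_meta_robots: bool = False) -> bool:
    """Return True if a URL meets the conditions described for removal.

    A request qualifies when the page is gone (404 or 410) or when the
    site owner has blocked it with robots.txt or a meta robots tag.
    """
    is_gone = status_code in GONE_STATUS_CODES
    return is_gone or blocked_by_robots_txt or blocked_by_meta_robots


print(eligible_for_removal(410))
print(eligible_for_removal(200, blocked_by_robots_txt=True))
print(eligible_for_removal(200))
```

A live page that returns 200 and carries no block is the only case here that
would not qualify.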

To remove a URL, you enter that URL. Up to 100 can be entered at a time using
the form (if you want to do more than this, submit the first 100, then start
again with a fresh form). To delete directories or entire web sites, you enter
the directory path or the web site address using separate forms.
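
If you have a long list of URLs to remove, it may help to split it into
form-sized groups first. The sketch below does only that. The batches helper
and the example.com URLs are hypothetical, and nothing here submits anything to
Google.

```python
from typing import Iterator, List

BATCH_LIMIT = 100  # per-form limit mentioned in the article


def batches(urls: List[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


# Made-up list of 250 URLs, for illustration only.
urls = [f"http://example.com/old/page{i}.html" for i in range(250)]
print([len(group) for group in batches(urls)])  # [100, 100, 50]
```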

After submitting a request, the deletion will go into a processing queue. You
can monitor the status of any request using the "Current Requests" tab of the
URL Removals screen. Requests in progress are flagged as "Pending." Those
removed get flagged "Removed" and appear on the "Removed Content" tab. If
there’s a problem, a "Denied" message appears, with a link to explain more about
what problem needs to be corrected.

How long to process a request? The tool should act on any valid requests
within 3 to 5 days or faster.

How long will removals last? For six months, once processed — and regardless
of whatever you do on your web site during that time, unless you specifically
ask for reinclusion.

For example, say you remove a page from your web site, then ask for the page
to be removed from Google using the removal tool. Two weeks later, you put the
page back up. Google will still continue to follow the original instructions,
not to include the page, even though it exists.

During the six month period, you can rescind a removal request. Simply find
any removal action you’ve done listed on the Removed Content tab, then select
the "Reinclude" option that should show.

After the six month period, Google will resume including or excluding content
as normal — IE, looking to see if you have a robots.txt or meta robots tag
barrier in place, to prevent valid pages from getting in. If you want pages kept
permanently out, don’t put them back online without the proper restrictions in
place!
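
To make the timing concrete, here is a small sketch of the six-month window. It
assumes the window is counted in calendar months from the day the request is
processed, which is my assumption and not something Google has specified. The
dates and helper names are invented for the example.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day is clamped to the last day of the target month.
    """
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def is_suppressed(processed_on: date, today: date, reincluded: bool = False) -> bool:
    """Return True while a processed removal is still in force."""
    if reincluded:
        return False  # the owner used the "Reinclude" option
    return processed_on <= today < add_months(processed_on, 6)


processed = date(2007, 4, 18)  # hypothetical processing date
print(is_suppressed(processed, date(2007, 5, 2)))    # page put back two weeks later
print(is_suppressed(processed, date(2007, 10, 18)))  # six months on
```

Under those assumptions, the page that went back up after two weeks is still
kept out, while the same check six months later shows the removal has lapsed.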

Removing Cached Pages

By default, Google listings have a link to the actual web page as well as a
cached copy of the page. A cached page is a copy of the page as Google saw it,
which a searcher can view without having to go to the actual web site. This is
handy for searchers in cases where a page might no longer exist. However, site
owners might not want these cached copies to exist at Google.

The meta robots tag
provides options to keep cached pages out, but the new tools give you speedier
access for removal. As with removing URLs, the tools at Webmaster Central will
get rid of a cached copy within 3 to 5 days.

To process your request, Google needs to see that a meta robots tag set to
"noarchive" is now on the page (see
Meta Robots Tag 101:
Blocking Spiders, Cached Pages & More
for more about this). Put that tag on
the page, push submit, and you’re set. Well, you will be set from around 3pm
Pacific time from April 18 onward. There’s a bug still being worked out for this
part of the new toolset.
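
Before submitting, you may want to confirm the tag really is on the page. The
sketch below looks for a meta robots tag containing "noarchive" using Python's
built-in HTML parser. The class name, function name and sample page are
hypothetical, and this is only a local sanity check, not a description of how
Google reads the tag.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noarchive(html: str) -> bool:
    """Return True if the page carries a meta robots noarchive directive."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noarchive" in parser.directives


# Made-up page for illustration.
page = '<html><head><meta name="robots" content="noarchive"></head></html>'
print(has_noarchive(page))
```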

What if you can’t put the tag on a page? I’ll explain how this works in
the third-party section below.

The cached page will be kept out for six months. You can ask for the cached
copy to be reincluded sooner than this, if you want. However, make sure Google
has actually revisited the page since you altered it. Unfortunately, this means
watching your logs. To be safe, you’re probably better off not asking for the
reinclusion before the six months have expired.

Want to keep the page or any pages from being cached permanently? Again, use
the meta robots tag.

Finally, keep in mind that using the cached page removal option will also
remove any description of the page in the listings. In contrast, the meta robots
tag gives you the ability to remove just the cached page OR the description OR
both, if you choose.

URL Removals Options: At-A-Glance

I’ve written earlier about a similar Yahoo tool for removing URLs (Up Close
With Yahoo’s New Delete URL Feature) plus options with all the major search
engines to remove page descriptions and cached copies (Meta Robots Tag 101:
Blocking Spiders, Cached Pages & More). Below is an at-a-glance chart I’ve used
with both those previous articles, now updated to add in the Google options.


| System | Robots.txt | Meta Robots | Yahoo Delete URL | Google Delete URL |
|---|---|---|---|---|
| Stops Crawling | Yes | No | No | No |
| Stops Index Inclusion | Yes | Yes | Yes | Yes |
| Stops Link Only Listing | No | No (Yes, for Google) | Yes | Yes |
| Why Use? | Easy to block many pages at once | Can’t access root domain | Don’t even want URL to appear or need page out fast | Don’t even want URL to appear or need page out fast |

Explanations:

  • Stops Crawling: If "Yes," the page won’t be spidered at all. If
    "No," the page might get spidered, but it will not be included in listings.
     
  • Stops Index Inclusion: URLs will not show up in response to
    searches.
     
  • Stops Link Only Listings: This is where a page is listed with only
    a title and URL. Yahoo calls these "thin" listings; Google calls them
    "partially indexed".

Third Party Removal Options

What if you want content removed from Google on pages you do NOT control or
maintain yourself? There’s a special removals tool you can use, as long as you
have a Google Account of any type.

What can you remove? Not a lot using the new tool, if you haven’t worked with
the site owner themselves.

Third Party Page Removal

Let’s say there’s some page (or image) you don’t like on a web site. You’ve
contacted the site owner, and they’ve agreed to pull down the content.
Unfortunately, you still see it showing up in Google’s listings. Ideally, the
site owner could log into Google Webmaster Central, use the site owner tools
I’ve covered, and get the page removed. But they don’t want to do this.

The third party tool lets you do it for them, or for any page that’s no
longer live on the web or now banned from crawling using robots.txt or the meta
robots tag. You simply enter the URL of the page in question and submit. If it’s
a valid request (again, the page is no longer live or being blocked from
crawling), it will be removed in 3 to 5 days. You can also log in to see the
status of your request.

Site owners — don’t freak out over this! Someone can’t remove your pages
from Google unless you actually take them off the web or block crawling. This
simply speeds up the removal process.

In fact, the ability for a third party to trigger a change isn’t new. Google’s
long had an automatic URL removal tool that anyone could use to trigger page
removals. When WebmasterWorld blocked spiders from hitting the site back in
November 2005, several people used that tool to get pages removed faster than
Google would have done following its usual schedule.

That tool remains for the moment, but Google says to use the new tools for
faster processing and better reporting.

Third Party Cache Removal

What if the page remains but just part of it has changed — and you want
Google’s cached copy to reflect this? There’s an option for that, too.

For example, say you’re Joe of Joe’s Diner. Someone reviewed your fine eatery
and wrote a three word review: "Joe’s Diner Sucks." This review upsets you. You
contact the site owner, and they agree to remove it from the page.
Unfortunately, it can still be seen by anyone who looks at Google’s cached copy.
You have to wait until whenever Google gets around to refreshing its copy of the
page for that review to go away (which could take a while; see Squeezing The
Search Loaf: Finding Search Engine Freshness & Crawl Dates for more on that).

As explained, the site owner could help by getting the cached page removed
entirely. But if the site owner doesn’t want to do that, you can use the
third-party tool to make it happen.

First, check to see if the site owner has at least put on the required meta
robots tag to prevent caching. If so, submit the page, and the request will be
processed.

No tag? Here’s the alternative. Submit the URL, then find some of the words
that have been removed (such as "diner sucks"). Enter these words into the
"Term(s) that have been removed from the page" box of the Cached page removal
form. Submit, and Google checks the page. If it sees the words are gone, it
knows the page has changed and processes the request to remove the cached copy.
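
The check described here can be pictured as a simple "are the words still
there?" test. Google hasn’t said how it matches the terms, so the sketch below
assumes a case-insensitive phrase match with whitespace collapsed. The function
name and the sample review text are made up.

```python
def terms_removed(live_page_text: str, removed_terms: str) -> bool:
    """Return True if the submitted phrase no longer appears on the live page.

    Assumption: the terms are matched as one phrase, ignoring case and
    collapsing runs of whitespace.
    """
    phrase = " ".join(removed_terms.lower().split())
    page = " ".join(live_page_text.lower().split())
    if not phrase:
        return False  # nothing was submitted, so nothing can be verified
    return phrase not in page


# Hypothetical page text before and after the site owner's edit.
before = "Reviews: Joe's Diner sucks. Great pie at Mary's."
after = "Reviews: Great pie at Mary's."

print(terms_removed(before, "diner sucks"))
print(terms_removed(after, "diner sucks"))
```

With the review still on the page the check fails. Once the owner has edited it
out, the check passes and the request could go ahead.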

Site owners — DO feel free to freak out over this! You should.

To be clear, anyone can wipe out your cached pages in Google for up to six
months using this third party tool even if you have NOT yourself used the
required meta tag.

No big deal? Who cares about cached pages being gone? Remember, it’s not just
that your cached page will go. The description for your listing will disappear,
too.

Frankly, I don’t think Google should have launched this feature this way. I
think it is ripe for abuse.

For its part, Google says the old tool actually operated the same way for
years and has never been abused. In other words, the feature isn’t new; it’s
just getting new attention as part of the new toolsets. Google says it will
watch more closely to prevent abuse such as I’ve outlined. The company’s
official statement:

Google’s always encouraged consumers to work directly with a site’s
webmaster when they have concerns about content in our search results. When
the webmaster has removed or changed information on the live page, but that
information still exists in our cached copy, we’ve worked with consumers to
review and help expedite the removal of outdated cached copies appropriately.
Where consumers previously reported these outdated cached copies via online
contact forms, they can now do so via the tool. The same precautions and
considerations are still observed; with the launch of the new tool, the means
by which a consumer can report an outdated cached copy has changed. We’ll
monitor requests through the new tool and make adjustments as necessary.

Personal Info Removal

What if the site owner won’t remove or modify a page? Then you can get the
page removed if it shows or contains:

  • Your social security or government ID number
  • Your bank account or credit card number
  • An image of your signature
  • Explicit content which violates Google’s guidelines and contains your
    personal information.

What’s that last one all about? Well, say someone scrapes your name or
business name and shoves it onto a page of porn content. The porn’s the
"explicit" part and your personal information — well, that’s your name, Google
says. Google will act to get rid of that page, and no one will be the sadder for
it.

Other Removals

That third party removals tool also provides options for anyone to delete dead
links in Google or report pages or images that have slipped past the SafeSearch
adult-content filter.

In addition to that, this page at Google lists other types of content removals
you can request, such as listings in Google Blog Search or transcoded pages in
Google Mobile. Google’s DMCA page also covers how to remove content that might
be violating your copyright.


Opinions expressed in this article are those of the guest author and not necessarily Search Engine Land. Staff authors are listed here.


About the author

Danny Sullivan
Contributor
Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land and MarTech, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site & blog. He can also be found on Facebook and Twitter.
