Up Close With Yahoo’s New Delete URL Feature

Since Yahoo rolled out a new Delete URL feature this week, a number of questions have come up about exactly how it works. I had time yesterday with some of the Yahoo Site Explorer team to gather answers. Thanks to Priyank Garg and Amit Kumar, who, along with Tim Mayer, went through the inner workings.

It’s probably most important to understand the difference between how pages have traditionally been kept out of Yahoo versus what Delete URL does. Traditionally, Yahoo is told to keep pages out using either a robots.txt file or a meta robots tag that uses the "noindex" setting. Here’s some more about how those options work, versus the new Delete URL feature:

  • Robots.txt: Yahoo checks your robots.txt file on a regular basis to see what pages it is forbidden from crawling. Block a page using robots.txt, and Yahoo will stop visiting that page. If the page isn’t crawled, then it doesn’t appear within the index or gets dropped if it was previously listed. Remove the block from robots.txt, and Yahoo will start crawling the page again, causing it to return to the listings.
     
  • Meta Robots (set to NOINDEX): If Yahoo isn’t blocked by robots.txt from crawling a page, then it looks on the page itself to see if there’s a meta robots tag in place. If so, and if that tag is set to noindex, then the page will not be added to the listings, or will be dropped if it is already in the index. It will continue to get crawled! Meta robots does not block crawling. However, the page will not be included in the index as long as the meta robots tag continues to say noindex.
     
  • Delete URL: Delete URL works independently of the other two options. Use it, and pages will continue to be crawled. However, similar to the meta robots tag using noindex, they won’t get indexed.

The chart below provides some further at-a-glance guidance on what to use and how each blocking feature operates:

System                  | Robots.txt                       | Meta Robots              | Delete URL
Stops Crawling          | Yes                              | No                       | No
Stops Index Inclusion   | Yes                              | Yes                      | Yes
Stops Link Only Listing | No                               | No                       | Yes
Why Use?                | Easy to block many pages at once | Can’t access root domain | Don’t even want URL to appear or need page out fast
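
For anyone who prefers code to charts, here is a minimal Python sketch of the same comparison. The class, field, and function names are my own illustrative assumptions and have nothing to do with any Yahoo API; the Yes/No values are copied straight from the chart.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One row-set of the chart: what a blocking method does and does not stop."""
    name: str
    stops_crawling: bool
    stops_index_inclusion: bool
    stops_link_only_listing: bool


# Values taken from the chart; the structure itself is hypothetical.
METHODS = [
    Method("robots.txt", True, True, False),
    Method("meta robots (noindex)", False, True, False),
    Method("Delete URL", False, True, True),
]


def candidates(need_no_crawl=False, need_no_link_only=False):
    """Return the names of single methods that meet the stated needs."""
    return [
        m.name
        for m in METHODS
        if (m.stops_crawling or not need_no_crawl)
        and (m.stops_link_only_listing or not need_no_link_only)
    ]


print(candidates(need_no_crawl=True))      # only robots.txt stops crawling
print(candidates(need_no_link_only=True))  # only Delete URL stops link-only listings
print(candidates(need_no_crawl=True, need_no_link_only=True))  # no single method does both
```

The last call returns an empty list because no single method covers both needs. That is not a dead end, since Delete URL can be used in addition to robots.txt.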

To expand a bit on the chart, some people don’t want the major search engines to spider certain pages in order to reduce bandwidth load. That means blocking crawling. Only robots.txt will do this for you. It also will keep the pages out of the index.
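
To see how a robots.txt block is read, here is a small sketch using Python’s standard urllib.robotparser module. The robots.txt content and the domain.com paths are made up for illustration, and the "Slurp" user-agent token for Yahoo’s crawler is my assumption, as it is not something covered above. The sketch only shows how a well-behaved crawler interprets the file; it says nothing about Yahoo’s own implementation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt served from the root of the domain.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blocked: the path falls under the disallowed /private/ area.
print(is_crawl_allowed(ROBOTS_TXT, "Slurp", "http://domain.com/private/report.html"))
# Allowed: no rule matches this path.
print(is_crawl_allowed(ROBOTS_TXT, "Slurp", "http://domain.com/public/page.html"))
```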

Unfortunately, robots.txt will only work at the root level of a domain. That is, it has to be at domain.com/robots.txt rather than domain.com/subarea/robots.txt. Some people have their web sites deep within other domains, so the meta robots tag is a way to keep pages out. (Here and in all future references, I mean meta robots using the noindex setting.) The pages will continue to be crawled, but they won’t show up.
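
If you want to check whether a page actually carries the noindex setting, something like the following sketch would do it, using Python’s standard html.parser module. The class and function names and the sample page are hypothetical, and this is only how I would test my own pages, not how Yahoo reads the tag.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives found in any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma separated, e.g. "noindex, follow".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def has_noindex(html: str) -> bool:
    """Return True if the page has a meta robots tag that includes noindex."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page for illustration.
PAGE = '<html><head><meta name="robots" content="NOINDEX, follow"></head><body>Report</body></html>'
print(has_noindex(PAGE))
print(has_noindex("<html><head><title>No tag here</title></head></html>"))
```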

With both robots.txt and meta robots, it’s still possible that a URL will appear in the listings. This is because Yahoo will still list a URL if it knows of other people linking to it. For example, perhaps you have some confidential report you put online. You might prevent Yahoo from crawling or indexing the content of the report. However, if other people are linking to it, then the report might still come up. Yahoo won’t know about anything inside of it, but sometimes just links alone can make a page relevant for terms.

Yahoo calls these "thin" listings (Google calls them "partially indexed"). If you use Delete URL, you can remove all traces of the URL from Yahoo search results. Even thin URLs will be gone.

Delete URL is also potentially faster than using robots.txt or meta robots. Both of those depend on Yahoo revisiting the site, seeing the restriction and acting on it. It might take Yahoo several days or longer to get back to some sites. Delete URL tells Yahoo to speed up the process. It acts as a virtual meta robots tag, and Yahoo says pages should be removed in 24 to 48 hours.

The virtual meta robots tag concept is important. No, you do not have to have an actual meta robots tag set to noindex on the pages you want to remove. Nor do you need to have a robots.txt file blocking pages. Delete URL will work instead of either of these to keep pages out. It will also work in addition to them.

For extra security, it might be nice if Delete URL only worked if people ALSO had one of the traditional methods in place. But I understand Yahoo’s view that they want a third alternative to work for those who can’t use the other two systems.

After the feature came out, Andy Beal over at Marketing Pilgrim had the fear-inspiring headline of Yahoo Delete URL Feature Disaster Waiting to Happen. He wrote:

It is literally a disaster waiting to happen. There is zero verification other than being logged into the proper Yahoo account to delete an entire site from the Yahoo index.

With Google you are required to upload a robots.txt file to the webserver that verifies the same information being requested through the Google delete URL/Site tool. With Yahoo, you just log in, click delete, click confirm, and it’s gone.

Until they fix this issue I recommend to everyone that you don’t authenticate any domain to Yahoo Site Explorer and if you have previously authenticated a site that you remove the authentication file or meta tag.

Well gosh, then you might as well not have a robots.txt file on your domain. I mean, it’s a disaster waiting to happen. All you need is for someone to figure out your username and password to your site, install that puppy and out goes your site.

I like Andy, so I’m poking at him in good fun. But I do think we need some perspective. Let’s say Andy does authenticate his site with Yahoo. Now I’ve got to figure out what his Yahoo username and password is for that particular site. Is he andy_beal? andybeal45? marketingpilgrim? andyexpat342? Just knowing what username he might use with that site is the first challenge. Then I’ve got to guess the password.

If I do guess the password and get in, bam! Site wiped out! Not really. First, the URLs will go into a processing queue, and that’s going to take up to 24 hours to happen. Look, here I deleted a page from my site yesterday, about 12 hours ago:

[Screenshot: Yahoo Site Explorer: Delete URL Actions Page]

As you can see, the status is "Pending Delete" — the URL has yet to get removed. I still have time to prevent it from happening.

Let’s say pages do get wiped out. They’re actually still in the index. Delete URL simply suppresses them from appearing. This means Yahoo can quickly get them back in 1 to 2 days, if need be (though for some rare "low priority" URLs, Yahoo says this might take up to a month).

Of course, I can understand the concern here. There are two other things that might help. First, perhaps site owners who are really worried could set up a special authentication password or PIN to use to authorize a delete. So if someone did get both your username and password, perhaps the delete can’t happen unless they also know your PIN. Second, perhaps an RSS feed or email notice could go out to keep the account holder alerted to any major pending action. For its part, Yahoo says they are considering additional safeguards.

Another issue that’s come up is that you can only do up to five active deletes per site at a time. In other words, you can do five delete actions. When those are processed, you can then do more. This is Yahoo being conservative, so the limit might get raised in the future. But five deletes is not the same as five pages. You can delete many more pages than that.

If you delete a root URL like this:

http://domain.com

Then all pages below that domain will get removed, such as:

http://domain.com/subarea1/page1.html

http://domain.com/subarea2/page45.html

One delete — but many, many pages gone. You can also delete all pages in a particular directory or subarea of your site. So find a page like this:

http://domain.com/subarea1/

And all pages in the /subarea1/ section will go.
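
One way to picture this is as simple prefix matching on the URL path. The sketch below is my own model of the behavior described above, not Yahoo’s documented matching rule, and the function name is hypothetical. In particular, treating a URL that does not end in a slash as a single page is an assumption on my part.

```python
from urllib.parse import urlsplit


def is_covered(deleted_url: str, page_url: str) -> bool:
    """Return True if deleting deleted_url would also remove page_url (assumed model)."""
    deleted, page = urlsplit(deleted_url), urlsplit(page_url)
    if deleted.netloc.lower() != page.netloc.lower():
        return False
    prefix = deleted.path or "/"  # a bare domain means the root
    if not prefix.endswith("/"):
        # Assumption: a delete of a single page covers only that exact page.
        return page.path == prefix
    return (page.path or "/").startswith(prefix)


# Deleting the root covers everything below it.
print(is_covered("http://domain.com", "http://domain.com/subarea1/page1.html"))
# Deleting /subarea1/ covers pages in that section only.
print(is_covered("http://domain.com/subarea1/", "http://domain.com/subarea1/page1.html"))
print(is_covered("http://domain.com/subarea1/", "http://domain.com/subarea2/page45.html"))
```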

Keep in mind that while removal is fast, you could still be looking at two to three days in some cases. It takes up to 24 hours for authentication to be verified, though Yahoo says this may happen much sooner (for me, it took several hours yesterday). After that, you’re looking at 24 to 48 hours for most pages to go.
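
The arithmetic behind that "two to three days" figure is just the two waits added together. Here is a quick sketch using the numbers above; the constant and function names are my own.

```python
from datetime import timedelta

# Figures quoted above: up to 24 hours for authentication,
# then 24 to 48 hours for most pages to be removed.
AUTH_MAX = timedelta(hours=24)
REMOVAL_MIN = timedelta(hours=24)
REMOVAL_MAX = timedelta(hours=48)


def removal_window(auth_delay: timedelta = AUTH_MAX):
    """Return (earliest, latest) total wait for a given authentication delay."""
    return auth_delay + REMOVAL_MIN, auth_delay + REMOVAL_MAX


# Worst case for authentication: a full 24 hours.
earliest, latest = removal_window()
print(earliest.days, latest.days)  # 2 3

# If authentication takes only a few hours, say 3, the window shrinks.
earliest, latest = removal_window(timedelta(hours=3))
print(earliest.total_seconds() / 3600, latest.total_seconds() / 3600)  # 27.0 51.0
```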

If it’s a real emergency with a legal component, such as copyrighted material that should be pulled under a DMCA action, Yahoo has instructions on that here.

Finally, more than one person can authenticate to manage your site. Want to keep tabs on them? Anyone authenticated for a site will see all Delete URL actions done by anyone else authenticated for that particular site.

What if you have an employee who establishes authentication, then goes bad after being fired? As long as you remove their unique authentication code from your web server, they can’t hurt you. Any deletion action will check to see that authentication for the person requesting it is in place. Authentication is also checked on a routine basis.

For more on removing material from Yahoo, some key help files to check out:

About The Author: Danny Sullivan is a Founding Editor of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also serves as Chief Content Officer for Third Door Media, which publishes Search Engine Land and produces the SMX: Search Marketing Expo conference series. He has a personal blog called Daggle (and keeps his disclosures page there). He can be found on Facebook and Google+, and microblogs on Twitter as @dannysullivan.
