Feb 1, 2007 at 8:37am ET by Danny Sullivan
Since Yahoo rolled out a new Delete URL feature this week, a number of questions have come up about how exactly it works. I had time yesterday with some of the Yahoo Site Explorer team to gather answers. Thanks to Priyank Garg and Amit Kumar, who, along with Tim Mayer, went through the inner workings.
It’s probably most important to understand the difference between how pages have traditionally been kept out of Yahoo versus what Delete URL does. Traditionally, Yahoo is told not to spider pages at all using either a robots.txt file or a meta robots tag that uses the "noindex" setting. Here’s some more about how those options work, versus the new Delete URL feature:
The chart below provides some further at-a-glance guidance on what to use and how each blocking feature operates:
| System | Robots.txt | Meta Robots | Delete URL |
| Stops Crawling | Yes | No | No |
| Stops Index Inclusion | Yes | Yes | Yes |
| Stops Link Only Listing | No | No | Yes |
| Why Use? | Easy to block many pages at once | Can’t access root domain | Don’t even want URL to appear or need page out fast |
To expand a bit on the chart, some people don’t want the major search engines to spider certain pages in order to reduce bandwidth load. That means blocking crawling. Only robots.txt will do this for you. It also will keep the pages out of the index.
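To make the robots.txt behavior concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content and the `/private/` path are made up for illustration, and `Slurp` is the user-agent name of Yahoo's crawler. This only shows how a well-behaved crawler would read the rules, not how Yahoo implements it internally.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block Yahoo's crawler (Slurp) from /private/,
# and leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/

User-agent: *
Disallow:
"""


def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URLs on a placeholder domain.
slurp_private = is_crawl_allowed(ROBOTS_TXT, "Slurp", "http://domain.com/private/report.html")
slurp_public = is_crawl_allowed(ROBOTS_TXT, "Slurp", "http://domain.com/index.html")
other_private = is_crawl_allowed(ROBOTS_TXT, "OtherBot", "http://domain.com/private/report.html")

print(slurp_private, slurp_public, other_private)
```

A single `Disallow` line covers everything under that path, which is why robots.txt is the easy way to block many pages at once.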
Unfortunately, robots.txt will only work at the root level of a domain. That is, it has to be at domain.com/robots.txt rather than domain.com/subarea/robots.txt. Some people have their web sites deep within other domains and can't place a file at the root, so the meta robots tag (using noindex; in all future references, I mean meta robots using the noindex setting) is a way to keep pages out. The pages will continue to be crawled, but they won't show up.
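As a rough illustration of the meta robots option, the sketch below checks whether a page carries a meta robots tag with the noindex setting, using Python's standard `html.parser`. The class and function names and the sample page are hypothetical; this is one way a crawler might detect the tag, not a description of Yahoo's parser.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Records whether a <meta name="robots"> tag includes noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when an attribute has no value.
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attr.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html):
    """Return True if the HTML contains a meta robots tag set to noindex."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.noindex


# Hypothetical page that should be crawled but kept out of the index.
PAGE = '<html><head><meta name="robots" content="noindex"></head><body>Report</body></html>'
print(has_noindex(PAGE))
```

Note that the crawler has to fetch the page to see the tag, which is why meta robots stops index inclusion but not crawling.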
With both robots.txt and meta robots, it's still possible that a URL will appear in the listings. This is because Yahoo may still list a URL if it knows of other people linking to it. For example, perhaps you have some confidential report you put online. You might prevent Yahoo from crawling or indexing the content of the report. However, if other people are linking to it, then the report might still come up. Yahoo won't know about anything inside of it, but sometimes just links alone can make a page relevant for terms.
Yahoo calls these "thin" listings (Google calls them "partially indexed"). If you use Delete URL, you can remove all traces of the URL from Yahoo search results. Even thin URLs will be gone.
Delete URL is also potentially faster than using robots.txt or meta robots. Both of those depend on Yahoo revisiting the site, seeing the restriction and acting on it. It might take Yahoo several days or longer to get back to some sites. Delete URL tells Yahoo to speed up the process. It acts as a virtual meta robots tag, and Yahoo says pages should be removed in 24 to 48 hours.
The virtual meta robots tag concept is important. No, you do not have to have an actual meta robots tag set to noindex on the pages you want to remove. Nor do you need to have a robots.txt file blocking pages. Delete URL will work instead of either of these to keep pages out. It will also work in addition to them.
For extra security, it might be nice if Delete URL only worked if people ALSO had one of the traditional methods in place. But I understand Yahoo’s view that they want a third alternative to work for those who can’t use the other two systems.
After the feature came out, Andy Beal over at Marketing Pilgrim had the fear-inspiring headline of Yahoo Delete URL Feature Disaster Waiting to Happen. He wrote:
It is literally a disaster waiting to happen. There is zero verification other than being logged into the proper Yahoo account to delete an entire site from the Yahoo index.
With Google you are required to upload a robots.txt file to the webserver that verifies the same information being requested through the Google delete URL/Site tool. With Yahoo, you just log in, click delete, click confirm, and it’s gone.
Until they fix this issue I recommend to everyone that you don’t authenticate any domain to Yahoo Site Explorer and if you have previously authenticated a site that you remove the authentication file or meta tag.
Well gosh, then you might as well not have a robots.txt file on your domain. I mean, it’s a disaster waiting to happen. All you need is for someone to figure out your username and password to your site, install that puppy and out goes your site.
I like Andy, so I’m poking at him in good fun. But I do think we need some perspective. Let’s say Andy does authenticate his site with Yahoo. Now I’ve got to figure out what his Yahoo username and password is for that particular site. Is he andy_beal? andybeal45? marketingpilgrim? andyexpat342? Just knowing what username he might use with that site is the first challenge. Then I’ve got to guess the password.
If I do guess the password and get in, bam! Site wiped out! Not really. First, the URLs will go into a processing queue, and that’s going to take up to 24 hours to happen. Look, here I deleted a page from my site yesterday, about 12 hours ago:
As you can see, the status is "Pending Delete" — the URL has yet to get removed. I still have time to prevent it from happening.
Let’s say pages do get wiped out. They’re actually still in the index. Delete URL simply suppresses them from appearing. This means Yahoo can quickly get them back in 1 to 2 days, if need be (though for some rare "low priority" URLs, Yahoo says this might take up to a month).
Of course, I can understand the concern here. There are two other things that might help. First, perhaps site owners who are really worried could set up a special authentication password or PIN to use to authorize a delete. So if someone did get both your username and password, perhaps the delete can't happen unless they also know your PIN. Second, perhaps an RSS feed or email notice could go out to keep the account holder alerted to any major pending action. For its part, Yahoo says they are considering additional safeguards.
Another issue that’s come up is that you can only do up to five active deletes per site at a time. In other words, you can do five delete actions. When those are processed, you can then do more. This is Yahoo being conservative, so the limit might get raised in the future. But five deletes is not the same as five pages. You can delete many more pages than that.
If you delete a root URL like this:
http://domain.com
Then all pages below that domain will get removed, such as:
http://domain.com/subarea1/page1.html
http://domain.com/subarea2/page45.html
One delete — but many, many pages gone. You can also delete all pages in a particular directory or subarea of your site. So find a page like this:
http://domain.com/subarea1/
And all pages in the /subarea1/ section will go.
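The "one delete, many pages" behavior amounts to prefix matching on the URL path. Here is a minimal sketch of that idea. The function name is hypothetical, and the matching rule (same host, same path or anything beneath it) is my assumption about how the coverage works, not Yahoo's published logic.

```python
from urllib.parse import urlsplit


def is_covered_by_delete(deleted_url, page_url):
    """Return True if page_url is the deleted URL or sits below it."""
    deleted = urlsplit(deleted_url)
    page = urlsplit(page_url)

    # A delete only applies to pages on the same host.
    if deleted.netloc.lower() != page.netloc.lower():
        return False

    # Treat a bare domain (empty path) as the root "/".
    deleted_path = deleted.path or "/"
    page_path = page.path or "/"
    if page_path == deleted_path:
        return True

    # Compare against a directory-style prefix so /subarea1/ does not
    # accidentally match /subarea10/.
    prefix = deleted_path if deleted_path.endswith("/") else deleted_path + "/"
    return page_path.startswith(prefix)


# Deleting the root covers everything on the domain.
print(is_covered_by_delete("http://domain.com", "http://domain.com/subarea2/page45.html"))
# Deleting a directory covers only pages inside it.
print(is_covered_by_delete("http://domain.com/subarea1/", "http://domain.com/subarea1/page1.html"))
print(is_covered_by_delete("http://domain.com/subarea1/", "http://domain.com/subarea2/page45.html"))
```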
Keep in mind that while removal is fast, you could still be looking at two to three days in some cases. It takes up to 24 hours for authentication to be verified, though Yahoo says this may happen much sooner (for me, it took several hours yesterday). After that, you’re looking at 24 to 48 hours for most pages to go.
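The "two to three days" figure is just the two waits added together. A quick sketch of the arithmetic, using the time ranges Yahoo gave (the constant and function names are my own):

```python
from datetime import timedelta

# Figures from Yahoo as reported above.
AUTH_MAX = timedelta(hours=24)     # authentication can take up to 24 hours
REMOVAL_MIN = timedelta(hours=24)  # most pages go within 24 to 48 hours
REMOVAL_MAX = timedelta(hours=48)


def removal_window(auth_wait):
    """Return (earliest, latest) total wait for a given authentication wait."""
    return auth_wait + REMOVAL_MIN, auth_wait + REMOVAL_MAX


# Worst case for authentication: the full 24 hours.
best, worst = removal_window(AUTH_MAX)
print(best, worst)
```

If authentication clears in a few hours, as it did for me, the window shrinks accordingly.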
If it’s a real emergency with a legal component, such as copyrighted material that should be pulled under a DMCA action, Yahoo has instructions on that here.
Finally, more than one person can authenticate to manage your site. Want to keep tabs on them? Anyone authenticated for a site will see all Delete URL actions done by anyone else authenticated for that particular site.
What if you have an employee who establishes authentication and then goes bad after being fired? As long as you remove their unique authentication code from your web server, they can't hurt you. Any deletion action will check to see that authentication for the person requesting it is in place. Authentication is also checked on a routine basis.
For more on removing material from Yahoo, some key help files to check out: