Feb. 27, 2007 at 3:47pm Eastern by Danny Sullivan

Squeezing The Search Loaf: Finding Search Engine Freshness & Crawl Dates

A reader emailed me today after noticing that Google was showing a date next to his listing, which made me think this was a good time to revisit how, when and where search engines show crawl dates for pages. These dates are a useful way for site owners to understand how often they are being revisited or for anyone to "squeeze the loaf" of a search engine to see how fresh it is. Here's a search engine-by-search engine rundown on date display. I'll also cover how, sadly, crawl dates embedded next to listings have disappeared over the years. But that's not all! Read now and you'll even get a free at-a-glance table explaining how dates are displayed. Read now -- web server operators are standing by!

Google

When you do a search, some pages may show a date below the description of a listing, as illustrated below:

[Screenshot: Crawl Dates At Google]

I thought Google had long done this for certain pages that it revisits on a super-frequent basis. And when I did a search for cars today, I saw a date like this coming up for the cars.com listing as shown above. An hour later, the date was gone. I then tried that search again using a particular Google data center, rather than whatever data center was assigned to my browser randomly. Doing the same search at that data center gave me dates again.

I'm checking with Google on how long dates have been showing and why they may come and go as I saw today. I'll postscript what I'm told at the end of this story.

The example above shows that only some pages have dates. In contrast, the Google Cache can give you dates for nearly any web page.

The Google Cache allows you to view a copy of a page that is stored on Google's servers, rather than from the website directly. (Don't like Google caching this for your site? Learn how to prevent it here and here. Don't see a cached link option? Then the site owner is blocking caching.)

Going back to our search for cars and the screenshot above, you'll see that the disney.go.com listing doesn't have a date next to it. To find the date the page was visited, you have to click on the link that says "Cached" under the description of that listing. That makes the cached page load like this. At the top of that page, you'll see this:

This is Google's cache of http://disney.go.com/disneypictures/cars/ as retrieved on 22 Feb 2007 14:34:08 GMT.

See the date and time, which I've put in bold? That's when the page was last visited by Google.
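If you check cached pages often, pulling that date out by hand gets tedious. Here is a minimal Python sketch that extracts it from the banner text. The pattern is an assumption based only on the single banner quoted above, not on any documented Google format, and the function name is made up for illustration. It also assumes an English locale for the month abbreviation.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Banner text copied from the example above.
BANNER = (
    "This is Google's cache of http://disney.go.com/disneypictures/cars/ "
    "as retrieved on 22 Feb 2007 14:34:08 GMT."
)

# Assumed shape of the date: "22 Feb 2007 14:34:08 GMT".
_RETRIEVED = re.compile(
    r"as retrieved on (\d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2}) GMT"
)


def google_cache_date(banner: str) -> Optional[datetime]:
    """Return the crawl time stated in a cache banner, or None if absent."""
    match = _RETRIEVED.search(banner)
    if match is None:
        return None
    parsed = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
    # The banner states GMT, so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)
```

Calling google_cache_date(BANNER) gives a timezone-aware datetime for the banner above. If Google changes the wording, the function returns None rather than guessing.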

FYI, before September 2006, that date reflected the last time Google found the page to have changed, not when it was last visited. In other words, if Google visited the page in January 2005, then revisited it throughout the year but the page never changed, the cached date would keep saying January 2005.

Since September 2006, that's been different. The date was altered to reflect the last time Google visited the page -- a good change to make. Google explains more about this on the Google Webmaster Central blog here, and Google's Matt Cutts also did a video about it here.
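To make the difference concrete, here is a small Python sketch of the two behaviors as I've described them. It is a toy model, not Google's implementation. The Visit class, the function names and the specific days in the sample are all hypothetical, chosen to mirror the January 2005 example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Visit:
    day: date      # when the spider fetched the page
    content: str   # what the page contained on that fetch


def last_changed_date(visits: List[Visit]) -> Optional[date]:
    """Old behavior: the date of the latest visit that found new content.

    Assumes visits are listed in chronological order.
    """
    shown = None
    previous = None
    for visit in visits:
        if visit.content != previous:
            shown = visit.day
            previous = visit.content
    return shown


def last_visited_date(visits: List[Visit]) -> Optional[date]:
    """New behavior: the date of the latest visit, changed or not."""
    return visits[-1].day if visits else None


# Hypothetical crawl history: three visits in 2005, page never changes.
SAMPLE_VISITS = [
    Visit(date(2005, 1, 10), "same page"),
    Visit(date(2005, 6, 15), "same page"),
    Visit(date(2005, 12, 20), "same page"),
]
```

With SAMPLE_VISITS, last_changed_date stays stuck on the January visit while last_visited_date moves to the December one, which is why the newer date is the better freshness signal.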

The options above allow anyone to see the freshness of any pages within Google, one page at a time (as long as they are cached). What if you want to get industrial strength and view the freshness of all your own pages at once? Unfortunately, the Google Webmaster Central tools don't let you see the last time all your pages were spidered. But that's something they're considering for the future. The tools will, however, show you any problems Google had in reaching any of your pages and the last time a crawl error happened for those pages. Using the "Crawl rate" option found under the Diagnostics tab, you can also see a general graph of crawling activity to your site.

There is one other type of date that you might see associated with listings that has nothing to do with when the page was crawled. Look here:

[Screenshot: Google Personalized Search Last Visit Date]

See the "3 visits - Feb 14" part? That's coming from Google Personalized Search and shows that I've clicked on that listing 3 times, with the last visit being on Feb. 14. My Google Ramps Up Personalized Search article from earlier this month explains more about how Google Personalized Search works and how to disable it if you don't like having it switched on, which now happens much more often.

Microsoft Live Search

Microsoft Live Search operates like Google. Some pages show dates next to them, as I've highlighted below:

[Screenshot: Crawl Date At Microsoft Windows Live]

As with Google, this seems to happen with pages that are being spidered frequently, but I'll check on this. Does a page lack a date? Then click on the "cached page" link. When the cached page loads, you'll see something like this at the top of it:

This is a version of http://www.pixar.com/theater/trailers/cars/index.html as it looked when our crawler examined the site on 2/16/2007. The page you see below is the version in our index that was used to rank this page in the results to your recent query.

The date (which I've put in bold above) tells you when the page was last spidered.

Don't see a cached page option? The site owner is probably blocking caching. Are you a site owner who wants to block caching? Visit the help area at Live and search for "cache" to find more info. I'd point you to the right place, but it remains impossible to link to particular pages in Microsoft's absurd help system.

[Postscript: Microsoft sent this information: "We only show the last-crawl date when it is within a few days. This is a decision to draw attention to the freshest content without highlighting older content. Crawl dates for other documents can be found by looking at the cached page."]
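The rule Microsoft describes is easy to express in code. The sketch below is only an illustration of that rule in Python. Microsoft said "a few days" without giving a number, so the three-day window is my assumption, and the function name and the M/D/YYYY label format (borrowed from the cached-page banner above) are hypothetical too.

```python
from datetime import date
from typing import Optional

# Hypothetical value: Microsoft only said "within a few days".
ASSUMED_WINDOW_DAYS = 3


def listing_date_label(
    last_crawl: date,
    today: date,
    window_days: int = ASSUMED_WINDOW_DAYS,
) -> Optional[str]:
    """Return a date label for a listing, or None if the crawl is too old."""
    age_in_days = (today - last_crawl).days
    if 0 <= age_in_days <= window_days:
        # Same M/D/YYYY style as the cached-page banner.
        return f"{last_crawl.month}/{last_crawl.day}/{last_crawl.year}"
    return None
```

Under that assumed window, a page crawled yesterday gets a label, while the 2/16/2007 crawl from the cached-page example would get none when viewed on Feb. 27.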

Ask.com

At Ask.com, you can only get dates by looking at the cached pages, similar to how that works at Google and Microsoft. Click on the "Cached" link that you'll see next to the URL of a listing, as highlighted below:

[Screenshot: Crawl Date At Ask.com]

At the top of the page, you'll see something like this with the date and time (shown in bold below) that the page was last visited:

Below is a cache or saved snapshot of http://www.cars.com/ as we found it on February 19, 2007 1:24:56 AM.
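The Ask banner, like the Microsoft one quoted earlier, can be read by a script as well. This Python sketch handles both. The patterns are assumptions drawn from the two sample banners in this article only, the function names are hypothetical, and I'm assuming Microsoft's date is in U.S. month/day/year order. Neither banner states a time zone, so the results are left without one.

```python
import re
from datetime import datetime
from typing import Optional

# Assumed shape, from the Microsoft sample: "examined the site on 2/16/2007"
_LIVE = re.compile(r"examined the site on (\d{1,2}/\d{1,2}/\d{4})")

# Assumed shape, from the Ask sample:
# "as we found it on February 19, 2007 1:24:56 AM"
_ASK = re.compile(
    r"as we found it on "
    r"([A-Z][a-z]+ \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)"
)


def live_cache_date(banner: str) -> Optional[datetime]:
    """Crawl date from a Live Search cache banner (date only, no time)."""
    match = _LIVE.search(banner)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%m/%d/%Y")


def ask_cache_date(banner: str) -> Optional[datetime]:
    """Crawl date and time from an Ask.com cache banner."""
    match = _ASK.search(banner)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%B %d, %Y %I:%M:%S %p")
```

Both functions return None when the wording doesn't match, which is the likely failure mode if either engine rewords its banner.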

Yahoo

At Yahoo, you can only get dates one way: through Yahoo Site Explorer. You'll have to create an account for your web site, then authenticate your account, then you'll be shown last crawl dates as I've highlighted in the first listing below:

[Screenshot: Crawl Date At Yahoo Site Explorer]

More than any other search engine, Yahoo makes it easy for a site owner to see the freshness of many pages all at once. However, the huge disadvantage from a searcher perspective is that you can't spot check the freshness of any page you randomly select.

The Date & Freshness Table

I love nothing more than doing tables, so let's put everything above into a nice one:

Feature                   | Ask      | Google                 | Microsoft | Yahoo
Dates Next To Listings?   | No       | Some                   | Some      | No
Dates On Cached Pages?    | Yes      | Yes                    | Yes       | No
Dates In Webmaster Tools? | No Tools | For Errors & Home Page | No Tools  | Yes

Ideally, I'd like to see that top row -- "Dates Next To Listings?" -- be completely "Yes." Some site owners block caching, which makes it hard to measure freshness. Putting the dates right next to the listings makes it easy for anyone who cares to see at a glance if a search engine is stale or fresh.

In fact, I have to laugh. I've been asking for this for years. On the old features chart I used to maintain about dates, I wrote in 2001:

Along with the page description, some search engines show the date when a web page was created or modified. As noted above, these dates may not always be reliable. However, they do provide a useful clue as to how fresh or stale a search engine's listings are. Thus, search engines that show a date deserve praise for doing so.

That was from 2001! Nearly six years later, it's still the case that dates aren't being shown. In fact, it's a reversal. Back in 2001, the major search engines of AltaVista, HotBot (Inktomi) and Northern Light all showed dates for all listings right within search results. Fast forward to today, and none of the major search engines do.

The reason is simple enough. Over time, the search engines either couldn't maintain freshness or didn't want to show they were sometimes stale. So dates either went away or never got added. C'mon gang -- time to bring them back right into the search results. If they aren't there by default, make it an option people can enable.

Verifying Freshness

In the meantime, there's a favorite tactic for those search watchers who want to track freshness. Google's Matt Cutts wrote about this back in 2005, describing exactly the technique I and others have long used. You simply find a page that you know carries a date that's constantly updated. Look at the cached page and see what the time and date on it say.

But Yahoo doesn't show a date on cached pages! No, it doesn't, but you're not looking for the date that the search engine inserts. You want the date on the page itself. For example, here's the cached page over at Yahoo for CNN:

[Screenshot: Finding Dates On Cached Pages]

See the part I highlighted in red, that says:

UPDATED: 3:53 a.m. EST, February 26, 2007

That's the date that CNN had on its own page when the Yahoo spider last visited. When I looked, the date and time was 3:10 pm EST on February 27 -- so the page is only 12 hours old. Not bad in this case, but I wouldn't expect a major news site to be much out of date.
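The same check can be scripted. The Python sketch below reads an "UPDATED:" stamp in the style CNN used above and subtracts it from the time you looked. The pattern is an assumption based on that one stamp, and the function names are hypothetical. Both times are treated as EST, so no time zone conversion is done.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed shape, from the CNN sample: "UPDATED: 3:53 a.m. EST, February 26, 2007"
_UPDATED = re.compile(
    r"UPDATED: (\d{1,2}):(\d{2}) ([ap])\.m\. EST, ([A-Z][a-z]+ \d{1,2}, \d{4})"
)


def page_stamp(cached_text: str) -> Optional[datetime]:
    """Return the page's own 'UPDATED' time (EST, naive), or None."""
    match = _UPDATED.search(cached_text)
    if match is None:
        return None
    hour, minute, meridiem, day = match.groups()
    # Convert 12-hour clock to 24-hour: 12 a.m. -> 0, 12 p.m. -> 12.
    hour_24 = int(hour) % 12 + (12 if meridiem == "p" else 0)
    stamp_day = datetime.strptime(day, "%B %d, %Y")
    return stamp_day.replace(hour=hour_24, minute=int(minute))


def cache_lag(cached_text: str, observed: datetime) -> Optional[timedelta]:
    """How far the cached copy trails the moment you checked (same zone)."""
    stamp = page_stamp(cached_text)
    if stamp is None:
        return None
    return observed - stamp
```

Pass in the cached page's text and the time you looked, for example cache_lag(text, datetime(2007, 2, 27, 15, 10)), and the function reports whatever gap the two timestamps imply.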

Return Of The Freshness Guarantee?

Finally, I'll leave you with this trip down memory lane. Back in June 1999, AltaVista once offered a freshness guarantee that was quickly broken. As I wrote at the time:

"AltaVista search is able to make its Freshness Guarantee: no search site will have fresher results than AltaVista."

AltaVista unveiled its first "Freshness Guarantee" back when it relaunched in June, promising that its entire index would be refreshed at least once per month. That guarantee was almost immediately broken, as even AltaVista President Rod Schrock admitted when we talked recently. "We turned our attention to this new system," Schrock said.

OK, fair enough -- they wanted to build something even better. But this new guarantee has already been broken, as described above. If claims like these are going to be made, then they should actually be met. And not to meet them in the midst of a huge media blitz is an incredible blunder.

Freshness is one important component of what makes a good search engine. It's not the only thing. Having fresh results means nothing if the results aren't relevant. And some pages don't need to be spidered that often. But putting dates next to listings is an easy form of search "food" labeling that can give reassurance about a major search engine. Surely it's time for dates to make a comeback.

Reader Comments

I'm less concerned about showing the crawl dates on the search results. I would love if Yahoo would start showing the crawl date on their cached pages, because that would allow apples-to-apples comparisons.

Hi, thanks for the interesting article.

I recently launched a web site on the American folk song Follow the Drinking Gourd and have been tracking home page and site-wide caching information ever since. Bottom line: Since March 1st, Google and Yahoo have lagged on average about half a week behind on the home page, while MSN is running a week or so behind.

The really interesting differences in caching show up in how frequently the search engines are refreshing the other pages in this site. Aside from the home page, the average followthedrinkinggourd.org page had a whopping cache lag of 39 days in Google, vs. 13 days in Yahoo and just 10 in MSN.

Follow the Drinking Gourd: Site Caching statistics

I hope the information proves useful.

All the best,

Joel B

I always factor in how often a website is updated. For me, this far outweighs PageRank.
