Squeezing The Search Loaf: Finding Search Engine Freshness & Crawl Dates


A reader emailed me today noticing that Google was showing a date next to his listing, which made me think this was a good time to revisit how, when and where search engines show crawl dates for pages. These dates are a useful way for site owners to understand how often they are being revisited or for anyone to "squeeze the loaf" of a search engine to see how fresh it is. Here’s a search engine-by-search engine rundown on date display. I’ll also cover how we’ve sadly lost crawl dates being embedded next to listings, over the years. But that’s not all! Read now and you’ll even get a free at-a-glance table explaining how dates are displayed. Read now — web server operators are standing by!

Google

When you do a search, some pages may show a date below the description of a listing, as illustrated below:

Crawl Dates At Google

I thought Google had long done this for certain pages that it revisits on a super-frequent basis. And when I did a search for cars today, I saw a date like this coming up for the cars.com listing as shown above. An hour later, the date was gone. I then tried that search again using a particular Google data center, rather than whatever data center was assigned to my browser randomly. Doing the same search at that data center gave me dates again.

I’m checking with Google on how long dates have been showing and why they may come and go as I saw today. I’ll postscript what I’m told at the end of this story.

The example above shows that only some pages have dates. In contrast, the Google Cache can give you dates for nearly any web page.

The Google Cache allows you to view a copy of a page that is stored on Google’s servers, rather than from the website directly. (Don’t like Google caching this for your site? Learn how to prevent it here and here. Don’t see a cached link option? Then the site owner is blocking caching.)

Going back to our search for cars and the screenshot above, you’ll see that the disney.go.com listing doesn’t have a date next to it. To find the date the page was visited, you have to click on the link that says "Cached" under the description of that listing. That makes the cached page load like this. At the top of that page, you’ll see this:

This is Google’s cache of http://disney.go.com/disneypictures/cars/ as retrieved on 22 Feb 2007 14:34:08 GMT.

See the date and time, which I’ve put in bold? That’s when the page was last visited by Google.
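If you want to pull that date out of the banner automatically, here is a minimal Python sketch. It assumes the banner wording matches the 2007 example quoted above; the function name `google_cache_date` and the regular expression are my own illustration, not anything Google publishes.

```python
import re
from datetime import datetime, timezone

# Pattern based on the banner quoted above. The exact wording is an
# assumption taken from that single example and may differ elsewhere.
BANNER_RE = re.compile(
    r"as retrieved on (\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) GMT"
)

def google_cache_date(banner: str) -> datetime:
    """Return the crawl time in a Google cache banner as a UTC datetime."""
    match = BANNER_RE.search(banner)
    if match is None:
        raise ValueError("no cache date found in banner")
    parsed = datetime.strptime(match.group(1), "%d %b %Y %H:%M:%S")
    # The banner states GMT, so attach UTC explicitly.
    return parsed.replace(tzinfo=timezone.utc)

banner = (
    "This is Google's cache of http://disney.go.com/disneypictures/cars/ "
    "as retrieved on 22 Feb 2007 14:34:08 GMT."
)
print(google_cache_date(banner).isoformat())
```

Month abbreviations such as "Feb" are parsed according to the current locale, so this sketch assumes an English locale.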

FYI, before September 2006, that date reflected the last time Google found the page to have changed, not when it was last visited. In other words, if Google visited the page in January 2005, then revisited it throughout the year but the page never changed, the cached date would keep saying January 2005.

Since September 2006, that’s been different. The date was altered to reflect the last time Google visited the page — a good change to make. Google explains more about this on the Google Webmaster Central blog here, and Google’s Matt Cutts also did a video about it here.
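To make the difference between the two meanings concrete, here is a small Python sketch. The crawl log is hypothetical, and the two functions only model the behavior described above; they are not how Google actually computes the date.

```python
from datetime import date

# Hypothetical crawl log: (visit date, fingerprint of the content seen).
# The page is visited three times in 2005 and never changes.
visits = [
    (date(2005, 1, 10), "v1"),
    (date(2005, 6, 2), "v1"),
    (date(2005, 12, 20), "v1"),
]

def last_visited(log):
    """Model of the date since September 2006: the most recent visit."""
    return max(day for day, _ in log)

def last_changed(log):
    """Model of the older date: the first visit that saw the current content."""
    ordered = sorted(log)
    current = ordered[-1][1]
    changed_on = ordered[-1][0]
    # Walk backward while the content matches what the latest visit saw.
    for day, fingerprint in reversed(ordered):
        if fingerprint != current:
            break
        changed_on = day
    return changed_on

print(last_changed(visits))  # 2005-01-10
print(last_visited(visits))  # 2005-12-20
```

With the old meaning, the unchanged page keeps reporting January; with the new one, it reports the December visit.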

The options above allow anyone to see the freshness of any pages within Google, one page at a time (as long as they are cached). What if you want to get industrial strength and view the freshness of all your own pages at once? Unfortunately, the Google Webmaster Central tools don’t let you see the last time all your pages were spidered. But that’s something they’re considering for the future. The tools will, however, show you any problems Google had in reaching any of your pages and the last time a crawl error happened for those pages. Using the "Crawl rate" option found under the Diagnostics tab, you can also see a general graph of crawling activity to your site.

There is one other type of date that you might see associated with listings that has nothing to do with when the page was crawled. Look here:

Google Personalized Search Last Visit Date

See the "3 visits - Feb 14" part? That’s coming from Google Personalized Search and shows that I’ve clicked on that listing 3 times, with the last visit being on Feb. 14. My Google Ramps Up Personalized Search article from earlier this month explains more about how Google Personalized Search works and how to disable it if you don’t like having it on, as now happens much more often.

Microsoft Live Search

Microsoft Live Search operates like Google. Some pages show dates next to them, as I’ve highlighted below:

Crawl Date At Microsoft Windows Live

As with Google, this seems to happen with pages that are being spidered frequently, but I’ll check on this. Does a page lack a date? Then click on the "cached page" link. When the cached page loads, you’ll see something like this at the top of it:

This is a version of http://www.pixar.com/theater/trailers/cars/index.html as it looked when our crawler examined the site on 2/16/2007. The page you see below is the version in our index that was used to rank this page in the results to your recent query.

The date (which I’ve put in bold above) tells you when the page was last spidered.

Don’t see a cached page option? The site owner is probably blocking caching. Are you a site owner that wants to block caching? Visit the help area at Live and search for "cache" to find more info. I’d point you to the right place, but it remains impossible to link to particular pages in Microsoft’s absurd help system.

[Postscript: Microsoft sent this information: "We only show the last-crawl date when it is within a few days. This is a decision to draw attention to the freshest content without highlighting older content. Crawl dates for other documents can be found by looking at the cached page."]
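Here is a rough Python sketch of the rule Microsoft describes, paired with a parser for the cached-page banner. Microsoft doesn't say how many days "a few" is, so the three-day window below is purely my assumption, and the function names are hypothetical.

```python
import re
from datetime import date, datetime

# "A few days" is not defined in Microsoft's note; 3 is an assumed value.
RECENT_DAYS = 3

def live_cache_date(banner: str) -> date:
    """Pull the m/d/yyyy crawl date out of a Live Search cache banner."""
    match = re.search(r"examined the site on (\d{1,2}/\d{1,2}/\d{4})", banner)
    if match is None:
        raise ValueError("no crawl date found in banner")
    return datetime.strptime(match.group(1), "%m/%d/%Y").date()

def shows_date_in_listing(crawled: date, today: date,
                          window: int = RECENT_DAYS) -> bool:
    """Model the stated rule: show the date only if the crawl was recent."""
    return 0 <= (today - crawled).days <= window

banner = (
    "This is a version of http://www.pixar.com/theater/trailers/cars/index.html "
    "as it looked when our crawler examined the site on 2/16/2007."
)
crawled = live_cache_date(banner)
print(crawled)                                            # 2007-02-16
print(shows_date_in_listing(crawled, date(2007, 2, 17)))  # True
print(shows_date_in_listing(crawled, date(2007, 2, 27)))  # False
```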

Ask.com

At Ask.com, you can only get dates by looking at the cached pages, similar to how that works at Google and Microsoft. Click on the "Cached" link that you’ll see next to the URL of a listing, as highlighted below:

Crawl Date At Ask.com

At the top of the page, you’ll see something like this with the date and time (shown in bold below) that the page was last visited:

Below is a cache or saved snapshot of  http://www.cars.com/  as we found it on February 19, 2007 1:24:56 AM.
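The Ask.com banner uses a long-form date with a 12-hour clock, so it needs a different format string than the other engines. The sketch below assumes the wording of the example above. The banner names no time zone, so the result is deliberately left as a naive datetime. The function name is hypothetical.

```python
import re
from datetime import datetime

def ask_cache_date(banner: str) -> datetime:
    """Return the snapshot time from an Ask.com cache banner (no time zone)."""
    match = re.search(
        r"as we found it on (\w+ \d{1,2}, \d{4} \d{1,2}:\d{2}:\d{2} [AP]M)",
        banner,
    )
    if match is None:
        raise ValueError("no snapshot date found in banner")
    # %B is the full month name, %I the 12-hour clock, %p the AM/PM marker.
    return datetime.strptime(match.group(1), "%B %d, %Y %I:%M:%S %p")

banner = (
    "Below is a cache or saved snapshot of  http://www.cars.com/  "
    "as we found it on February 19, 2007 1:24:56 AM."
)
print(ask_cache_date(banner))  # 2007-02-19 01:24:56
```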

Yahoo

At Yahoo, you can only get dates one way: through Yahoo Site Explorer. You’ll have to create an account for your web site, then authenticate your account. After that, you’ll be shown last crawl dates, as I’ve highlighted in the first listing below:

Crawl Date At Yahoo Site Explorer

More than any other search engine, Yahoo makes it easy for a site owner to see the freshness of many pages all at once. However, the huge disadvantage from a searcher perspective is that you can’t spot check the freshness of any page you randomly select.

The Date & Freshness Table

I love nothing more than doing tables, so let’s put everything above into a nice one:

Feature | Ask | Google | Microsoft | Yahoo
Dates Next To Listings? | No | Some | Some | No
Dates On Cached Pages? | Yes | Yes | Yes | No
Dates In Webmaster Tools? | No Tools | For Errors & Home Page | No Tools | Yes

Ideally, I’d like to see that top row — "Dates Next To Listings?" — be completely "Yes." Some site owners block caching, which makes it hard to measure freshness. Putting the dates right next to the listings makes it easy for anyone who cares to see at a glance if a search engine is stale or fresh.

In fact, I have to laugh. I’ve been asking for this for years. On the old features chart I used to maintain about dates, I wrote in 2001:

Along with the page description, some search engines show the date when a web page was created or modified. As noted above, these dates may not always be reliable. However, they do provide a useful clue as to how fresh or stale a search engine’s listings are. Thus, search engines that show a date deserve praise for doing so.

That was from 2001! Nearly six years later, it’s still the case that dates aren’t being shown. In fact, it’s a reversal. Back in 2001, the major search engines of AltaVista, HotBot (Inktomi) and Northern Light all showed dates for all listings right within search results. Fast forward to today, and none of the major search engines do.

The reason is simple enough. Over time, the search engines either couldn’t maintain freshness or didn’t want to show they were sometimes stale. So dates either went away or never got added. C’mon gang — time to bring them back right into the search results. If they aren’t there by default, make it an option people can enable.

Verifying Freshness

In the meantime, there’s a favorite tactic for those search watchers who want to track freshness. Google’s Matt Cutts wrote about this back in 2005, describing exactly the technique I and others have long used. You simply find a page that you know carries a date that’s constantly updated. Look at the cached page and see what time and date it shows.

But Yahoo doesn’t show a date on cached pages! No, it doesn’t, but you’re not looking for the date that the search engine inserts. You want the date on the page itself. For example, here’s the cached page over at Yahoo for CNN:

Finding Dates On Cached Pages

See the part I highlighted in red, that says:

UPDATED: 3:53 a.m. EST, February 26, 2007

That’s the date that CNN had on its own page when the Yahoo spider last visited. When I looked, the date and time was 3:10 pm EST on February 27 — so the page is only 12 hours old. Not bad in this case, but I wouldn’t expect a major news site to be much out of date.
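If you do this check often, the on-page stamp can be read and aged with a few lines of Python. This is only a sketch: the regular expression assumes CNN's "UPDATED:" wording as quoted above, both times are treated as EST with no time zone conversion, and the check time in the usage example is a hypothetical same-day value chosen purely for illustration.

```python
import re
from datetime import datetime, timedelta

# Matches stamps like "UPDATED: 3:53 a.m. EST, February 26, 2007".
# The wording is an assumption based on the single CNN example above.
STAMP_RE = re.compile(
    r"UPDATED:\s*(\d{1,2}:\d{2}) ([ap])\.m\. EST, (\w+ \d{1,2}, \d{4})"
)

def page_stamp(cached_text: str) -> datetime:
    """Return the time the page itself displayed when it was cached."""
    match = STAMP_RE.search(cached_text)
    if match is None:
        raise ValueError("no UPDATED stamp found")
    clock, meridiem, day = match.groups()
    return datetime.strptime(
        f"{day} {clock} {meridiem.upper()}M", "%B %d, %Y %I:%M %p"
    )

def cache_age(cached_text: str, checked_at: datetime) -> timedelta:
    """Age of the cached copy; checked_at must be in the same zone (EST)."""
    return checked_at - page_stamp(cached_text)

cached = "UPDATED: 3:53 a.m. EST, February 26, 2007"
checked = datetime(2007, 2, 26, 15, 10)  # hypothetical check time, EST
print(cache_age(cached, checked))  # 11:17:00
```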

Return Of The Freshness Guarantee?

Finally, I’ll leave you with this trip down memory lane. Back in June 1999, AltaVista once offered a freshness guarantee that was quickly broken. As I wrote at the time:

"AltaVista search is able to make its Freshness Guarantee: no search site will have fresher results than AltaVista."

AltaVista unveiled its first "Freshness Guarantee" back when it relaunched in June, promising that its entire index would be refreshed at least once per month. That guarantee was almost immediately broken, as even AltaVista President Rod Schrock admitted when we talked recently. "We turned our attention to this new system," Schrock said.

OK, fair enough — they wanted to build something even better. But this new guarantee has already been broken, as described above. If claims like these are going to be made, then they should actually be met. And not to meet them in the midst of a huge media blitz is an incredible blunder.

Freshness is one important component of what makes a good search engine. It’s not the only thing. Having fresh results means nothing if the results aren’t relevant. And some pages don’t need to be spidered that often. But putting dates next to listings is an easy form of search "food" labeling that can give reassurance about a major search engine. Surely it’s time for dates to make a comeback.



Danny Sullivan is editor-in-chief of Search Engine Land. He’s a widely cited authority on search engines and search marketing issues who has covered the space since 1996. Danny also oversees Search Engine Land’s SMX: Search Marketing Expo conference series, maintains a personal blog called Daggle and can be followed on Twitter here.


3 COMMENTS ON Squeezing The Search Loaf: Finding Search Engine Freshness & Crawl Dates

Matt Cutts,

I’m less concerned about showing the crawl dates on the search results. I would love if Yahoo would start showing the crawl date on their cached pages, because that would allow apples-to-apples comparisons.



Joel Bresler,

Hi, thanks for the interesting article.

I recently launched a web site on the American folk song Follow the Drinking Gourd and have been tracking home page and site-wide caching information ever since. Bottom line: Since March 1st, Google and Yahoo have lagged on average about half a week behind on the home page, while MSN is running a week or so behind.

The really interesting differences in caching show up in how frequently the search engines are refreshing the other pages in this site. Aside from the home page, the average followthedrinkinggourd.org page had a whopping cache lag of 39 days in Google, vs. 13 days in Yahoo and just 10 in MSN.

Follow the Drinking Gourd: Site Caching statistics

I hope the information proves useful.

All the best,

Joel B



Fitness superstore,

I always factor in how often a website is updated; for me, this far outweighs PageRank.



