Google News & Indexing Old Stories As New
Michael Gray noticed that Google News is continuously indexing old stories on some sites as new stories. Yes, this is the exact issue that contributed to the United Airlines stock price drop just a week ago. Google blamed the Tribune for the article not containing a date, which is a fair accusation. But as Michael shows, Google News is even indexing articles from 2007 as if they were posted just days ago. Now, Google News is not supposed to contain articles older than 30 days in its index. So what is going on here? Let me take you through a couple of examples.
As Michael illustrates, a search in Google News for Subprime Troubles Sack HRJ returns a result from the New York Times blog, dated as Sep 14, 2008. Here is a screen capture:
But if you click on the article and read it, you will clearly see it is dated AUGUST 24, 2007, 8:15 AM. So where is this new date coming from? I suspect from the top of the header, where the New York Times posts the current date. Google News likely reached this article through a “related link” from a new article posted on the New York Times, looked at the page, saw two dates, one of them the current date, and reindexed the article under that date. Note that the current date is posted directly under the NY Times logo.
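To make that suspicion concrete, here is a minimal Python sketch of how a date extractor could be fooled by a page carrying two dates. Everything in it is an assumption for illustration: the `page` text is a made-up stand-in for the article page, and `dates_in` is a hypothetical helper. It is not how Google News actually works.

```python
import re
from datetime import date

# Month names mapped to month numbers, so parsing does not depend on locale.
MONTHS = {name: number for number, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}

# Matches dates written like "August 24, 2007" or "AUGUST 24, 2007".
DATE_RE = re.compile(r"\b([A-Za-z]+) (\d{1,2}), (\d{4})\b")

def dates_in(text):
    """Return every 'Month DD, YYYY' date found in the text, in page order."""
    found = []
    for month, day, year in DATE_RE.findall(text):
        number = MONTHS.get(month.lower())
        if number is not None:
            found.append(date(int(year), number, int(day)))
    return found

# Hypothetical page text: the site header shows the current date,
# while the article itself carries its original date.
page = (
    "The New York Times\n"
    "September 14, 2008\n"
    "Subprime Troubles Sack HRJ\n"
    "AUGUST 24, 2007, 8:15 AM\n"
)

dates = dates_in(page)
newest = max(dates)  # what a naive "most recent date wins" rule would pick
oldest = min(dates)  # the article's own date in this example
print(newest, oldest)
```

In this made-up page, a rule that trusts the newest date picks the header date, while the oldest date is the real publication date. Taking the oldest date is only one possible heuristic and would fail on articles that mention earlier dates in their text.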
But what upsets me is that Google News knew about this article in August 2007. How do I know? A search in Google News Archive Search returns it as dated on Aug 24, 2007.
Logically, if Google News Archive Search already has the same URL as a story Google News has just discovered, Google should know the story is old. Maybe I am oversimplifying things here?
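The check I have in mind could look something like the Python sketch below. It assumes a simple lookup of URL to earliest known date (the `first_seen` dictionary, a stand-in for whatever the archive index really stores), and the function names and the example URL are hypothetical. It is a sketch of the idea, not a description of Google's systems.

```python
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)  # the 30-day Google News window mentioned earlier

def effective_date(url, detected, first_seen):
    """Return the earliest date on record for this URL, updating the record."""
    known = first_seen.get(url)
    if known is None or detected < known:
        first_seen[url] = detected
        return detected
    return known

def is_fresh(url, detected, first_seen, today):
    """True only if the earliest known date falls inside the 30-day window."""
    return today - effective_date(url, detected, first_seen) <= MAX_AGE

# Hypothetical archive record: the URL was first seen on Aug 24, 2007.
first_seen = {"https://example.com/subprime-troubles-sack-hrj": date(2007, 8, 24)}

# The crawler rediscovers the page and detects Sep 14, 2008 as its date.
fresh = is_fresh(
    "https://example.com/subprime-troubles-sack-hrj",
    date(2008, 9, 14),
    first_seen,
    today=date(2008, 9, 15),
)
print(fresh)
```

Because the archive record is older than the newly detected date, the story is treated as old and kept out of the fresh index. A URL with no prior record simply falls back to the detected date.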
All my searches for examples on sites I write for, such as inurl:2007 site:searchengineland.com or 2007 site:seroundtable.com, do not return any stories older than about 30 days. Note that Search Engine Land currently puts the date in its URLs, so the inurl command works for Search Engine Land. Search Engine Roundtable does not timestamp its URLs, which is why I searched for just 2007 there; that search returned only recent articles.
So how big an issue is this? I am not going to count, but you can go through all these articles to see a sampling. As we saw with the United Airlines stock price drop, this issue can be devastating for a company.