Of Climategate, Googlegate & When Stories Get Too Long
Daily Telegraph writer James Delingpole got worked up yesterday because his colleague Christopher Booker’s story on the “Climategate” scandal mysteriously disappeared from Google. Skullduggery, he pondered? Nothing so dramatic, says Google. The article simply grew too large to stay in Google News.
Let’s do the breakdown. Booker’s story of November 28 covered the controversy over how academics at the University of East Anglia were apparently trying to keep the views of other academics who were skeptical of global warming from getting widespread attention.
Booker’s story, Climate change: this is the worst scientific scandal of our generation, even had a Google connection right from the start, leading off:
A week after my colleague James Delingpole, on his Telegraph blog, coined the term “Climategate” to describe the scandal revealed by the leaked emails from the University of East Anglia’s Climatic Research Unit, Google was showing that the word now appears across the internet more than nine million times.
Or maybe 1 million times, if you search using +climategate to eliminate possible synonyms and alternative spellings that may or may not be related. Then again, over at Bing, it’s 54 million matches, dropping to 1 million also if you do +climategate. Search engine number counts are slippery little devils, and I don’t recommend citing them as proof of anything.
But still, 1 million, 10 million, let’s not quibble. This was a big story, whether or not you give it the “Climategate” name that Delingpole coined. So big that when Booker’s article disappeared from Google News, Delingpole suspected the worst, writing:
What is going on at Google? I only ask because last night when I typed “Global Warming” into Google News the top item was Christopher Booker’s superb analysis of the Climategate scandal.
It’s still the most-read article of the Telegraph’s entire online operation – 430 comments and counting – yet mysteriously when you try the same search now it doesn’t even feature. Instead, the top-featured item is a blogger pushing Al Gore’s AGW agenda.
Perhaps there’s nothing sinister in this. Perhaps some Google-savvy reader can enlighten me…..
UPDATE: Richard North has some interesting thoughts on this. He too suspects some sort of skullduggery.
I’m quoting his entire piece, because I’m going to dissect it bit by bit for how absurd it is, before I even get to the official Google explanation. And hat tip to David Dalka for alerting me to this story, by the way.
Wow, the story is no longer the top item? Well, stories at the Daily Telegraph’s site itself change throughout the day. Heck, the Daily Telegraph’s print edition changes each day. And so, too, does Google News change. At least hourly, in fact. Here are two articles we’ve published this past month that explain more about this:
- Google’s News Experiments & The Quest To Solve The “Read State” Issue
- Under The Hood: Google News & Ranking Stories
Shouldn’t Delingpole have known this before ringing the alarm bells? After all, he appears familiar with Google News, in that he noticed when Booker’s article appeared there. Surely he’s seen other articles move on and off? And working for the Daily Telegraph, surely he could actually ask someone on staff who’s familiar with its presence in Google for some further background (I’m virtually certain they have someone like this).
But no. Instead, it has to be painted as a plot. Look, Booker’s article is gone and in its place, conveniently, is a pro-Gore blogger pushing the view that global warming is real.
Let’s get this straight. There’s no lack of eyes on Google News. There’s also no shortage of stories it could edit out, if it wanted to. Perhaps some stories that are critical of Google, maybe? Maybe some stories to push some particular Google-liberal-whatever agenda in its home country of the United States, maybe?
No, instead what Google does is decide to wipe out one particular article on the global warming issue. That’s where it’s going to shoot its credibility wad. Oh, and make sure to do it to the Daily Telegraph, which has attacked Google for showing its stories. That’ll introduce some nice irony. When the Telegraph complains about the missing story, Google can just say “Oh, thought you didn’t want us destroying your business model.”
Yeah, that’s the ticket. Yeah, kill that one story, and Google will somehow manage to keep the truth from getting out there. Because it’s not like there aren’t those other 1 million to 54 million pages that mention “Climategate” on the web, depending on which count you want to believe.
Oh, but Delingpole says maybe it’s nothing. But then again, he notes someone else is exploring the issue and “too suspects some sort of skullduggery.” In other words, Delingpole gives the impression he believes there IS skullduggery, and the implication is that Google’s to blame.
That other person, Richard North, actually doesn’t blame Google for the mystery but instead wonders if someone has hacked the Daily Telegraph site and managed to get this one page blocked. He writes:
This cannot be accidental – there is a quite deliberate attempt to prevent this piece being listed. Repeating the exercise on Bing.com and Yahoo.co.uk news pages gets similar nil results. Yet other headlines from comment pieces from The Sunday Telegraph show up immediately.
James Dellingpole has picked up the problem (great minds) but my guess is that this isn’t a Google issue. The problem probably lies closer to home – there looks to be an enemy in the camp, who has probably been using this, or something like it.
The same piece not found in three different search engines? Yes, that’s odd. It’s very much a classic sign of an indexing problem. But if you had access to block one story, you’d probably try to block many of them. Plus, let’s be reasonable, blocking Booker’s story wouldn’t keep his particular views from getting out.
That’s particularly true given what I found when I looked today. I couldn’t find the story at the Daily Telegraph, but I did find copies of the identical story syndicated on other sites. I also found it right at the top of regular Google results when searching for its headline. So it wasn’t blocked from Google. It just wasn’t showing in Google News.
That’s odd, so I checked with Google. Remember Delingpole talking about all those comments the story got? That’s the culprit, Google says. There were so many that the page ballooned past 1MB, causing it to be dropped from Google News as too large (too large in file size, not too big of a story topic!). From the statement I was sent:
The article attracted so many comments that it exceeded a threshold for the page being too large (it’s more than 1.3 MB of HTML at this point). We’re looking at whether it makes sense to allow larger pages in the future. As with Google Search, our goal for Google News is to give users the most relevant, objective results, which is why we generate them automatically and without human intervention.
Google web search used to be this way many years ago. It used to index the first 101K of a page and ignore the rest. Of course, pages that were too big still got listed, unlike what apparently happens with Google News.
Now if you want some mysteries, here are two. First, the article is flagged by the Daily Telegraph with a meta robots tag that tells Google not to show a cached version. And yet, I can see a cached copy. What’s up with that?
Also, a known issue with Google News is that once it visits a news story, it doesn’t come back for updates. So how did it realize the story got too big for inclusion? (Postscript: Brent Payne, who oversees SEO for the Tribune papers, tweets that this is no longer an issue).
I’ve got follow-up questions out to Google on both of these issues. Perhaps they’ll have a convenient explanation to cover up any lingering doubts that might turn this into a conspiracy.
Seriously, there probably are good explanations. And if I sound harsh on Delingpole, it’s because I get stories like this all the time, where ordinary common sense should eliminate the conspiracy theories. Someone’s little-known page on a little-cared-about topic goes missing, and it turns into a “Google’s out to get me” situation. As if Google even knew who they were.
In this case, it’s a well-known story on a politically-charged topic. But it’s not the first well-known story on a politically-charged topic where Google might have felt tempted to assert editorial control. So why would it start here? And why with just one story? And for all the brains at Google, fully aware of how news flows on the web, would they really be stupid enough to think no one would notice?
No, these things don’t add up. But having to debunk what the Daily Telegraph could have investigated itself, rather than just blogged and alleged, leaves me kind of grumpy. Delingpole doesn’t mention in his piece trying to contact Google in any way. There’s no “waiting to hear back from Google” or anything like that.
Maybe a Matt cartoon on the entire Googlegate affair would cheer me up.
Postscript: A comment below, as well as this email I received, raises an issue with the related searches (called Google Suggest) that appear when you start to type in the search box:
On 11/25/09, I could type “cli” into Google and “climategate” would pop up as the top suggestion, with around 3 million results. The next day, I could type as much as “climategat” and no suggestions whatsoever. Someone at Google had deleted it as a search suggestion. Then Sunday afternoon, it was back again as a suggestion with around 13.3 million results. Today, it has disappeared again as a search suggestion and only 11.4 million results.
I checked with Google and got back this statement:
Google has not ever removed the query [climategate] or variations of the query from Google Suggest.
Google Suggest uses a variety of algorithms in order to come up with relevant suggestions while the user is typing. We do remove certain clearly pornographic or hateful or malicious slur terms from Suggest.
My assumption is that on one day, if a lot of people were searching for climategate, then that might appear. Then if queries dropped off, the suggestion might go away, and then return if more people started searching again. I’m checking to see if I can get more clarification.