
May. 16, 2007 at 4:41pm Eastern by Danny Sullivan

George Washington Did What According To Wikipedia???

Many have joked about how Wikipedia seems to rank at the top of practically any Google search that you do. Often, that's a good thing, as Wikipedia has lots of great information. But a search on george washington today shows a downside. Someone edited the start of the Wikipedia entry about the first US president to be less than flattering. Google spidered the entry, and that material was used to form the description of Washington's Wikipedia page, as shown below. Look at the second listing:

George Washington On Google

It's embarrassing for Google, but the fault really lies with Wikipedia, since this text stayed on Washington's page long enough for Google to catch it. Indeed, it looks to have been on the page for at least a day. It's gone now, but when I looked about a half hour ago, the text was still there. It will probably take about another day for the description to fall out of Google itself, once the page is recrawled.

FYI, the description does not show at Yahoo or Ask.com because Wikipedia is not in the top result for a search on George Washington there. At Live.com, the Wikipedia page shows but was last visited on May 13, before the insulting text was added.

See Related Stories In: SEO: Titles & Descriptions, Search Engines: Wikipedia



Reader Comments

"It's embarrassing for Google, but the fault really lies with Wikipedia, since this text stayed on Washington's page long enough for Google to catch it."

I'd say the fault is more with Google; Wikipedia is what it is, and people should know what they are getting when they use it, but it's Google's algorithm that gives it such authority, and Google's aggressive spiders that so quickly index content that could be unstable. So I'd say it's just embarrassing for Wikipedia, but the fault for giving it prominence in the search results is Google's alone.

Actually the text stayed there for only 2 minutes according to the history logs within Wikipedia here: http://en.wikipedia.org/w/index.php?title=George_Washington&diff=130794155&oldid=130793991

This just happens to be bad timing by the Google web crawler as it crawled Wikipedia. Most nonsense like this is removed fairly quickly. I think the person who did this was banned.

Comment by anonymous [TypeKey Profile Page] | May 16, 2007 8:44 PM

It's actually really confusing, because I went to the Wikipedia page from Google and saw that exact text on it. Then seconds later, it was gone. Then I hit the history and couldn't find a deletion.

Maybe I'm crazy. Maybe I somehow got confused. On the other hand, Google saw that text there as of 15 May 2007 07:53:44 GMT. The change you noted happened between the revision as of 15:19, 14 May 2007 and the revision as of 15:20, 14 May 2007. That's most of a day before Google arrived at the page (assuming Wikipedia uses GMT). If the time zones are off, still -- wow, Google hit that page in the one minute the text was there? Incredibly bad timing, I guess.
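For reference, the gap between the two timestamps quoted above can be worked out directly. This is only a rough sketch: it assumes both times are in GMT/UTC, which the comment itself treats as an open question, and the variable names are made up for illustration.

```python
from datetime import datetime, timezone

# Timestamps as quoted in the comment. Treating both as UTC is an assumption.
edit_window_end = datetime(2007, 5, 14, 15, 20, tzinfo=timezone.utc)  # later Wikipedia revision
crawl_time = datetime(2007, 5, 15, 7, 53, 44, tzinfo=timezone.utc)    # Google cache timestamp

# Time elapsed between the end of the edit window and Google's visit.
gap = crawl_time - edit_window_end
hours, remainder = divmod(int(gap.total_seconds()), 3600)
minutes, seconds = divmod(remainder, 60)
print(f"{hours}h {minutes}m {seconds}s")
```

Under that assumption the crawl came roughly sixteen and a half hours after the revision window, so a time zone mismatch would have to be large to line the two events up.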

>wow, Google hit that page in the one minute the text was there?

You know, if I told that to my grandma, she'd look down at me over the top of her half-rimmed glasses with a look that says, "Son, I've been around the block a few more times than you. You don't expect me to believe that, now do you?"

I notice from your screenshot that the "cached" link is in the 'visited' purple shade, while the rest of the links are blue. Are you sure that you didn't just go to the Google cached page? As a Wikipedia admin, I assure you that there are no relevant deleted revisions for that page, not that anyone would bother deleting such run-of-the-mill vandalism.

A bigger problem is likely all the Wikipedia mirrors - when people take a content dump of Wikipedia and put up a mirror site with ads in a sidebar. I've seen mirrors with mistakes and vandalism that were corrected years ago.

Comment by BanyanTree [TypeKey Profile Page] | May 16, 2007 10:41 PM

Of course the clever thing for Wikipedia would be to use checkpoints in combination with the logging they already do. A checkpoint is a specific version of the data that's verified as good. If something goes wrong, it's easy to roll back to the latest checkpoint.

Wikipedia could allow administrators (or other trusted community members) to tag non-vandalized versions of articles as checkpoints. When a search engine visits, it might be a good idea to serve the latest checkpoint, rather than the latest version of the article.
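The checkpoint idea above could be sketched roughly as follows. This is a toy illustration, not how MediaWiki works: the `Revision` and `Article` classes, the `is_checkpoint` flag, and the `serve` method are all hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    is_checkpoint: bool = False  # hypothetical flag set by a trusted reviewer


@dataclass
class Article:
    revisions: List[Revision] = field(default_factory=list)

    def add_revision(self, text: str) -> Revision:
        # Every edit is logged as a new revision; nothing is overwritten.
        rev = Revision(rev_id=len(self.revisions) + 1, text=text)
        self.revisions.append(rev)
        return rev

    def latest(self) -> Optional[Revision]:
        return self.revisions[-1] if self.revisions else None

    def latest_checkpoint(self) -> Optional[Revision]:
        # Walk backwards through the log to the newest verified revision.
        for rev in reversed(self.revisions):
            if rev.is_checkpoint:
                return rev
        return None

    def serve(self, is_crawler: bool) -> Optional[Revision]:
        # Crawlers get the newest checkpoint, falling back to the latest
        # revision if nothing has been verified yet. Readers get the latest.
        if is_crawler:
            return self.latest_checkpoint() or self.latest()
        return self.latest()


article = Article()
good = article.add_revision("George Washington was the first US president.")
good.is_checkpoint = True  # a trusted editor marks this version as good
article.add_revision("vandalized text")

print(article.serve(is_crawler=True).text)
print(article.serve(is_crawler=False).text)
```

In this sketch a crawler arriving during the vandalism window would still be served the checkpointed text, while ordinary readers keep seeing the live revision.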

Are there any Wikipedia developers watching this thread?

@Greg Gershman: I can't see faulting Google for indexing "unstable" content quickly. Unstable content is content that changes often, i.e., fresh content. You want a spider to get the freshest information it can in these cases.

@graywolf: yeah :)

@BanyanTree: I did go to the cache. It is possible I confused the two. But I don't think so. The cached page looks very different from the real page. I can't explain not seeing it in the change history, however. I immediately looked to see why this wasn't being reflected there. It was odd to me, but I'll go with what Wikipedia is reporting officially: that this edit was up for one minute, and that just happened to be the minute Google came calling.

@JEHochman: That's an excellent suggestion. I debated writing this story up at all, but it is kind of serious. I mean, Wikipedia isn't going to be considered kid-safe to Google, right? So some teacher does a search on Google in front of their class, and this is what comes up? Not good. Of course, serving up a copy of a page different than what the user sees would be cloaking. Heh. But I could cut some slack there.

Wikipedia can offer a personalization setting that allows users to see the latest copy, or the latest copy that passes the filter (and make this the default). There is already a test used for semi-protecting pages from vandalism by new and anonymous users. They could use that same test to identify the last "good" copy of a page.

Article validation and reviewed/stable articles have been requested for a long time. See http://meta.wikimedia.org/wiki/Reviewed_article_version and http://meta.wikimedia.org/wiki/Article_validation_feature. Fully 40% of the Wikimedia Foundation's paid staff of 5 are developers, and there are also a number of great volunteer devs. But that's still only enough for painfully slow progress on something as major as stable versioning. Anybody want to give the Foundation a grant to pay the salary for more devs? ;)

Comment by BanyanTree [TypeKey Profile Page] | May 17, 2007 10:18 AM

Just throwing it out there, but maybe the person who posted had an idea of when the page would be indexed. The window of opportunity for that to happen is pretty slim.

Comment by katy [TypeKey Profile Page] | May 17, 2007 1:51 PM
