May. 16, 2007 at 4:41pm Eastern by Danny Sullivan
George Washington Did What According To Wikipedia???
Many have joked about how Wikipedia seems to rank at the top of practically any Google search. Often, that's a good thing, as Wikipedia has lots of great information. But a search on george washington today shows a downside. Someone edited the opening of the Wikipedia entry about the first US president to say something less than flattering. Google spidered the entry while that text was there, and the material was used to form the description of Washington's Wikipedia page, as shown below. Look at the second listing:
It's embarrassing for Google, but the fault really lies with Wikipedia, since this text stayed on Washington's page long enough for Google to catch it. Indeed, it looks to have been on the page for at least a day. It's gone now, but when I looked about a half hour ago, the text was still there. It will probably take about another day for the description to fall out of Google itself, once the page is recrawled.
FYI, the description does not show at Yahoo or Ask.com because Wikipedia is not among the top results for a search on George Washington there. At Live.com, the Wikipedia page shows up, but it was last visited by the crawler on May 13, before the insulting text was added.
Reader Comments
Actually, the text stayed there for only 2 minutes according to the history logs within Wikipedia here: http://en.wikipedia.org/w/index.php?title=George_Washington&diff=130794155&oldid=130793991
This just happens to be bad timing of the Google Web crawler visiting Wikipedia. Most nonsense like this is removed fairly quickly. The person who did this was banned, I think.
It's actually really confusing, because I went to the Wikipedia page from Google and saw that exact text on it. Then seconds later, it was gone. Then I hit the history and couldn't find a deletion.
Maybe I'm crazy. Maybe I somehow got confused. On the other hand, Google saw that text there as of 15 May 2007 07:53:44 GMT. The change you noted happened between the revision as of 15:19, 14 May 2007 and the revision as of 15:20, 14 May 2007. That's a day before Google arrived at the page (assuming Wikipedia uses GMT). Even if the time zones are off, still: wow, Google hit that page in the one minute the text was there? Incredibly bad timing, I guess.
>wow, Google hit that page in the one minute the text was there?
You know, if I told that to my grandma, she'd look down at me over the top of her half-rimmed glasses with a look that says, "Son, I've been around the block a few more times than you. Now, you don't expect me to believe that, do you?"
I notice from your screenshot that that "cached" link is in 'visited' purple shade, while the rest of the links are blue. You sure that you didn't just go to the Google cached page? As a Wikipedia admin, I assure you that there are no relevant deleted revisions for that page, not that anyone would bother deleting such run-of-the-mill vandalism.
A bigger problem is likely all the Wikipedia mirrors - when people take a content dump of Wikipedia and put up a mirror site with ads in a sidebar. I've seen mirrors with mistakes and vandalism that were corrected years ago.
Of course the clever thing for Wikipedia would be to use checkpoints in combination with the logging they already do. A checkpoint is a specific version of the data that's verified as good. If something goes wrong, it's easy to roll back to the latest checkpoint.
Wikipedia could allow administrators (or other trusted community members) to tag non-vandalized versions of articles as checkpoints. When a search engine visits, it might be a good idea to serve the latest checkpoint, rather than the latest version of the article.
Are there any Wikipedia developers watching this thread?
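To make the checkpoint idea concrete, here is a minimal Python sketch. It is purely illustrative: the `Revision` and `Article` classes, `mark_checkpoint`, `revision_for`, and the `is_crawler` flag are hypothetical names, not MediaWiki's actual data model or API. The sketch only captures the proposal's three parts. Trusted users tag known-good revisions, crawlers are served the latest tagged revision, and a rollback restores that revision as a new edit, which matches how wiki reverts work.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    rev_id: int
    text: str
    checkpoint: bool = False  # hypothetical flag set by a trusted reviewer


@dataclass
class Article:
    revisions: list[Revision] = field(default_factory=list)

    def add_revision(self, rev_id: int, text: str) -> None:
        # Every edit, good or bad, is appended to the history log.
        self.revisions.append(Revision(rev_id, text))

    def mark_checkpoint(self, rev_id: int) -> None:
        # A trusted user vouches for a specific revision.
        for rev in self.revisions:
            if rev.rev_id == rev_id:
                rev.checkpoint = True
                return
        raise KeyError(rev_id)

    def latest(self) -> Revision:
        return self.revisions[-1]

    def latest_checkpoint(self) -> Optional[Revision]:
        # Walk the history backwards to find the newest verified revision.
        for rev in reversed(self.revisions):
            if rev.checkpoint:
                return rev
        return None

    def revision_for(self, is_crawler: bool) -> Revision:
        # Crawlers get the newest checkpoint if one exists;
        # everyone else gets the live version.
        if is_crawler:
            checkpoint = self.latest_checkpoint()
            if checkpoint is not None:
                return checkpoint
        return self.latest()

    def rollback_to_checkpoint(self) -> Revision:
        # Restore the checkpointed text as a new revision,
        # keeping the vandalized revision in the history.
        checkpoint = self.latest_checkpoint()
        if checkpoint is None:
            raise ValueError("no checkpoint to roll back to")
        self.add_revision(self.latest().rev_id + 1, checkpoint.text)
        return self.latest()


article = Article()
article.add_revision(1, "George Washington was the first President of the United States.")
article.mark_checkpoint(1)
article.add_revision(2, "vandalized text")
print(article.revision_for(is_crawler=True).rev_id)   # the checkpoint, revision 1
print(article.revision_for(is_crawler=False).rev_id)  # the live page, revision 2
```

Note that serving crawlers a different revision from what readers see is exactly the cloaking concern raised later in this thread; the sketch does not resolve that trade-off.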
@Greg Gershman: I can't see faulting Google for indexing "unstable" content quickly. Unstable content is content that changes often, i.e., fresh content. You want a spider to get the freshest information it can in these cases.
@graywolf: yeah :)
@BanyanTree: I did go to the cache. It is possible I confused the two, but I don't think so. The cached page looks very different from the real page. I can't explain not seeing it in the change history, however. I immediately looked to see why this wasn't being reflected there. It was odd to me, but I'll go with what Wikipedia is reporting officially: this edit was up for one minute, and that just happened to be the minute Google came calling.
@JEHochman: That's an excellent suggestion. I debated writing this story up at all, but it is kind of serious. I mean, Wikipedia isn't going to be considered kid-safe to Google, right? So some teacher does a search on Google in front of their class, and this is what comes up? Not good. Of course, serving up a copy of a page different from what the user sees would be cloaking. Heh. But I could cut some slack there.
Wikipedia can offer a personalization setting that allows users to see the latest copy, or the latest copy that passes the filter (and make this the default). There is already a test used for semi-protecting pages from vandalism by new and anonymous users. They could use that same test to identify the last "good" copy of a page.
Article validation and reviewed/stable articles have been requested for a long time. See http://meta.wikimedia.org/wiki/Reviewed_article_version and http://meta.wikimedia.org/wiki/Article_validation_feature. Fully 40% of the Wikimedia Foundation's paid staff of five are developers, and there are also a number of great volunteer devs. But that's still only enough for the most painfully slow progress on something as major as stable versioning. Anybody want to give the Foundation a grant to pay the salary for more devs? ;)
Just throwing it out there, but maybe the person who posted had an idea of when the page would be indexed. The window of opportunity for that to happen is pretty slim.
"It's embarrassing for Google, but the fault really lies with Wikipedia, since this text stayed on Washington's page long enough for Google to catch it."
I'd say the fault is more with Google. Wikipedia is what it is, and people should know what they are getting when they use it, but it's Google's algorithm that gives it such authority, and Google's aggressive spiders that so quickly index content that could be unstable. So I'd say it's just embarrassing for Wikipedia, but the fault for giving it prominence in the search results is Google's alone.