  • http://gregword.com Greg Gershman

    “It’s embarrassing for Google, but the fault really lies with Wikipedia, since this text stayed on Washington’s page long enough for Google to catch it.”

    I’d say the fault is more with Google; Wikipedia is what it is, and people should know what they are getting when they use it, but it’s Google’s algorithm that gives them such authority, and Google’s aggressive spiders that so quickly index content that could be unstable. So I’d say it’s just embarrassing for Wikipedia, but the fault for giving it prominence in the search results is Google’s alone.

  • anonymous

    Actually the text stayed there for only 2 minutes according to the history logs within Wikipedia here: http://en.wikipedia.org/w/index.php?title=George_Washington&diff=130794155&oldid=130793991

    This just happens to be bad timing by the Google Web crawler visiting Wikipedia. Most nonsense like this is removed fairly quickly. The person who did this was banned, I think.

  • http://searchengineland.com Danny Sullivan

    It’s actually really confusing, because I went to the Wikipedia page from Google and saw that exact text on it. Then seconds later, it was gone. Then I hit the history and couldn’t find a deletion.

    Maybe I’m crazy. Maybe I somehow got confused. On the other hand, Google saw that text there as of 15 May 2007 07:53:44 GMT. The change you noted happened between the revisions of 15:19 and 15:20, 14 May 2007. That’s a day before Google arrived at the page (assuming Wikipedia uses GMT). If the time zones are off, still: wow, Google hit that page in the one minute the text was there? Incredibly bad timing, I guess.
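
    The gap between the two timestamps quoted above can be worked out directly. This is only a sketch of the arithmetic: it assumes, as the comment does, that both times are in GMT, and the variable names are made up for illustration.

```python
from datetime import datetime, timedelta

# Timestamps as quoted in the thread; both are ASSUMED to be GMT.
edit_reverted = datetime(2007, 5, 14, 15, 20)      # Wikipedia revision time
google_cached = datetime(2007, 5, 15, 7, 53, 44)   # Google cache time

# How long after the revert did Google record the page?
gap = google_cached - edit_reverted
print(gap)  # 16:33:44

# For a pure time-zone mismatch to explain the gap, Wikipedia's clock would
# have to run this far behind GMT. Real UTC offsets span -12h to +14h.
explained_by_timezone = timedelta(hours=-12) <= -gap <= timedelta(hours=14)
print(explained_by_timezone)  # False
```

    Under that assumption the gap is about sixteen and a half hours, so closer to two-thirds of a day than a full day, and larger than any single time-zone offset could account for.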

  • http://www.wolf-howl.com graywolf

    >wow, Google hit that page in the one minute the text was there?

    You know, if I told that to my grandma, she’d look down at me over the top of her half-rimmed glasses with a look that says, “Son, I’ve been around the block a few more times than you. You don’t expect me to believe that, now do you?”

  • BanyanTree

    I notice from your screenshot that the “cached” link is in the ‘visited’ purple shade, while the rest of the links are blue. Are you sure that you didn’t just go to the Google cached page? As a Wikipedia admin, I assure you that there are no relevant deleted revisions for that page, not that anyone would bother deleting such run-of-the-mill vandalism.

    A bigger problem is likely all the Wikipedia mirrors – when people take a content dump of Wikipedia and put up a mirror site with ads in a sidebar. I’ve seen mirrors with mistakes and vandalism that were corrected years ago.

  • http://www.jehochman.com JEHochman

    Of course the clever thing for Wikipedia would be to use checkpoints in combination with the logging they already do. A checkpoint is a specific version of the data that’s verified as good. If something goes wrong, it’s easy to roll back to the latest checkpoint.

    Wikipedia could allow administrators (or other trusted community members) to tag non-vandalized versions of articles as checkpoints. When a search engine visits, it might be a good idea to serve the latest checkpoint, rather than the latest version of the article.

    Are there any Wikipedia developers watching this thread?
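
    To make the checkpoint idea concrete, here is a minimal sketch. Everything in it is hypothetical: the Revision class, the is_checkpoint flag, and the user-agent tokens are invented for illustration and are not how MediaWiki actually works.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Revision:
    rev_id: int
    text: str
    is_checkpoint: bool = False  # hypothetical flag set by a trusted editor

# Illustrative substrings only; a real deployment would need a better test.
CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp")

def is_crawler(user_agent: str) -> bool:
    """Crude check: does the User-Agent string mention a known crawler?"""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def latest_checkpoint(history: List[Revision]) -> Optional[Revision]:
    """Return the newest revision tagged as a checkpoint, if any."""
    for rev in reversed(history):  # history is ordered oldest to newest
        if rev.is_checkpoint:
            return rev
    return None

def revision_to_serve(history: List[Revision], user_agent: str) -> Revision:
    """Crawlers get the latest checkpoint; everyone else gets the latest edit."""
    latest = history[-1]
    if is_crawler(user_agent):
        return latest_checkpoint(history) or latest  # fall back if none tagged
    return latest

history = [
    Revision(1, "George Washington was the first President.", is_checkpoint=True),
    Revision(2, "VANDALISM"),
]
print(revision_to_serve(history, "Mozilla/5.0 (compatible; Googlebot/2.1)").rev_id)  # 1
print(revision_to_serve(history, "Mozilla/5.0 (Windows NT 5.1) Firefox/2.0").rev_id)  # 2
```

    Falling back to the latest revision when no checkpoint exists keeps untagged articles visible to search engines, at the cost of leaving them exposed to the same problem.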

  • http://searchengineland.com Danny Sullivan

    @Greg Gershman: I can’t see faulting Google for indexing “unstable” content quickly. Unstable content is content that changes often, i.e., fresh content. You want a spider to get the freshest information it can in these cases.

    @graywolf: yeah :)

    @BanyanTree: I did go to the cache. It is possible I confused the two. But I don’t think so. The cached page looks very different from the real page. I can’t explain not seeing it in the change history, however. I immediately looked to see why this wasn’t being reflected there. It was odd to me, but I’ll go with what Wikipedia is reporting officially: that this edit was up for one minute, and that just happened to be the minute Google came a-calling.

    @JEHochman: That’s an excellent suggestion. I debated writing this story up at all, but it is kind of serious. I mean, Wikipedia isn’t going to be considered kid-safe to Google, right? So some teacher does a search on Google in front of their class, and this is what comes up? Not good. Of course, serving up a copy of a page different than what the user sees would be cloaking. Heh. But I could cut some slack there.

  • http://www.jehochman.com JEHochman

    Wikipedia can offer a personalization setting that allows users to see the latest copy, or the latest copy that passes the filter (and make this the default). There is already a test used for semi-protecting pages from vandalism by new and anonymous users. They could use that same test to identify the last “good” copy of a page.

  • BanyanTree

    Article validation and reviewed/stable articles have been requested for a long time. See http://meta.wikimedia.org/wiki/Reviewed_article_version and http://meta.wikimedia.org/wiki/Article_validation_feature. Fully 40% of the Wikimedia Foundation’s paid staff of 5 are developers, and there are also a number of great volunteer devs. But that’s still only enough for the most painfully slow progress toward something as major as stable versioning. Anybody want to give the Foundation a grant to pay the salary for more devs? ;)

  • katy

    Just throwing it out there, but maybe the person who posted had an idea of when the page would be indexed. The window of opportunity for that to happen is pretty slim.