Google Book-Scanning Efforts Spark Debate from the Associated Press is an excellent look at how the rivalry between Google’s library scanning project and that of the Open Content Alliance (backed by Yahoo and Microsoft) is getting more heated. Google pretty much comes off as the evil company trying to lock up books for its own commercial goals. I’ll try to restore some balance to that. But then again, perhaps the rhetoric is the only thing that will make Google decide it should figure out a way to better assure people that the scanning will be as open source as possible.
The OCA, and in particular Brewster Kahle of the Internet Archive, which is also behind the project, seems to be ramping up accusations that Google is running a closed system that runs counter to Google’s "Don’t Be Evil" philosophy. From the article:
"They don’t want the books to appear in anyone else’s search engine but their own, which is a little peculiar for a company that says its mission is to make information universally accessible," Kahle said.
He said similar things last month, and far more strongly. From the transcript of a video Philipp Lenssen made at Google Blogoscoped:
Pretty much Google is trying to set themselves up as the only place to get to these materials; the only library; the only access. The idea of having only one company control the library of human knowledge is a nightmare. I mean this is 1984 – a book about how bad the world would be if this really came about, if a few governments’ control and corporations’ control on information goes too far.
Wow. I’ve got great respect for Brewster, but I think making this out into some 1984 info control scenario is going too far. For its part, Google disagrees:
None of Google’s contracts prevent participating libraries from making separate scanning arrangements with other organizations, said company spokeswoman Megan Lamb.
Aside from cutting separate scanning deals, I believe the agreements Google has with libraries give them copies of what Google has scanned to do with as they wish. So I think it’s a stretch to say Google’s trying to keep everything for itself. But Google still comes across as the crass commercial one in all this:
The motives behind Google’s own book-scanning initiative aren’t entirely altruistic. The company wants to stock its search engine with unique material to give people more reasons to visit its website, the hub of an advertising network that generated most of its $2 billion profit through the first nine months of this year.
Despite its ongoing support for the Open Content Alliance, Microsoft earlier this month launched a book-scanning project to compete with Google. Like Google, Microsoft won’t allow its digital copies to be indexed by other search engines.
Microsoft gets a slight nod as perhaps not so altruistic either, but let’s be more blunt. This month’s launch of Microsoft’s Live Search Books (gad, what is with these terrible names!) was for all the same commercial reasons Google has. There’s information in books, and providing access to information has proven to be a money maker.
Note the part about Microsoft not allowing its digital copies to be indexed. I think this, and the similar reference to Google, is about preventing spiders from crawling the respective book search sites to automatically download PDF files. That wouldn’t be useful anyway. You need the associated index that’s making the *images* of these books searchable.
Having tossed out some bones of balance Google’s way, let me jump back in on the side of greater cooperation and openness that the OCA is pitching.
Yes, I dearly wish Google would get together with them and other scanning projects and come up with a real, open way to index this material. I’ve written before about concerns that we’re going to have wasteful, duplicated efforts and some type of VHS/Betamax battle of digital book formats. Gary Price and others have voiced concerns as well. The AP story also touches on this:
But some of the participating libraries may have second thoughts if Google’s system isn’t set up to recognize some of their digital copies, said Gregory Crane, a Tufts University professor who is currently studying the difficulty accessing some digital content.
For instance, Tufts worries Google’s optical reader won’t recognize some books written in classical Greek. If the same problem were to crop up with a digital book in the Open Content Alliance, Crane thinks it will be more easily addressed because the group is allowing outside access to the material.
The battle shaping up over book scanning is unfortunate. The books out of copyright aren’t Google’s books. They aren’t the OCA’s books. They aren’t the libraries’ books. They’re OUR books. Get it together, everyone, and sort something out. Plus, I’d still like to see Google stop scanning books that are in copyright without express permission, to help ease the concerns publishers are having. More on that in my past post, Search Engines, Permissions & Moving Forward In Copyright Battles.