Authorama: Testing If Google Can Restrict Public Domain Books It Offers For Download

Danny Sullivan on
  • Categories: Channel: Consumer, Google: Book Search, Legal: Copyright, Search Engines: Book Search Engines
  • Freeing Google Books from Philipp Lenssen at Google Blogoscoped covers an interesting experiment he is trying. Can Google really dictate that public domain books it has scanned and distributed on the web be restricted to non-commercial use?

    First, a look at how Google provides these books. Google says that PDF downloads are available for most out-of-copyright books via Google Book Search. Sadly, the instructions for finding them understate how difficult it actually is. From the instructions:

    Search for downloadable books by clicking on the “Full view books” radio button before entering your search terms. Once you select a book from your results, you’ll see a “Download” button on the right side of the page if the book is out of copyright. Click the button to download a PDF of the book to your computer. Once the book is downloaded, you can print it and read it at your own pace.

    Unfortunately, getting back results in this way doesn’t make it immediately clear which books can be downloaded and which can’t. For example, here’s a search on the word cars. I did the search to match “Full view books” as instructed. After that, the only way to know if any of these are downloadable is to click on each individual book and check. That’s a pain.

    The advanced search page doesn’t offer any help, either. Really, there needs to be a third option on the regular search page to narrow to downloadable books, like this:

    1. All books
    2. Full view books
    3. Downloadable books

    I did try to narrow using a filetype search, like this:

    cars filetype:pdf

    That’s not supported. The only other option is to search for books from before 1923, since Google reports that as the general cutoff date it uses to consider a work public domain. Here’s an example of that.

    I’ve had to kind of fake it. Google Book Search demands a date range, rather than giving you a “Books Before X Date” option. So I went after books from between 1000 and 1923 AD.
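The same date-range workaround can be sketched in code for anyone holding their own list of book records. This is only an illustration, not anything Google Book Search exposes: the `Book` record, the `likely_public_domain` helper and the sample titles are hypothetical, and treating the 1000 to 1923 range as inclusive at both ends is an assumption.

```python
from dataclasses import dataclass

# The article cites 1923 as the general cutoff Google uses for public domain.
CUTOFF_YEAR = 1923


@dataclass
class Book:
    """Hypothetical record for one search result."""
    title: str
    year: int


def likely_public_domain(books, start_year=1000, end_year=CUTOFF_YEAR):
    """Keep books published within the range (inclusive by assumption).

    Mirrors the manual workaround: no "before X" option, so use a range
    with a very early start year.
    """
    return [book for book in books if start_year <= book.year <= end_year]


# Made-up sample data for illustration only.
sample = [
    Book("Hypothetical Motor Car Manual", 1905),
    Book("Hypothetical Modern Car Guide", 1987),
    Book("Hypothetical Carriage Handbook", 1923),
]

for book in likely_public_domain(sample):
    print(book.year, book.title)
```

As with the search itself, a publication year inside the range only suggests that a book may be downloadable; it says nothing definite about its copyright status in any given country.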

    Here’s an example of one of those books, close up. Look over on the right hand side of the screen, and you see an invitation to download it.

    After you download it and open the PDF, you’re greeted by this warning and document guidelines on the opening screen:

    This is a digital copy of a book that was preserved for generations on library shelves before it was carefully scanned by Google as part of a project to make the world’s books discoverable online.

    It has survived long enough for the copyright to expire and the book to enter the public domain. A public domain book is one that was never subject to copyright or whose legal copyright term has expired. Whether a book is in the public domain may vary country to country. Public domain books are our gateways to the past, representing a wealth of history, culture and knowledge that’s often difficult to discover.

    Marks, notations and other marginalia present in the original volume will appear in this file – a reminder of this book’s long journey from the publisher to a library and finally to you.

    Usage guidelines

    Google is proud to partner with libraries to digitize public domain materials and make them widely accessible. Public domain books belong to the public and we are merely their custodians. Nevertheless, this work is expensive, so in order to keep providing this resource, we have taken steps to prevent abuse by commercial parties, including placing technical restrictions on automated querying.

    We also ask that you:

    + Make non-commercial use of the files We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes.

    + Refrain from automated querying Do not send automated queries of any sort to Google’s system: If you are conducting research on machine translation, optical character recognition or other areas where access to a large amount of text is helpful, please contact us. We encourage the use of public domain materials for these purposes and may be able to help.

    + Maintain attribution The Google “watermark” you see on each file is essential for informing people about this project and helping them find additional materials through Google Book Search. Please do not remove it.

    + Keep it legal Whatever your use, remember that you are responsible for ensuring that what you are doing is legal. Do not assume that just because we believe a book is in the public domain for users in the United States, that the work is also in the public domain for users in other countries. Whether a book is still in copyright varies from country to country, and we can’t offer guidance on whether any specific use of any specific book is allowed. Please do not assume that a book’s appearance in Google Book Search means it can be used in any manner anywhere in the world. Copyright infringement liability can be quite severe.

    There’s also a watermark on pages of the book, as you can see in the example below:

    The guidelines have Philipp scratching his head. If these are public domain books, then how can Google decide to restrict them in any way, such as barring commercial publication? Yes, it scanned the books. Maybe it owns the scans? But maybe not.

    To find out, he’s doing a test project. Authorama is a site he’s created that lists 100 books he’s downloaded from Google Book Search, to allow others to redistribute or use as they like.

    I’m checking with Google to see what it thinks about the project and about the legality of trying to impose restrictions on public domain books just because it has scanned them.

    Postscript: I’ve now heard back from Google, which says:

    We have gotten this question in the past.  The front matter of our PDF books is not a EULA [end user license agreement].  We make some requests, but we are not trying to legally bind users to those requests.  We’ve spent (and will continue to spend) a lot of time and money on Book Search, and we hope users will respect that effort and not use these files in ways that make it harder for us to justify that expense (for example, by setting up the ACME Public Domain PDF Download service that charges users a buck a book and includes malware in the download).  Rather than using the front matter to convey legal restrictions, we are attempting to use it to convey what we hope to be the proper netiquette for the use of these files.


    About The Author

    Danny Sullivan
    Danny Sullivan was a journalist and analyst who covered the digital and search marketing space from 1996 through 2017. He was also a cofounder of Third Door Media, which publishes Search Engine Land, Marketing Land and MarTech Today, and produces the SMX: Search Marketing Expo and MarTech events. He retired from journalism and Third Door Media in June 2017. You can learn more about him on his personal site & blog. He can also be found on Facebook and Twitter.