Google Book Search Settlement Unfair to Non-US Authors

By Claude Almansi
Editor, Accessibility Issues

Of Books and Vegetables

I first thought of calling this post “Of Books and Vegetables” because, when I half woke up the morning after I sent a letter of objection to the Google Book Search Settlement, I remembered Ms B. and the building site for a middle school in Cortona. The building activity had stopped just after the ground had been cleared, because the funds were blocked. So for two years, Ms B., who lived on the other side of the street, used the site to grow very tasty tomatoes and zucchini. No one objected to this private exploitation of the site: it would have been silly to waste its potential, and Ms B. generously shared her vegetables with friends and neighbours. When the funding issue was solved, the building started again and her vegetable patch was bulldozed.

I chose a more conservative title because the analogy with Google scanning out-of-print works in libraries is imperfect: if a big canning industry, instead of Ms B., had started to grow vegetables on the building site, the borough of Cortona would probably have tried to charge rent for this use. But the principle remains: it is silly, even immoral, to waste potential revenue – especially if its exploitation will serve the public.

Challenging or objecting?

So I did not object to the Google Book Search Settlement for the same proprietary reasons as the eminent cultural personalities who signed the Heidelberg Appeal (English text; German text with signatures):

Comic where someone says: Well, I'll be cross-eyed, Billy Goat! Cattle rustlers! This explains th' strange noises in th' ghost town above --- No wonder it was called Whispering Walls

Actually, I did not mean to object: at first I only challenged the Settlement Registry’s classifying as “not commercially available” the Google scan of Theatre of Sleep, an anthology my late husband Guido Almansi and I had edited and published with Pan Books in 1986 – and for which, after his death in 2001, I was the only remaining named copyright holder.

The physical book has indeed been out of print for years, but it contains many excerpts from in-copyright and commercially available works, which we had obtained permission to use in – and only in – the Pan Book version. Even if the Settlement foresees the possibility for right holders on such excerpts to claim them and forbid Google to display them, some right holders might not know about the Settlement, or not remember exclusive permissions granted decades ago; besides, the search engine of the Settlement registry often does not find the authors of such excerpts. Under our initial transactions for Theatre of Sleep, I am answerable to these right holders – no pact between parties who had nothing to do with these transactions can change this.

Another reason not to allow Google to display even the rest of the anthology under the Settlement’s conditions was the absolutely unacceptable digital restriction of what – paying – users would be able to print or copy-paste from Google books. Such digital restriction measures just don’t work: in Copying from a Google Book, I show how easy it is to do so even with works that are theoretically restricted in this way. And if users pay for an e-book, they should be able to do what they want with it for personal use. So I made an unprotected e-version of what could legally be offered from Theatre of Sleep, and uploaded it to archive.org/details/TheatreOfSleep. It is an in-progress version, because I will re-add in-copyright texts when I get permission again.

Foreign authors and the Settlement

I could have left things at that, without objecting to the Settlement. But Peter Brantley of the Internet Archive pointed out in an e-mail that many people who are hit by the Settlement and utterly dislike it do not object because it is too complex and they have no legal training. This is my situation too, so I included the excessive complexity of the Settlement in my objections.

Then there was another reason for objecting. Guido and I also did an adaptation of Theatre of Sleep for the Italian readership – Teatro del Sonno – which was published by Garzanti in 1988, is out of print, and has been scanned by Google. For that one we had ceded the copyright to Garzanti, mainly because we did not want to send the permission requests all over again and Garzanti could do that more easily.

But Garzanti has not yet claimed Teatro del Sonno under the Settlement. Its editorial director explained to me that Italian publishers have chosen to wait for the result of the Final Fairness Hearing about it, in case it results in its invalidation: due to the imprecision of the Registry’s search engine, checking what Google has and has not scanned is very time-consuming. Though they are very displeased with the Settlement, Italian publishers are not objecting either, apparently. Above all, they are not systematically informing their authors about the Settlement.

Considering what little info non-US media gave about the Settlement, we are left with the impression that it was a US-only affair. However, this lack of information puts non-US authors at risk. As Mary Minow explained in Google Book Settlement, orphan works, and foreign works (LibraryLaw Blog, April 21, 2009):

The largest group of non-active rights holders are likely to be foreign authors. In spite of Google’s efforts to publicize the settlement abroad, I suspect that most foreign rights owners of out-of-print books will fail to register with the Registry.  There are a couple of reasons for this.  For one, they may not know that their book is still protected by copyright in the US.  In addition, they may assume that international network of reproduction rights organizations would manage their royalties, and not understand the need to register separately. . . .

If there is an injustice being done in the settlement, it is with foreign authors.

Also, if foreign right-holders do not object to the Settlement, how is the US Court to know that they disapprove of it?

Letter of objections

Hence my letter of objections, below. Not because I think they are representative of non-US objections, but because I believe it is important that non-US right-holders object to the Settlement if they disapprove of it, even if their reasons are very different. The deadline for doing so is Sept. 4, 2009; for how to proceed, see “24. How can I object to the Settlement?” in the Settlement’s FAQs.

Direct download links: PDF, ODT

Links

I have gathered / am gathering some bookmarks about the Settlement in diigo.com/user/calmansi/googlesettlement. Several of those, in particular about its repercussions outside the US, come from the very useful Google Settlement Information, Documents, News & Links page in Michael W. Perry’s Inkling Books.

Credits

By order of appearance:

Unhide That Hidden Text, Please

By Claude Almansi
Staff Writer

Thanks to:

  • Marie-Jeanne Escure, of Le Temps, for having kindly answered questions about copyright and accessibility issues in the archives of the Journal de Genève.
  • Gabriele Ghirlanda, of Unitas, for having tested the archives of the Journal de Genève with a screen reader.

What Hidden Text?

Here, “hidden text” refers to a text file combined by an application with another object (image, video etc.) in order to add functionality to that object: several web applications offer this text to the reader together with the object it enhances – DotSUB offers the transcript of video captions, for instance:

Screenshot from “Phishing Scams in Plain English” by Lee LeFever [1].

But in other applications, unfortunately, you get only the enhanced object: the text enhancing it remains hidden, even though it would give access to the content to people with disabilities that prevent them from using the object, and would enormously simplify research and quotation for everybody.

Following are three examples of object-enhancing applications using text but keeping it hidden:

Multilingual Captioning of YouTube and Google Videos

Google offers the possibility to caption a video by uploading one or several text files with their timed transcriptions. See the YouTube example below.

YouTube video captioning.

Google even automatically translates the produced captions into other languages, at the user’s discretion. See the example below.

Option to automatically translate the captions of a YouTube video.

(See “How to Automatically Translate Foreign-Language YouTube Videos” by Terrence O’Brien, Switch, Nov. 3, 2008 [2], from which the above two screenshots were taken.) But the text files of the original captions and their automatic translations remain hidden.
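To give an idea of what such a timed caption file contains, and how little it would take to offer it as a plain transcript, here is a minimal sketch in Python. It assumes captions in the widely used SubRip (.srt) layout: a cue number, a timing line, then the caption text. The sample captions and the function name `srt_to_transcript` are invented for illustration; this is not how YouTube or Google process captions internally.

```python
import re

# Invented sample in SubRip (.srt) layout: cue number, timing line, caption text.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Hello and welcome.

2
00:00:04,500 --> 00:00:07,000
This is a captioned video.
"""

# Matches a timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$")


def srt_to_transcript(srt_text):
    """Return the caption text of an .srt string as one plain transcript."""
    pieces = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        rows = [row.strip() for row in block.splitlines()]
        # Expect: rows[0] = cue number, rows[1] = timing, rows[2:] = text.
        if len(rows) >= 3 and TIMING.match(rows[1]):
            pieces.append(" ".join(rows[2:]))
    return " ".join(pieces)


print(srt_to_transcript(SAMPLE_SRT))
```

The timing lines are what lets the player synchronize the captions with the video; once they are dropped, what remains is an ordinary text that a screen reader can read and anyone can quote.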

Google’s Search Engine for the US Presidential Campaign Videos

During the 2008 US presidential campaign, Google beta-tested a search engine for videos on the candidates’ speeches. This search engine works on a text file produced by speech-to-text technology. See the example below.

Google search engine for the US presidential election videos.

(See “Google Elections Video Search,” Google for Educators 2008 – where you can try the search engine in the above screenshot – [3] and “‘In Their Own Words’: Political Videos Meet Google Speech-to-text Technology” by Arnaud Sahuguet and Ari Bezman. Official Google blog, July 14, 2008 [4].) But here, too, the text files on which the search engine works remain hidden.

Enhanced Text Images in Online Archives

Maybe the oddest use of hidden text is when people go to the trouble of scanning printed texts, produce both images of text and real text files from the scan, then use the text file to make the image version searchable – but hide it. It happens with Google books [5] and with The European Library [6]: you can browse and search the online texts that appear as images thanks to the hidden text version, but you can’t print them or digitally copy-paste a given passage – except if the original is in the public domain: in this case, both make a real textual version available.
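The principle can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page image names, the sample text and the function `search_images` are invented, and neither Google books nor The European Library is claimed to work this way internally. The search runs on the text produced by the scan, but the reader is only ever given the names of the matching images.

```python
# Invented text layer: each page image is paired with the text recognized in it.
TEXT_LAYER = {
    "page_001.png": "Of books and vegetables",
    "page_002.png": "The building site was cleared",
}


def search_images(query, text_layer):
    """Return the images whose hidden text contains the query (case-insensitive)."""
    needle = query.casefold()
    return [
        image
        for image, text in text_layer.items()
        if needle in text.casefold()
    ]


# The reader gets the image name back, never the text that made the match possible.
print(search_images("BOOKS", TEXT_LAYER))
```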

Therefore, using a plain text file to enhance an image of the same content, but hiding the plain text, is apparently just a way to protect copyrighted material. And this can lead to really bizarre solutions.

Olive Software ActivePaper and the Archives of Journal de Genève

On December 12, 2008, the Swiss daily Le Temps announced that for the first time in Switzerland, they were offering online “free access” to the full archives – www.letempsarchives.ch (English version at [7]) – of Le Journal de Genève (JdG), which, together with two other dailies, got merged into Le Temps in 1998. In English, see Ellen Wallace’s “Journal de Geneve Is First Free Online Newspaper (but It’s Dead),” GenevaLunch, Dec. 12, 2008 [8].

A Vademecum to the archives, available at [9] (7.7 MB PDF), explains that “articles in the public domain can be saved as images. Other articles will only be partially copied on the hard disk,” and Nicolas Dufour’s description of the archiving process in the same Vademecum gives a first clue about the reason for this oddity: “For the optical character recognition that enables searching by keywords within the text, the American company Olive Software adapted its software which had already been used by the Financial Times, the Scotsman and the Indian Times.” (These and other translations in this article are mine.)

The description of this software – ActivePaper Archive – states that it will enable publishers to “Preserve, Web-enable, and Monetize [their] Archive Content Assets” [10]. So even if Le Temps does not actually intend to “monetize” their predecessor’s assets, the operation is still influenced by the monetizing purpose of the software they chose. Hence the hiding of the text versions on which the search engine works and the digital restriction on saving articles still under copyright.

Accessibility Issues

This ActivePaper Archive solution clearly poses great problems for blind people who have to use a screen reader to access content: screen readers read text, not images.

Le Temps is aware of this: in an e-mail answer (Jan. 8, 2009) to questions about copyright and accessibility problems in the archives of JdG, Ms Marie-Jeanne Escure, in charge of reproduction authorizations at Le Temps, wrote, “Nous avons un partenariat avec la Fédération suisse des aveugles pour la consultation des archives du Temps par les aveugles. Nous sommes très sensibilisés par cette cause et la mise à disposition des archives du Journal de Genève aux aveugles fait partie de nos projets.” Translation: “We have a partnership with the Swiss federation of blind people (see [11]) for the consultation of the archives of Le Temps by blind people. We are strongly committed/sensitive to this cause, and the offer of the archives of Journal de Genève to blind people is part of our projects.”

What Digital Copyright Protection, Anyway?

Gabriele Ghirlanda, member of Unitas [12], the Swiss Italian section of the Federation of Blind people, tried the Archives of JdG. He says (e-mail, Jan. 15, 2009):

With a screenshot, the image definition was too low for ABBYY FineReader 8.0 Professional Edition [optical character recognition software] to extract a meaningful text.

But by chance, I noticed that the article presented is made of several blocks of images, for the title and for each column.

Right-click, copy image, paste in OpenOffice; export as PDF; then I put the PDF through ABBYY FineReader. […]

For a sighted person, it is no problem to create a document of good quality for each article, keeping it in image format, without having to go through OpenOffice and/or PDF. [my emphasis]

<DIV style="position:relative;display:block;top:0; left:0; height:521; width:1052" xmlns:OliveXLib="http://www.olive-soft.com/Schemes/XSLLibs" xmlns:OlvScript="http://www.olivesoftware.com/XSLTScript" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<div id="primImg" style="position:absolute;top:30;left:10;" z-index="2"><img id="articlePicture" src="/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130200.png" border="0"></img></div>
<div id="primImg" style="position:absolute;top:86;left:5;" z-index="2"><img id="articlePicture" src="/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130201.png" border="0"></img></div>
<div id="primImg" style="position:absolute;top:83;left:365;" z-index="2"><img id="articlePicture" src="/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130202.png" border="0"></img></div>
<div id="primImg" style="position:absolute;top:521;left:369;" z-index="2"><img id="articlePicture" src="/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130203.png" border="0"></img></div>
<div id="primImg" style="position:absolute;top:81;left:719;" z-index="2"><img id="articlePicture" src="/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130204.png" border="0"></img></div>

From the source code of the article used by Gabriele Ghirlanda. The image files he mentions (highlighted in red in the original post) are the .png files named in the src attributes.
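To show how plainly the page source exposes these image blocks, here is a minimal sketch using only Python’s standard `html.parser` module. The two-image excerpt is shortened from the source code quoted above, with straight quotes; the names `ImageCollector` and `image_sources` are invented for this illustration. The sketch only lists the paths found in the markup and downloads nothing.

```python
from html.parser import HTMLParser

# Shortened excerpt of the article source quoted above (two of the five blocks).
EXCERPT = (
    '<div id="primImg" style="position:absolute;top:30;left:10;">'
    '<img id="articlePicture" '
    'src="/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130200.png" '
    'border="0"></img></div>'
    '<div id="primImg" style="position:absolute;top:86;left:5;">'
    '<img id="articlePicture" '
    'src="/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130201.png" '
    'border="0"></img></div>'
)


class ImageCollector(HTMLParser):
    """Collect the src attribute of every img tag, in document order."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case and attrs as (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def image_sources(html):
    """Return the image paths referenced in an HTML fragment."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


for path in image_sources(EXCERPT):
    print(path)
```

Each article is thus a small set of separately addressed images, one for the title and one per column, which is exactly what Gabriele Ghirlanda noticed by hand.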

Unhide That Hidden Text, Please

Le Temps’ commitment to the cause of accessibility for all and, in particular, to finding a way to make the JdG archives accessible to blind people (see “Accessibility Issues” above) is laudable. But in this case, why first go through the complex process of splitting the text into several images, and theoretically prevent the download of some of these images for copyrighted texts, when this “digital copyright protection” can easily be bypassed with right-click and copy-paste?

As there already is a hidden text version of the JdG articles for powering the search engine, why not just unhide it? www.letempsarchives.ch already states that these archives are “© 2008 Le Temps SA.” This should be sufficient copyright protection.

Let’s hope that Olive ActivePaper Archive software offers this option to unhide hidden text. Not just for the archives of the JdG, but for all archives working with this software. And let’s hope, in general, that all web applications using text to enhance a non-text object will publish it. All published works are automatically protected by copyright laws anyway.

Adding an alternative accessible version just for blind people is discriminatory. According to accessibility guidelines – and common sense – alternative access for people with disabilities should only be used when there is no other way to make web content accessible. Besides, access to the text version would also simplify life for scholars – and for people using portable devices with a small screen: text can be resized far better than a puzzle of images with fixed width and height (see the source code excerpt above).

Links
The pages linked to in this article and a few more resources are bookmarked under http://www.diigo.com/user/calmansi/hiddentext