Unhide That Hidden Text, Please

Posted on January 22, 2009 by JimS

Thanks to:

Marie-Jeanne Escure, of Le Temps, for having kindly answered questions about copyright and accessibility issues in the archives of the Journal de Genève.
Gabriele Ghirlanda, of Unitas, for having tested the archives of the Journal de Genève with a screen reader.

What Hidden Text?

Here, “hidden text” refers to a text file that an application combines with another object (image, video, etc.) in order to add functionality to that object. Several web applications offer this text to the reader together with the object it enhances: DotSUB, for instance, offers the transcript of video captions:

Screenshot from “Phishing Scams in Plain English” by Lee LeFever [1].

In other applications, unfortunately, you get only the enhanced object: the text enhancing it remains hidden, even though it would give access to the content to people whose disabilities prevent them from using the object, and would enormously simplify research and quotation for everybody.

Following are three examples of object-enhancing applications using text but keeping it hidden:

Multilingual Captioning of YouTube and Google Videos

Google offers the possibility to caption a video by uploading one or several text files with their timed transcriptions. See the YouTube example below.

YouTube video captioning.

Google even automatically translates the produced captions into other languages, at the user’s discretion. See the example below.

Option to automatically translate the captions of a YouTube video.

(See “How to Automatically Translate Foreign-Language YouTube Videos” by Terrence O’Brien, Switch, Nov. 3, 2008 [2], from which the above two screenshots were taken.) But the text files of the original captions and their automatic translations remain hidden.
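To give an idea of how little stands between such a timed caption file and a readable transcript, here is a minimal sketch in Python. It assumes captions in the SubRip (.srt) format, one common timed-text format. The sample captions and the function names parse_srt and plain_transcript are invented for illustration, and nothing here reflects how YouTube actually stores or processes captions.

```python
import re

# Hypothetical sample of a timed caption file in SubRip (.srt) format.
SAMPLE_SRT = """\
1
00:00:01,000 --> 00:00:04,000
Hello and welcome.

2
00:00:04,500 --> 00:00:08,000
This caption runs
over two lines.
"""

# Matches a timing line such as "00:00:01,000 --> 00:00:04,000".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt(srt_text):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.match(line.strip())
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is caption text.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((start, end, text))
                break
    return cues


def plain_transcript(cues):
    """Join the caption texts into one readable transcript."""
    return " ".join(text for _, _, text in cues)


if __name__ == "__main__":
    print(plain_transcript(parse_srt(SAMPLE_SRT)))
```

The timing information is kept in the cues, so the same parsed data could serve both a plain transcript and a time-coded one.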

Google’s Search Engine for the US Presidential Campaign Videos

During the 2008 US presidential campaign, Google beta-tested a search engine for videos on the candidates’ speeches. This search engine works on a text file produced by speech-to-text technology. See the example below.

Google search engine for the US presidential election videos.

(See “Google Elections Video Search,” Google for Educators 2008 – where you can try the search engine in the above screenshot – [3] and “‘In Their Own Words’: Political Videos Meet Google Speech-to-text Technology” by Arnaud Sahuguet and Ari Bezman. Official Google blog, July 14, 2008 [4].) But here, too, the text files on which the search engine works remain hidden.
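The principle of searching a video through its text can be sketched in a few lines of Python. Everything below is an assumption made for illustration: the timed transcript is invented, and search and as_clock are hypothetical names, not a description of Google's implementation.

```python
# Hypothetical timed transcript: (start time in seconds, recognized text).
TRANSCRIPT = [
    (0.0, "Thank you all for coming tonight"),
    (12.5, "We need to talk about health care"),
    (47.0, "Health care costs keep rising"),
    (95.0, "And we must invest in education"),
]


def search(transcript, query):
    """Return every (start, text) segment containing the query, ignoring case."""
    needle = query.casefold()
    return [(start, text) for start, text in transcript if needle in text.casefold()]


def as_clock(seconds):
    """Format a number of seconds as m:ss, the way video players show time."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}:{secs:02d}"


if __name__ == "__main__":
    for start, text in search(TRANSCRIPT, "health care"):
        print(as_clock(start), text)
```

The point of the sketch is that the search results are made of the very text that stays hidden from the reader: showing it would cost nothing more.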

Enhanced Text Images in Online Archives

Maybe the oddest use of hidden text is when people go to the trouble of scanning printed texts, producing both images of the text and real text files from the scan, and using the text file to make the image version searchable, only to hide it. It happens with Google Books [5] and with The European Library [6]: thanks to the hidden text version, you can browse and search the online texts that appear as images, but you can’t print them or digitally copy-paste a given passage, unless the original is in the public domain: in that case, both make a real textual version available.

Therefore, using a plain text file to enhance an image of the same content, but hiding the plain text, is apparently just a way to protect copyrighted material. And this can lead to really bizarre solutions.

Olive Software ActivePaper and the Archives of Journal de Genève

On December 12, 2008, the Swiss daily Le Temps announced that for the first time in Switzerland, they were offering online “free access” to the full archives – www.letempsarchives.ch (English version at [7]) – of Le Journal de Genève (JdG), which, together with two other dailies, got merged into Le Temps in 1998. In English, see Ellen Wallace’s “Journal de Geneve Is First Free Online Newspaper (but It’s Dead),” GenevaLunch, Dec. 12, 2008 [8].

A Vademecum to the archives, available at [9] (7.7 Mb PDF), explains that “articles in the public domain can be saved as images. Other articles will only be partially copied on the hard disk,” and Nicolas Dufour’s description of the archiving process in the same Vademecum gives a first clue about the reason for this oddity: “For the optical character recognition that enables searching by keywords within the text, the American company Olive Software adapted its software which had already been used by the Financial Times, the Scotsman and the Indian Times.” (These and other translations in this article are mine.)

The description of this software – ActivePaper Archive – states that it will enable publishers to “Preserve, Web-enable, and Monetize [their] Archive Content Assets” [10]. So even if Le Temps does not actually intend to “monetize” their predecessor’s assets, the operation is still influenced by the monetizing purpose of the software they chose. Hence the hiding of the text versions on which the search engine works and the digital restriction on saving articles still under copyright.

Accessibility Issues

This ActivePaper Archive solution clearly poses great problems for blind people who have to use a screen reader to access content: screen readers read text, not images.
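A rough way to see the problem from the screen reader's side is to list the images on a page that carry no text alternative. The Python sketch below uses an invented HTML fragment, not the actual markup of the archives, and a hypothetical helper named images_without_alt.

```python
from html.parser import HTMLParser

# Hypothetical markup: an article rendered as a puzzle of fixed-size images.
SAMPLE_HTML = """
<div class="article">
  <img src="title.gif" width="400" height="40">
  <img src="column1.gif" width="200" height="600" alt="">
  <img src="column2.gif" width="200" height="600" alt="Second column of the article">
</div>
"""


class ImageAudit(HTMLParser):
    """Collect the sources of <img> tags whose alt text is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            # An empty alt is fine for decoration, but not for images of text.
            if not (attributes.get("alt") or "").strip():
                self.missing_alt.append(attributes.get("src"))


def images_without_alt(html):
    """Return the src of every image a screen reader would have nothing to read for."""
    audit = ImageAudit()
    audit.feed(html)
    return audit.missing_alt


if __name__ == "__main__":
    print(images_without_alt(SAMPLE_HTML))
```

In this invented fragment, two of the three images offer a screen reader nothing to read, even though each of them is an image of text.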

Le Temps is aware of this: in an e-mail answer (Jan. 8, 2009) to questions about copyright and accessibility problems in the archives of JdG, Ms Marie-Jeanne Escure, in charge of reproduction authorizations at Le Temps, wrote, “Nous avons un partenariat avec la Fédération suisse des aveugles pour la consultation des archives du Temps par les aveugles. Nous sommes très sensibilisés par cette cause et la mise à disposition des archives du Journal de Genève aux aveugles fait partie de nos projets.” Translation: “We have a partnership with the Swiss federation of blind people (see [11]) for the consultation of the archives of Le Temps by blind people. We are strongly committed/sensitive to this cause, and the offer of the archives of Journal de Genève to blind people is part of our projects.”

What Digital Copyright Protection, Anyway?

Gabriele Ghirlanda, member of Unitas [12], the Swiss Italian section of the Federation of Blind people, tried the Archives of JdG. He says (e-mail, Jan. 15, 2009):

With a screenshot, the image definition was too low for ABBYY FineReader 8.0 Professional Edition [optical character recognition software] to extract a meaningful text.

But by chance, I noticed that the article presented is made of several blocks of images, for the title and for each column.

Right-click, copy image, paste in OpenOffice; export as PDF; then I put the PDF through ABBYY FineReader. […]

For a sighted person, it is no problem to create a document of good quality for each article, keeping it in image format, without having to go through OpenOffice and/or PDF. [my emphasis]

From the source code of the article used by Gabriele Ghirlanda: in red, the image files he mentions.

Unhide That Hidden Text, Please

Le Temps’ commitment to the cause of accessibility for all and, in particular, its intention to find a way to make the JdG archives accessible to blind people (see “Accessibility Issues” above) are laudable. But in this case, why first go through the complex process of splitting the text into several images, and theoretically prevent the download of some of these images for copyrighted texts, when this “digital copyright protection” can easily be bypassed with right-click and copy-paste?

As there already is a hidden text version of the JdG articles for powering the search engine, why not just unhide it? www.letempsarchives.ch already states that these archives are “© 2008 Le Temps SA.” This should be sufficient copyright protection.

Let’s hope that Olive ActivePaper Archive software offers this option to unhide hidden text. Not just for the archives of the JdG, but for all archives working with this software. And let’s hope, in general, that all web applications using text to enhance a non-text object will publish it. All published works are automatically protected by copyright laws anyway.

Adding an alternative accessible version just for blind people is discriminatory. According to accessibility guidelines – and common sense – alternative access for people with disabilities should only be used when there is no other way to make web content accessible. Besides, access to the text version would also simplify life for scholars – and for people using portable devices with a small screen: text can be resized far better than a puzzle of images with fixed width and height (see the source code excerpt above).

Links
The pages linked to in this article and a few more resources are bookmarked under http://www.diigo.com/user/calmansi/hiddentext



Claude Almansi, on June 26, 2009 at 11:33 pm said:

About Gabriele Ghirlanda’s work-around quoted at the end of the post: the webmasters of the Journal de Genève’s archives have now added an empty GIF on top of the text images, an old and very childish attempt to prevent copy-pasting or saving images by right-clicking. It does not work, because you can still get them by saving the page as a whole web page, then fishing out the relevant image files from the folder of “non text objects”.

So much for Le Temps’ promise to make the JdG archives accessible to blind people.

Claude Almansi, on November 19, 2009 at 12:12 pm said:

Eating (part of) my hat with great pleasure: See Automatic Captions in YouTube. Official Google Blog, Nov 19, 2009. With embedded explanatory video: captioned of course, and with a second player for the sign language version.

Google is not only making its voice-to-text technology available for automatic captioning (of English YouTube Education videos so far): it also offers the possibility to automatically time-code a plain text transcript through the same voice-to-text technology, which automatically finds where and when each piece of text should go. And it is possible to download the time-coded transcript to reuse it elsewhere.

Big thanks to the people of Google’s Accessibility project (see google.com/accessibility). However, the “unhide that hidden text” request still obtains in the other cases mentioned in the above post.

Claude Almansi, on June 19, 2010 at 12:05 pm said:

Since the beginning of June 2010, when you add captions to a YouTube video, they automatically produce an interactive transcript: see YouTube’s Interactive Transcripts (Google Operating System – Unofficial news and tips about Google. 2010-06-04), thus enabling users to find and refer to an exact point in a video.

Example: Gespraech mit Poto Wegener SUISA (in D = GEMA), a conversation in German between Poto Wegener of the Swiss collecting society SUISA, and Roger Lévy. The video is subtitled in German and French, and the interactive transcript is in German.

Say X has been told that Poto Wegener says important things about the difference between US and EU (and Swiss) copyright from 2:03, but X knows neither German nor French, only – say – English.

X can:
– have the subtitles automatically translated into English
– pause the video
– navigate to 2:03 in the interactive transcript
and then either restart the video from there, or move from caption to caption.
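Navigating to a given time in a transcript amounts to finding which caption is on screen at that moment. A minimal Python sketch, assuming cues are stored as (start time, text) pairs sorted by start time; the cue texts are placeholders, and to_seconds and cue_at are hypothetical names, not part of any YouTube interface:

```python
from bisect import bisect_right


def to_seconds(clock):
    """Convert "m:ss" or "h:mm:ss" into a number of seconds."""
    total = 0
    for part in clock.split(":"):
        total = total * 60 + int(part)
    return total


# Hypothetical cues: (start time in seconds, caption text), sorted by start time.
CUES = [
    (0, "caption A"),
    (95, "caption B"),
    (123, "caption C"),
    (150, "caption D"),
]


def cue_at(cues, clock):
    """Return the cue that has most recently started at the given time."""
    starts = [start for start, _ in cues]
    index = bisect_right(starts, to_seconds(clock)) - 1
    # Before the first cue starts, fall back to the first cue.
    return cues[max(index, 0)]


if __name__ == "__main__":
    print(cue_at(CUES, "2:03"))
```

Moving “from caption to caption” is then just a matter of stepping to the next or previous entry in the list.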

In this case, the automatic translation into English works better from the French than from the German subtitles, because the English sentence structure is closer to the French than to the German syntax.

This automatic translation is done with Google Language tools. Thus it works from and into:
Albanian
Arabic
Belarusian
Bulgarian
Catalan
Chinese
Croatian
Czech
Danish
Dutch
English
Estonian
Filipino
Finnish
French
Galician
German
Greek
Haitian Creole
Hebrew
Hindi
Hungarian
Icelandic
Indonesian
Irish
Italian
Japanese
Korean
Latvian
Lithuanian
Macedonian
Malay
Maltese
Norwegian
Persian
Polish
Portuguese
Romanian
Russian
Serbian
Slovak
Slovenian
Spanish
Swahili
Swedish
Thai
Turkish
Ukrainian
Vietnamese
Welsh
Yiddish

(so far)

Now, if one can afford the professional production of a 12-minute educational video in English, with versions dubbed in Arabic, Chinese, French, Russian and Spanish, and uploads all 6 versions to YouTube, it would make sense to caption them at least in English, in order to make the video also available in all the other languages covered by automatic translation, considering that professionally produced videos already have a text script.

But it also makes sense for non-scripted videos, like Gespraech mit Poto Wegener SUISA (in D = GEMA), used in the example above.

Educational Technology and Change Journal
