By Claude Almansi
Editor, Accessibility Issues
ETCJ Associate Administrator
The ongoing 22nd session (June 15 to 24, 2011) of the Standing Committee on Copyright and Related Rights (SCCR/22) of the World Intellectual Property Organization (WIPO) is addressing, once again, the problem of copyright barriers that prevent people who are blind, sight-impaired or have other print disabilities from accessing knowledge and information, and how to remove those barriers.
In fact, copyright laws are national, and so far international treaties and legal instruments have systematically aimed at reinforcing prohibitions globally. Rich countries, upholding the position of the content industry, have always opposed globalizing copyright restrictions in favor of people with disabilities, alleging that if WIPO officially globalized them, this would lead to further restrictions in favor of other groups.
From David Hammerstein's “I just called to say I want to read” post about a former discussion of a WIPO Treaty for the Blind, Visually Impaired and People with Print Disabilities. Site of the IP Policy Committee of the Transatlantic Consumer Dialogue (TACD). Sept. 25, 2010.
Non-scientists should refrain from using scientific concepts as metaphors. I am fully aware of this, and actually, when a sociologist or other humanistic scholar thus hijacks terms or phrases like “black hole,” “big bang,” “DNA,” etc., I skip his/her text if possible.
Nevertheless, what little I understand of how the cellulase enzyme works for ruminants has been very instrumental in my first perception of how captioning videos helps all users digest their content, and underlies what I have written here so far about captioning. Hence the decision to come out explicitly with this subjective and uninformed perception of it.
Digesting grass
Cows can digest and assimilate the cellulose in grass because they ruminate it, but that is not the only reason: humans could chew and re-chew grass for hours and hours, yet we would still excrete its cellulose whole without assimilating any, because we lack something cows have: the cellulase enzyme that chops up the molecules of cellulose into types of sugar that can be assimilated.
At the end of the July 13 discussion, the Ambassador of Yemen to the UN in Geneva remarked that people who could not read because they had had no opportunities to go to school should be included among “Reading Disabled Persons” and thus benefit from the same copyright restrictions in WBU’s draft treaty, in particular, digital texts that can be read with Text-to-Speech (TTS) software.
The Ambassador of Yemen hit a crucial point.
TTS was first conceived as an important accessibility tool to grant blind people access to texts in digital form, cheaper to produce and distribute than heavy braille versions. Moreover, people who become blind after a certain age may have difficulties learning braille. Now its usefulness is being recognized for others who cannot read print because of severe dyslexia or motor disabilities.
Indeed, why not for people who cannot read print because they could not go to school?
What does “literacy” mean?
No one compos mentis who has seen/heard blind people use TTS to access texts and do things with these texts would question the fact that they are reading. Same if TTS is used by someone paralyzed from the neck down. What about a dyslexic person who knows the phonetic value of the signs of the alphabet, but has a neurological problem dealing with their combination in words? And what about someone who does not know the phonetic value of the signs of the alphabet?
Writing literacy
Sure, blind and dyslexic people can also write notes about what they read. People paralyzed from the neck down and people who don’t know how the alphabet works can’t, unless they can use Speech-to-Text (STT) technology.
Traditional desktop STT technology is too expensive – one of the most used solutions, Dragon NaturallySpeaking, starts at $99 – for people in poor countries with a high “illiteracy” rate. Besides, it has to be trained to recognize the speaker’s voice, which might not be an obvious thing to do for someone illiterate.
Free Speech-to-Text for all, soon?
In Unhide That Hidden Text, Please, back in January 2009, I wrote about Google’s search engine for the US presidential campaign videos, complaining that the text file powering it – produced by Google’s speech-to-text technology – was kept hidden.
To help address this challenge, we’ve combined Google’s automatic speech recognition (ASR) technology with the YouTube caption system to offer automatic captions, or auto-caps for short. Auto-caps use the same voice recognition algorithms in Google Voice to automatically generate captions for video.
As the video above says, the automatic captions are sometimes good, sometimes not so good – but better than nothing if you are deaf or don’t know the language. Therefore, when you switch on automatic captions in a video of one of the channels participating in the project, you get a warning:
Short words are the rub
English – the language for which Google presently offers automatic captioning – has a high proportion of one-syllable words, and this proportion is particularly high when the speaker is attempting to use simple English: OK for natives, but at times baffling for foreigners.
When I started studying English literature at university, we 1st-year students had to follow a course on John Donne’s poems. The professor had magnanimously announced that if we didn’t understand something, we could interrupt him and ask. But doing so in a big lecture hall with hundreds of listeners was rather intimidating. Still, once, when I noticed that the other students around me had stopped taking notes and looked as nonplussed as I was, I summoned my courage and blurted out: “Excuse me, but what do you mean exactly by ‘metaphysical pan’?” When the laughter subsided, the professor said he meant “pun,” not “pan,” and explained what a pun was.
If you switch on the automatic captions [2], there are over 10 different transcriptions – all wrong – for the 30+ occurrences of the word “rip.” The word is in the title (“Don’t get sucked in by the rip…”), it is explained in the video description (“Rip currents are the greatest hazards on our beaches.”), but STT software just attempts to recognize the audio. It can’t look around for other clues when the audio is ambiguous.
That’s what beta versions are for
Google deserves compliments for having chosen to semi-publicly beta test the software in spite of – but warning about – its glitches. Feedback both from the partners hosting the automatically captionable videos and from users should help them fine-tune the software.
A particularly precious contribution towards this fine-tuning comes from partners who also provide human-made captions, as in the Official MIT OpenCourseWare 1800 Event Video in the MIT YouTube channel:
Once this short word issue is solved for English, it should then be easier to apply the knowledge gained to other languages where short words are less frequent.
I have done so with the Lessig at Educause: Creative Commons video, for which I had used another feature of the Google STT software: feeding it a plain transcript and letting it add the time codes to create the captions. The resulting caption .txt file I then downloaded says:
0:00:06.009,0:00:07.359
and think about what else we could
be doing.
0:00:07.359,0:00:11.500
So, the second thing we could be doing is
thinking about how to change norms, our norms,
0:00:11.500,0:00:15.670
our practices.
And that, of course, was the objective of
0:00:15.670,0:00:21.090
a project a bunch of us launched about 7 years
ago,the Creative Commons project. Creative
And soon, when Google opens this automated captioning to everyone, they will be able to say what they want to write in a YouTube video – which can be directly made with any web cam, or even cell phone cam – auto-caption it, then retrieve the caption text file.
True, to get a normal text, the time codes should be deleted and the line-breaks removed. But learning to do that should be way easier than learning to fully master the use of the alphabet.
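The clean-up just described can be sketched in a few lines of Python. This is only an illustration, not a tool offered by Google or YouTube: the function name `captions_to_text` is made up for the example, and the pattern assumes that time codes always look like the ones in the excerpt above (start and end times separated by a comma, alone on their line).

```python
import re

# Matches a time-code line such as "0:00:06.009,0:00:07.359".
# Assumption: time codes always follow this start,end pattern.
TIMECODE = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def captions_to_text(caption_text: str) -> str:
    """Drop time-code lines and blank lines, then join the rest with spaces."""
    kept = []
    for line in caption_text.splitlines():
        stripped = line.strip()
        if not stripped or TIMECODE.match(stripped):
            continue  # skip empty lines and time codes
        kept.append(stripped)
    return " ".join(kept)


sample = """0:00:06.009,0:00:07.359
and think about what else we could
be doing.
0:00:07.359,0:00:11.500
So, the second thing we could be doing is
thinking about how to change norms, our norms,
"""

print(captions_to_text(sample))
```

Run on the first two captions of the excerpt, this yields one continuous line of text. Any real caption file may differ in small ways (for instance, blank lines between captions), which is why the sketch also skips empty lines.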
Recapitulating:
Text-to-Speech, a tool first conceived to grant blind people access to written content, can also be used by other reading-disabled people, including people who can’t use the alphabet convention because they were unable to go to school and are, thus, labeled “illiterate.”
Speech-to-Text, a tool first conceived to grant deaf people access to audio content, is about to become far more widely available and far easier to use than it was recently, thus potentially giving people who can’t use the alphabet convention because they were unable to go to school, and are therefore labeled “illiterate,” the possibility to write.
This means that we should reflect on the meanings of the words “literate” and “illiterate.”
Now that technologies first meant to enable people with medically recognized disabilities to use and produce texts can also do the same for those who are “reading disabled” by lack of education, industries and nations presently opposed to the Treaty for Improved Access for Blind, Visually Impaired and other Reading Disabled Persons should start thinking beyond “strict copyright” and consider the new markets that this treaty would open up.