Collaborative Text Translation with DotSUB

Claude AlmansiBy Claude Almansi
Editor, Accessibility Issues

In a discussion about Uwe Müller’s dissertation regarding open access journals (see abstract with download link) on the A2k (access to knowledge) mailing-list, Arif Jinha wrote that it would be great to translate it collaboratively into English. Great idea, especially for a  269-page long  dissertation.

The way Arif Jinha intends to  collaboratively translate scholarly texts is based on the hypothesis that if two specialists thoroughly know each other’s subject, specialist B, even if he does not know specialist A’s language, is able to better understand – and render in own his language – specialist A’s work on the basis of even a dubious computer translation than would a generic translator who masters both languages. However, generic bilingual translators could be of use for checking possible mistakes in details.

This is very true. For instance, the best translation of a poem by Seferis into French was done by the French poet Yves Bonnefoy – who didn’t know Greek – on the basis of several English translations, in collaboration with Seferis who told him what he liked and disliked in these translations. And the same possibly extends to other fields of specialisation.

Collaborative Text Translation Tools

However, being just a generic translator, I have to  translate the other way round, from the small end as it were. So Arif Jinha’s suggestion got me thinking about collaborative translation tools. There are such tools for software, like Pootle, for instance, which split the interface into short strings presented in a table: a volunteer starts translating some, then another volunteer goes on. You can navigate by untranslated and “fuzzy” strings.

  • Problem 1: the strings are presented by alphabetical order, with only some coded indications of where the strings come from, and it takes some time to start understanding them. And one-word strings can be tricky: is “post” a noun or a verb, and if a verb, should we use the infinitive or the imperative, and if the imperative, the polite or the familiar form (in languages where both exist)?
  • Problem 2, you need a server on which to install this kind of tool.

Collaborative Text Translation with DotSUB

And then I remembered DotSUB. It is normally used for collaboratively captioning videos, but its  interface is very similar to one of the software translation tools that I covered in Three Video Captioning Tools. And you can have longer strings, in the order you decide – in the order of a text too…

But I needed a video pre-text first. So I made one, inserting a 4k black JPEG file in a video editor:

Black JPEG file

I timed it for 10 minutes and exported the video in the lowest possible resolution. Then I uploaded it into DotSUB and inserted some text from my blog post Making Web Multimedia Accessible Needn’t Be Boring, sentence by sentence:

Dotsub Transcription Tool:

video player left; things already transcribed top right; box for transcribing bottom left

I left the default 3-second timing for each string in the “Add a transcription line” box and paid no attention to the pre-text black video. Each transcribed string moves to the top-right table when you hit return and is automatically saved. When that was done, I clicked on “Mark this transcription complete” (bottom left) and moved to the DotSUB Translation Tool.

DotSUB Translation Tool

The transcription is in a tabled list, with each item followed by a link you can click to translate it

I clicked on the links to translate each string (actually,  I only translated the text into French, but I forgot to make a screenshot, first, so I made one of the interface for translating into Italian instead).

When you choose a language for the captions in the video player of the resulting Collaborative translation DotSUB page, you get the translation in the corresponding language as a drop-down list under Video Transcription. To get rid of the list markings, just copy-paste it into the “source” or “html view” of a web editor. Here is the almost unedited result (I just redid a separate paragraph for the subtitle and bolded it, and I put the rest in italics) :

Certains pensent que l’obligation légale de se conformer aux règles d’accessibilité des contenus Web – celles du W3C ou, aux USA, la “section 508” mène forcément à des pages ennuyeuses, rien qu’en texte En fait, ces règles n’excluent pas l’utilisation du multimédia sur le web, mais imposent de le rendre accessible en “offrant des alternatives équivalentes pour des contenus auditifs ou visuels et en particulier: “Pour toute présentation multimédia à base temporelle (p. ex. film ou animation), il faut offrir des alternatives équivalentes (p.ex. sous-titres ou descriptions audios de la piste visuelle) avec la présentation [Priorité 1]” [1] Ce n’est pas une corvée aussi terrible qu’il ne semble, et elle peut être partagée entre plusieurs personnes, même si elles ne sont pas expertes en technologie et n’ont pas d’instruments perfectionnés.

Sous-titrage avec

Exemple: Phishing Scams in Plain English de Lee LeFever, en  [2] Ici, la vidéo a été téléchargée dans, et plusieurs volontaires l’ont sous-titrée en diverses langues. Le résultat peut être insérer dans un blog, un wiki ou une page web. Les sous-titres apparaissent aussi comme texte copiable sous “Video Transcription”: commode si des gens veulent citer des passages dans une discussion de la vidéo. En outre, une transcription d’une vidéo tend aussi à améliorer sa position dans les moteurs de recherche, qui indexent principalement les textes. Le seul problème est que les sous-titres couvrent une partie substantielle de la vidéo

Summing up so far:

Of course, I attempted this alone. But it would also work with several people collaborating in the translation. In theory, even the transcription, sentence by sentence, of the original text could be shared, but I haven’t checked yet if a collaborator could decree that a transcription is finished when it isn’t, thus blocking the transcription.

In case of a longish text that must be translated into several languages (hopefully in collaboration with many people), this way of using DotSUB might prove useful due to the ease of toggling between the different versions from the main page.

Unhide That Hidden Text, Please

claude80By Claude Almansi
Staff Writer

Thanks to:

  • Marie-Jeanne Escure, of Le Temps, for having kindly answered questions about copyright and accessibility issues in the archives of the Journal de Genève.
  • Gabriele Ghirlanda, of Unitas, for having tested the archives of the Journal de Genève with a screen reader.

What Hidden Text?

Here, “hidden text” refers to a text file combined by an application with another object (image, video etc.) in order to add functionality to that object: several web applications offer this text to the reader together with the object it enhances – DotSUB offers the transcript of video captions, for instance:


Screenshot from “Phishing Scams in Plain English” by Lee LeFever [1].

But in other applications, unfortunately, you get only the enhanced object, but the text enhancing it remains hidden even though it would grant access to content for people with disabilities that prevent them from using the object and would simplify enormously research and quotations for everybody.

Following are three examples of object-enhancing applications using text but keeping it hidden:

Multilingual Captioning of YouTube and Google Videos

Google offers the possibility to caption a video by uploading one or several text files with their timed transcriptions. See the YouTube example below.

yt_subtYouTube video captioning.

Google even automatically translates the produced captions into other languages, at the user’s discretion. See the example below. (See “How to Automatically Translate Foreign-Language YouTube Videos” by Terrence O’Brien, Switch,

yt_subt_trslOption to automatically translate the captions of a YouTube video.

Nov. 3, 2008 [2], from which the above two screenshots were taken.) But the text files of the original captions and their automatic translations remain hidden.

Google’s Search Engine for the US Presidential Campaign Videos

During the 2008 US presidential campaign, Google beta-tested a search engine for videos on the candidates’ speeches. This search engine works on a text file produced by speech-to-text technology. See the example below.

google_election_searchGoogle search engine for the US presidential election videos.

(See “Google Elections Video Search,” Google for Educators 2008 – where you can try the search engine in the above screenshot – [3] and “‘In Their Own Words’: Political Videos Meet Google Speech-to-text Technology” by Arnaud Sahuguet and Ari Bezman. Official Google blog, July 14, 2008 [4].) But here, too, the text files on which the search engine works remain hidden.

Enhanced Text Images in Online Archives

Maybe the oddest use of hidden text is when people go to the trouble of scanning printed texts, produce both images of text and real text files from the scan, then use the text file to make the image version searchable – but hide it. It happens with Google books [5] and with The European Library [6]: you can browse and search the online texts that appear as images thanks to the hidden text version, but you can’t print them or digitally copy-paste a given passage – except if the original is in the public domain: in this case, both make a real textual version available.

Therefore, using a plain text file to enhance an image of the same content, but hiding the plain text, is apparently just a way to protect copyrighted material. And this can lead to really bizarre solutions.

Olive Software ActivePaper and the Archives of Journal de Genève

On December 12, 2008, the Swiss daily Le Temps announced that for the first time in Switzerland, they were offering online “free access” to the full archives – (English version at [7]) – of Le Journal de Genève (JdG), which, together with two other dailies, got merged into Le Temps in 1998. In English, see Ellen Wallace’s “Journal de Geneve Is First Free Online Newspaper (but It’s Dead),” GenevaLunch, Dec. 12, 2008 [8].

A Vademecum to the archives, available at [9] (7.7 Mb PDF), explains that “articles in the public domain can be saved as


images. Other articles will only be partially copied on the hard disk,” and Nicolas Dufour’s description of the archiving process in the same Vademecum gives a first clue about the reason for this oddity: “For the optical character recognition that enables searching by keywords within the text, the American company Olive Software adapted its software which had already been used by the Financial Times, the Scotsman and the Indian Times.” (These and other translations in this article are mine.)

The description of this software – ActivePaper Archive – states that it will enable publishers to “Preserve, Web-enable, and Monetize [their] Archive Content Assets” [10]. So even if Le Temps does not actually intend to “monetize” their predecessor’s assets, the operation is still influenced by the monetizing purpose of the software they chose. Hence the hiding of the text versions on which the search engine works and the digital restriction on saving articles still under copyright.

Accessibility Issues

This ActivePaper Archive solution clearly poses great problems for blind people who have to use a screen reader to access content: screen readers read text, not images.

Le Temps is aware of this: in an e-mail answer (Jan. 8, 2009) to questions about copyright and accessibility problems in the archives of JdG, Ms Marie-Jeanne Escure, in charge of reproduction authorizations at Le Temps, wrote, “Nous avons un partenariat avec la Fédération suisse des aveugles pour la consultation des archives du Temps par les aveugles. Nous sommes très sensibilisés par cette cause et la mise à disposition des archives du Journal de Genève aux aveugles fait partie de nos projets.” Translation: “We have a partnership with the Swiss federation of blind people (see [11]) for the consultation of the archives of Le Temps by blind people. We are strongly committed/sensitive to this cause, and the offer of the archives of Journal de Genève to blind people is part of our projects.”

What Digital Copyright Protection, Anyway?

Gabriele Ghirlanda, member of Unitas [12], the Swiss Italian section of the Federation of Blind people, tried the Archives of JdG. He says (e-mail, Jan. 15, 2009):

With a screenshot, the image definition was too low for ABBYY FineReader 8.0 Professional Edition [optical character recognition software] to extract a meaningful text.

But by chance, I noticed that the article presented is made of several blocs of images, for the title and for each column.

Right-clic, copy image, paste in OpenOffice; export as PDF; then I put the PDf through Abbyy Fine Reader. […]

For a sighted person, it is no problem to create a document of good quality for each article, keeping it in image format, without having to go through OpenOffice and/or pdf. [my emphasis]

<DIV style=”position:relative;display:block;top:0; left:0; height:521; width:1052″ xmlns:OliveXLib=”; xmlns:OlvScript=”; xmlns:msxsl=”urn:schemas-microsoft-com:xslt”><div id=”primImg” style=”position:absolute;top:30;left:10;” z-index=”2″><img id=”articlePicture” src=”/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130200.png” border=”0″></img></div><div id=”primImg” style=”position:absolute;top:86;left:5;” z-index=”2″><img id=”articlePicture” src=”/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130201.png” border=”0″></img></div><div id=”primImg” style=”position:absolute;top:83;left:365;” z-index=”2″><img id=”articlePicture” src=”/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130202.png” border=”0″></img></div><div id=”primImg” style=”position:absolute;top:521;left:369;” z-index=”2″><img id=”articlePicture” src=”/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130203.png” border=”0″></img></div><div id=”primImg” style=”position:absolute;top:81;left:719;” z-index=”2″><img id=”articlePicture” src=”/Repository/getimage.dll?path=JDG/1990/03/15/13/Img/Ar0130204.png” border=”0″></img></div>

From the source code of the article used by Gabriele Ghirlanda: in red, the image files he mentions.

Unhide That Hidden Text, Please

Le Temps‘ commitment to the cause of accessibility for all and, in particular, to find a way to make the JdG archives accessible to blind people (see “Accessibility Issues” above) is laudable. But in this case, why first go through the complex process of splitting the text into several images, and theoretically prevent the download of some of these images for copyrighted texts, when this “digital copyright protection” can easily be by-passed with right-click and copy-paste?

As there already is a hidden text version of the JdG articles for powering the search engine, why not just unhide it? already states that these archives are “© 2008 Le Temps SA.” This should be sufficient copyright protection.

Let’s hope that Olive ActivePaper Archive software offers this option to unhide hidden text. Not just for the archives of the JdG, but for all archives working with this software. And let’s hope, in general, that all web applications using text to enhance a non-text object will publish it. All published works are automatically protected by copyright laws anyway.

Adding an alternative accessible version just for blind people is discriminatory. According to accessibility guidelines – and common sense – alternative access for people with disabilities should only be used when there is no other way to make web content accessible. Besides, access to the text version would also simplify life for scholars – and for people using portable devices with a small screen: text can be resized far better than a puzzle of images with fixed width and height (see the source code excerpt above).

The pages linked to in this article and a few more resources are bookmarked under

Three Video Captioning Tools

claude80By Claude Almansi
Staff Writer

First of all, thanks to:

  • Jim Shimabukuro for having encouraged me to further examine captioning tools after my previous Making Web Multimedia Accessible Needn’t Be Boring post – this has been a great learning experience for me, Jim
  • Michael Smolens, founder and CEO of and Max Rozenoer, administrator of, for their permission to use screenshots of Overstream and DotSUB captioning windows, and for their answers to my questions.
  • Roberto Ellero and Alessio Cartocci of the project for their long patience in explaining multimedia accessibility issues and solutions to me.
  • Gabriele Ghirlanda of for having tried the tools with a screen reader.

However, these persons are in no way responsible for possible mistakes in what follows.

Common Features

Video captioning tools are similar in many aspects: see the screenshot of a captioning window at DotSUB:


and at Overstream:


In both cases, there is a video player, a lst of captions and a box for writing new captions, with boxes for the start and end time of each caption. The MAGpie desktop captioning tool (downloadable from is similar: see the first screenshot in David Klein and K. “Fritz” Thompson, Captioning with MAGpie, 2007 [1].

Moreover, in all three cases, captions can be either written directly in the tool, or creating by importing a file where they are separated by a blank line – and they can be exported as a file too.

What follows is just a list of some differences that could influence your choice of a captioning tool.

Overstream and DotSUB vs MAGpie

  • DotSUB and Overstream are online tools (only a browser is needed to use them, whatever the OS of the computer), whereas MAGpie is a desktop application that works with Windows and Mac OS, but not with Linux.
  • DotSUB and Overstream use SubRip (SRT) captioning [2] while MAGpie uses Synchronized Multimedia Integration Language (SMIL) captioning [3]
  • Overstream and Dotsub host the captioned result online, MAGpie does not.
  • The preparation for captioning is less intuitive with MAGpie than with Overstream or DotSUB, but on the other hand MAGpie offers more options and produces simpler files.
  • MAGpie can be used by disabled people, in particular by blind and low-sighted people using a screen reader [4], whereas DotSUB and Overstream don’t work with a screen reader.

Overstream vs DotSUB

  • The original video can be hosted at DotSUB; with Overstream, it must be hosted elsewhere.
  • DotSUB can also be used with a video hosted elsewhere, but you must link to the streaming flash .flv file, whereas with Overstream, you can link to the page of the video – but Overstream does not support all video hosting platforms.
  • If the captions are first written elsewhere then imported as an .srt file, Overstream is more tolerant of coding mistakes than DotSUB – but this cuts both ways: some people might prefer to have your file rejected rather than having gaps in the captions.
  • Overstream allows more precise time-coding than DotSUB, and it also has a “zooming feature” (very useful for longish videos), which DotSUB doesn’t have.
  • DotSUB can be used as a collaborative tool, whereas Overstream cannot yet: but Overstream administrators are planning to make it possible in future.
  • With DotSUB, you can have switchable captions in different languages on one player. With Overstream, there can only be one series of captions in a given player.

How to Choose a Tool . . .

So how to choose a tool? As with knitting, first make a sample with a short video using different tools: the short descriptive lists above cannot replace experience. Then choose the most appropriate one according to your aims for captioning a given video, and what are your possible collaborators’ availability, IT resources, and abilities.

. . . Or Combine Tools

The great thing with these tools is that you can combine them:

As mentioned in my former Making Web Multimedia Accessible Needn’t Be Boring post, I had started captioning “Missing in Pakistan” a year ago on DotSUB, but gone on using MAGpie for SMIL captioning (see result at [5] ). But when Jim Shimabukuro suggested this presentation of captioning tools, I found my aborted attempt at DotSUB. As you can also do the captioning there by importing a .srt file, I tried to transform my “.txt for SMIL” file of the English captions into a .srt file. I bungled part of the code, so DotSUB refused the file. Overstream accepted it, and I corrected the mistakes using both. Results at [6] (DotSUB) and [7] (Overstream) . And now that I have a decent .srt file for the English transcript, I could also use it to caption the video at YouTube or Google video: see YouTube’s “Video Captions: Help with Captions” [8]. (Actually, there is a freeware program called Subtitle Workshop [9] that could apparently do this conversion cleanly, but it is Windows-only and I have a Mac.)

This combining of tools could be useful even for less blundering people. Say one person in a project has better listening comprehension of the original language than the others, and prefers Overstream: s/he could make the first transcript there, export the .srt file, which then could be mported in DotSUB to produce a transcript that all the others could use to make switchable captions in other languages. If that person with better listening comprehension were blind, s/he might use MAGpie to do the transcript, and s/he or someone else could convert it to a .srt fil that could then be uploaded either to DotSUB or Overstream. And so on.

Watch Out for New Developments

I have only tried to give an idea of three captioning tools I happen to be acquainted with, as correctly as I could. The complexity of making videos accessible and in particular of the numerous captioning solutions is illustrated in the Accessibility/Video Accessibility section [10] of the Mozilla wiki – and my understanding of tech issues remains very limited.

Moreover, these tools are continuously progressing. Some have disappeared – Mojiti, for instance – and other ones will probably appear. So watch out for new developments.

For instance, maybe Google will make available the speech-to-text tool that underlies its search engine for the YouTube videos of the candidates to the US presidential elections (see “”In their own words”: political videos meet Google speech-to-text technology” [11]): transcribing remains the heavy part of captioning and an efficient, preferably online speech-to-text tool would be an enormous help.

And hopefully, there will soon be an online, browser-based and accessible SMIL generating tool. SubRip is great, but with SMIL, captions stay put under the video instead of invading it, and thus you can make longer captions, which simplifies the transcription work. Moreover, SMIL is more than just a captioning solution: the SMIL “hub” file can also coordinate a second video for sign language translation, and audio descriptions. Finally, SMIL is a W3C standard, and this means that when the standard gets upgraded, it still “degrades gracefully” and the full information is available to all developers using it: see “Synchronized Multimedia Integration Language (SMIL 3.0) – W3C Recommendation 01 December 2008 [12].