Collaborative Text Translation with DotSUB

By Claude Almansi
Editor, Accessibility Issues

In a discussion about Uwe Müller’s dissertation regarding open access journals (see abstract with download link) on the A2k (access to knowledge) mailing-list, Arif Jinha wrote that it would be great to translate it collaboratively into English. Great idea, especially for a 269-page dissertation.

The way Arif Jinha intends to collaboratively translate scholarly texts rests on a hypothesis: if two specialists thoroughly know each other’s subject, specialist B, even if he does not know specialist A’s language, can understand specialist A’s work, and render it in his own language, better on the basis of even a dubious computer translation than a generic translator who masters both languages could. However, generic bilingual translators could still be of use for checking possible mistakes in details.

This is very true. For instance, the best translation of a poem by Seferis into French was done by the French poet Yves Bonnefoy – who didn’t know Greek – on the basis of several English translations, in collaboration with Seferis who told him what he liked and disliked in these translations. And the same possibly extends to other fields of specialisation.

Collaborative Text Translation Tools

However, being just a generic translator, I have to translate the other way round, from the small end as it were. So Arif Jinha’s suggestion got me thinking about collaborative translation tools. There are such tools for software, like Pootle, for instance, which split the interface into short strings presented in a table: one volunteer starts translating some, then another volunteer carries on. You can navigate by untranslated and “fuzzy” strings.

  • Problem 1: the strings are presented in alphabetical order, with only some coded indications of where they come from, and it takes some time to start understanding them. And one-word strings can be tricky: is “post” a noun or a verb, and if a verb, should we use the infinitive or the imperative, and if the imperative, the polite or the familiar form (in languages where both exist)?
  • Problem 2: you need a server on which to install this kind of tool.

Collaborative Text Translation with DotSUB

And then I remembered DotSUB. It is normally used for collaboratively captioning videos, but its interface is very similar to one of the software translation tools that I covered in Three Video Captioning Tools. And you can have longer strings, in the order you decide – in the order of a text too…

But I needed a video pre-text first. So I made one, inserting a 4k black JPEG file in a video editor:

Black JPEG file

I timed it for 10 minutes and exported the video in the lowest possible resolution. Then I uploaded it into DotSUB and inserted some text from my blog post Making Web Multimedia Accessible Needn’t Be Boring, sentence by sentence:

Dotsub Transcription Tool:

video player left; things already transcribed top right; box for transcribing bottom left

I left the default 3-second timing for each string in the “Add a transcription line” box and paid no attention to the pre-text black video. Each transcribed string moves to the top-right table when you hit return and is automatically saved. When that was done, I clicked on “Mark this transcription complete” (bottom left) and moved to the DotSUB Translation Tool.
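For a longer text, the sentence-by-sentence preparation could be done ahead of pasting. The following is only a minimal sketch, not anything DotSUB provides: the function names are hypothetical, the sentence splitting is deliberately naive, and the 3-second and 10-minute values are simply the ones used in this post. It checks that the text fits into the video, which at 3 seconds per line means at most 200 lines.

```python
import re

LINE_SECONDS = 3          # default timing per line, as used in this post
VIDEO_SECONDS = 10 * 60   # length of the black placeholder video


def split_sentences(text):
    """Naive split: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def assign_slots(sentences, line_seconds=LINE_SECONDS, video_seconds=VIDEO_SECONDS):
    """Give each sentence a consecutive time slot; fail if the video is too short."""
    capacity = video_seconds // line_seconds
    if len(sentences) > capacity:
        raise ValueError(f"{len(sentences)} lines do not fit in {capacity} slots")
    return [(i * line_seconds, (i + 1) * line_seconds, s)
            for i, s in enumerate(sentences)]


sample = "Some people think rules are boring. Is that true? Not at all!"
for start, end, sentence in assign_slots(split_sentences(sample)):
    print(f"{start:>3}s-{end:>3}s  {sentence}")
```

A naive splitter like this will also cut after abbreviations such as “p. ex.”, so the resulting lines would still need a quick check by eye before they are entered.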

DotSUB Translation Tool

The transcription is in a tabled list, with each item followed by a link you can click to translate it

I clicked on the links to translate each string (actually, I only translated the text into French, but I forgot to take a screenshot first, so I made one of the interface for translating into Italian instead).

When you choose a language for the captions in the video player of the resulting Collaborative translation DotSUB page, you get the translation in the corresponding language as a drop-down list under Video Transcription. To get rid of the list markings, just copy-paste it into the “source” or “html view” of a web editor. Here is the almost unedited result (I just made the subtitle a separate paragraph and bolded it, and I put the rest in italics):

Certains pensent que l’obligation légale de se conformer aux règles d’accessibilité des contenus Web – celles du W3C ou, aux USA, la “section 508” mène forcément à des pages ennuyeuses, rien qu’en texte En fait, ces règles n’excluent pas l’utilisation du multimédia sur le web, mais imposent de le rendre accessible en “offrant des alternatives équivalentes pour des contenus auditifs ou visuels et en particulier: “Pour toute présentation multimédia à base temporelle (p. ex. film ou animation), il faut offrir des alternatives équivalentes (p.ex. sous-titres ou descriptions audios de la piste visuelle) avec la présentation [Priorité 1]” [1] Ce n’est pas une corvée aussi terrible qu’il ne semble, et elle peut être partagée entre plusieurs personnes, même si elles ne sont pas expertes en technologie et n’ont pas d’instruments perfectionnés.

Sous-titrage avec DotSUB.com

Exemple: Phishing Scams in Plain English de Lee LeFever, en http://dotsub.com/view/41ffcc22-6609-4780-bf9d-5bcf88d3197d  [2] Ici, la vidéo a été téléchargée dans DotSUB.com, et plusieurs volontaires l’ont sous-titrée en diverses langues. Le résultat peut être insérer dans un blog, un wiki ou une page web. Les sous-titres apparaissent aussi comme texte copiable sous “Video Transcription”: commode si des gens veulent citer des passages dans une discussion de la vidéo. En outre, une transcription d’une vidéo tend aussi à améliorer sa position dans les moteurs de recherche, qui indexent principalement les textes. Le seul problème est que les sous-titres couvrent une partie substantielle de la vidéo
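The clean-up step used for the French text above (getting rid of the list markings) could also be scripted. This is a rough sketch under a loud assumption: that the copied transcription arrives as HTML list markup (`<ol>` and `<li>`), which the post does not confirm. The class and function names are made up for the illustration; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class ListItemText(HTMLParser):
    """Collect the text of each <li> element, ignoring the list markup itself."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._in_item = False

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._in_item = True
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_item = False

    def handle_data(self, data):
        if self._in_item:
            self.items[-1] += data


def list_to_paragraph(html):
    """Join the list items into one running paragraph of plain text."""
    parser = ListItemText()
    parser.feed(html)
    parser.close()
    return " ".join(item.strip() for item in parser.items if item.strip())


# Assumed shape of the copied markup; the real page may differ.
copied = "<ol><li>Sous-titrage avec DotSUB.com</li><li>Exemple: une vidéo. </li></ol>"
print(list_to_paragraph(copied))
```

Paragraph breaks, bold and italics would still have to be restored by hand, as was done for the text above.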

Summing up so far:

Of course, I attempted this alone. But it would also work with several people collaborating on the translation. In theory, even the transcription, sentence by sentence, of the original text could be shared, but I haven’t checked yet whether a collaborator could mark a transcription as complete when it isn’t, thus blocking the transcription.

In case of a longish text that must be translated into several languages (hopefully in collaboration with many people), this way of using DotSUB might prove useful due to the ease of toggling between the different versions from the main page.

3 Responses

  1. That’s pretty neat. How do you expect that this would handle revisioning? I’ve often thought you’d want a language expert collaborating with a subject matter expert, revising each other’s translation in turn.

  2. Good observation: DotSUB does not really handle revisioning, though it is possible to export the “subtitles” (here the translation of a text) at any stage – which of course is a safeguard in case someone badly messes up, deliberately or by mistake. But that’s not the same as the revision feature in a word processor or the history of versions in a wiki.

    I should have added that I thought of hijacking DotSUB in this way for the first complete draft of translations, and then moving the translations elsewhere for fine-tuning. One interesting platform for that would be David Lebow’s Hylighter: see hylighter-edu (1), where the comments appear as marginalia, and then get integrated by the document manager. This would work well for the collaboration between subject matter expert and language expert, I think.

    Another, more obvious solution: a wiki. But the possibility of first using marginal comments for suggestions is appealing.

    (1) the “normal” hylighter.com would work just as well, but the verbal explanations on hylighter-edu are more complete.

  3. Thanks for the link to Hylighter — I hadn’t seen it before. Reminds me a bit of CommentPress and eComma.

    One of the things I’ve been mulling over is the need to “fork” a translation for different target languages. Do you think that the best approach would be through multiple, independent translation projects, or to link/combine them in some way?
