Advanced Computer Assisted Translation

Advanced Computer Assisted Translation – Spring 2019

This semester I took part in a neural machine translation (NMT) training project in which, as part of a team, we trained a machine translation engine using the Microsoft Custom Translator portal. We also explored a number of Trados skills, including adding special filters for translating files and finding translation errors before importing training data. For our final deliverable, we took the translation engine through three rounds of testing and compared a final translated product with the output of the test rounds.

Our training rounds on Microsoft Translator with corresponding scores

Our training project took various data sets through three phases: training, tuning, and testing. Each training round produced a BLEU score, which we used to gauge how effective the training was.

Outline of training phases and data used for each phase

I have provided some links here to our deliverables that explain project goals and findings in more detail.

One of the other cool things I learned in this class was how to find and filter out frequent translation errors in Okapi CheckMate and Xbench using regular expressions. We used the site regex101.com to help create and debug our expressions and to find the best way of searching for and capturing specific strings that might get caught in CAT tools when sent to be translated. Here are a couple of examples of how to filter email addresses using RegEx:

To capture an email that appears in the form name@organization.domain, you can use [\w.]+@[\w.]+

However, if the format of the emails varies and the part before the @ can contain any combination of letters, numbers, dots, and hyphens, use this expression instead:
[\.\-a-zA-Z0-9]+@[a-zA-Z]+\.[a-z]+
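To see how the two expressions above behave differently, here is a minimal Python sketch using the standard re module. The sample sentence, the addresses in it, and the names SIMPLE_EMAIL, STRICT_EMAIL, and find_emails are made up for illustration; they are not from our project files, and CheckMate or Xbench may handle some regex details differently than Python does.

```python
import re

# First expression from the text: word characters and dots on both sides of "@".
SIMPLE_EMAIL = re.compile(r"[\w.]+@[\w.]+")

# Second expression from the text: letters, digits, dots, and hyphens before "@",
# then a letters-only domain name, a literal dot, and a lowercase ending.
STRICT_EMAIL = re.compile(r"[\.\-a-zA-Z0-9]+@[a-zA-Z]+\.[a-z]+")


def find_emails(text, pattern):
    """Return every non-overlapping match of pattern in text."""
    return pattern.findall(text)


# Hypothetical sample text, invented for this example.
sample = "Contact jane.doe@example.org or sales-team@company.com for details."

# The first pattern cannot match a hyphen, so it cuts "sales-" off the second address.
print(find_emails(sample, SIMPLE_EMAIL))
# ['jane.doe@example.org', 'team@company.com']

# The second pattern allows hyphens before "@" and captures both addresses whole.
print(find_emails(sample, STRICT_EMAIL))
# ['jane.doe@example.org', 'sales-team@company.com']
```

Neither expression is a full email validator. They are only meant to be good enough to flag email-like strings during a QA check, and pasting them into regex101.com with your own test strings is a quick way to check edge cases.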

I also got practice using many useful technologies, such as screen recording software, file parsers, and a variety of other utilities that are good to have in one’s tool belt for work in the localization industry. One of the utilities I found most useful was the PDF password remover. I have a demo video briefly showing how the trial program is used to remove a password from a PDF. In his book The Translator’s Tool Box, Jost Zetzsche describes a password remover as necessary at times when Word, Zip, or PDF files are sent by someone who does not realize they have been encrypted and locked. In these cases having a remover is legal and necessary, but unfortunately I did not find any free programs that do the job reliably. The one I used offers a trial that will only unlock the first page of your file. Jost recommends the password recovery suite offered by ElcomSoft, which has a longstanding reputation for security software and comes with a robust set of password cracking utilities and methods.

Computer Assisted Translation

One of the most important things I learned this semester was computer assisted translation (CAT): how to create and manage translation projects using CAT software tools. For the final project, we worked as a small team of translators to take a translation project through one of those tools.

The first part of this project involved writing a statement of work proposal, which can be viewed here. One thing we didn’t take into consideration was how professional the proposal looked. It looked more like a homework assignment than something I would send to a client. Crafting a professional-looking proposal is an integral part of a successful project and is something we will take into consideration in the future; we also discussed this in our debrief video, which I link to below.

Here is a link to the final deliverables for the project:

https://drive.google.com/open?id=1G6vgDRPoQ3fDhnuuC7wlMEygq6jh7Oxc

 

Here is a link to our team debrief video where we talk about lessons learned: