
Introduction
In this mini-portfolio, I detail the projects I undertook and the tools I learned in my 2019 Advanced CAT class at the Middlebury Institute of International Studies. During the first part of the semester, my classmates and I trained a statistical machine translation (SMT) engine using Microsoft Custom Translator to better understand how machine translation works.
SMT Training Project Using Microsoft Custom Translator
My team’s project aimed to gauge the investment required to train a custom statistical machine translation (SMT) engine to translate our client’s yearly International Religious Freedom reports from English to Spanish. Our client was the U.S. Department of State. Over six weeks of the semester, we used the Microsoft Custom Translator tool to build a customized engine through seven iterative training rounds. Our goal was for the post-edited machine translation (PEMT) output from this engine to meet three criteria: efficiency (PEMT approximately 30% faster than human translation), cost (PEMT approximately 30% less expensive than human translation), and quality (PEMT that passes a customized QA check based on a model adapted from the LISA QA model). In other words, we undertook a very ambitious project and set very high standards for the engine.
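The efficiency and cost criteria above are simple percentage comparisons against a human-translation baseline. The sketch below shows one way to check them; it assumes "30% faster" means "30% less total time" (not 30% higher throughput), and the `Job` record and its field names are hypothetical, not part of the team's actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical record of one translation job (human or PEMT)."""
    hours: float  # total time spent, including post-editing for PEMT
    cost: float   # total cost, same currency for both jobs


def savings(baseline: float, candidate: float) -> float:
    """Fractional saving of candidate relative to baseline (0.30 == 30%)."""
    return (baseline - candidate) / baseline


def meets_targets(human: Job, pemt: Job, target: float = 0.30) -> dict:
    """Compare a PEMT job against a human-translation baseline.

    Assumption: "30% faster" is read as "takes 30% less time".
    """
    time_saving = savings(human.hours, pemt.hours)
    cost_saving = savings(human.cost, pemt.cost)
    return {
        "time_saving": time_saving,
        "cost_saving": cost_saving,
        "efficiency_ok": time_saving >= target,
        "cost_ok": cost_saving >= target,
    }
```

Reading "faster" as a throughput gain instead would change the arithmetic: a 30% throughput increase only cuts time by about 23%, so it is worth agreeing on the definition with the client up front.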

To assess translation quality, we developed a LISA-derived metric system, shown below, that covers five categories: accuracy, fluency, terminology, style, and formatting.
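A LISA-style model usually works by counting errors per category, weighting each one by severity, and comparing the total (normalized by word count) against a pass threshold. The sketch below illustrates that mechanism for the five categories above. The severity weights (minor 1, major 5, critical 10) and the pass threshold of 10 points per 1,000 words are assumptions chosen for illustration, not the values our team used.

```python
from collections import Counter

# The five categories from our LISA-derived metric.
CATEGORIES = {"accuracy", "fluency", "terminology", "style", "formatting"}

# Hypothetical severity weights, loosely following LISA's minor/major/critical tiers.
SEVERITY_POINTS = {"minor": 1, "major": 5, "critical": 10}


def qa_report(errors, word_count, max_points_per_1000=10.0):
    """Score a post-edited sample.

    errors: list of (category, severity) tuples logged by the reviewer.
    max_points_per_1000: hypothetical pass threshold.
    """
    by_category = Counter()
    for category, severity in errors:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        by_category[category] += SEVERITY_POINTS[severity]
    total = sum(by_category.values())
    per_1000 = total * 1000 / word_count
    return {
        "by_category": dict(by_category),
        "points_per_1000_words": per_1000,
        "passed": per_1000 <= max_points_per_1000,
    }
```

Keeping the per-category breakdown makes it easier to see whether failures come from the engine (accuracy, terminology) or from issues a post-editor can fix cheaply (formatting, style).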

The project did not take off immediately because we had to manually align all of the bilingual PDF files in Trados to reach the minimum of 10,000 words required to train the engine. Over the span of six weeks, however, we completed seven training cycles, which are outlined below.

The BLEU score increased only once, so we could not build any kind of predictive model of what full engine training would require for the client. Despite this small change in BLEU score, however, we exceeded our initial time and cost goals of PEMT being 30% faster and 30% less expensive than human translation.
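BLEU compares the engine's output with a human reference translation by counting matching word sequences (n-grams of length 1 to 4), clipping repeated matches, and penalizing output that is shorter than the reference. The sketch below is a minimal, unsmoothed corpus-level BLEU with one reference per segment, included only to show what the score measures; Custom Translator's own tokenization and implementation details may differ, so its reported numbers will not match this code exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU in [0, 1]; one tokenized reference per segment."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: each hypothesis n-gram is credited at most as many
            # times as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for output shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Because BLEU only rewards exact n-gram overlap with a single reference, an engine can produce output that is faster to post-edit without a large BLEU gain, which is consistent with reaching time and cost targets while the score barely moved.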
For more details and our final recommendations to the client, I have provided links to my team’s Pilot Project Proposal, Updated Project Proposal, and Post-Mortem Presentation.
Tips for Trados QA (using Regex)
In addition to training our own statistical machine translation engine, we focused on using regular expressions (Regex) for QA in Trados Studio. Studio 2019 introduced a feature that lets users customize QA settings and regular expression rules for the languages they are working with. This gives you more control over quality assurance checks, because language-specific regular expressions help you clean up your document and run quality checks more efficiently.
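To make the idea concrete, the sketch below runs a few language-specific regex checks of the kind one might configure for an English to Spanish project. It uses Python's `re` module rather than Trados itself, and the rule names and the `check_segment` helper are hypothetical; the patterns stick to basic syntax that Python and the .NET-style expressions Studio accepts have in common, but they should be tested inside Studio before relying on them.

```python
import re

# Hypothetical target-side rules for an English > Spanish project.
TARGET_RULES = {
    "double space": re.compile(r" {2,}"),
    "space before punctuation": re.compile(r" [,.;:!?]"),
    # A "?" or "!" with no opening "¿" or "¡" anywhere before it.
    "missing opening ¿": re.compile(r"^[^¿]*\?"),
    "missing opening ¡": re.compile(r"^[^¡]*!"),
}

# Runs of digits; comparing these ignores decimal and thousands separators,
# so "2.5" vs "2,5" or "1,000" vs "1.000" still count as matching.
NUMBER = re.compile(r"\d+")


def check_segment(source, target):
    """Return the names of the checks a source/target segment pair fails."""
    issues = [name for name, rule in TARGET_RULES.items() if rule.search(target)]
    if sorted(NUMBER.findall(source)) != sorted(NUMBER.findall(target)):
        issues.append("number mismatch")
    return issues
```

For example, `check_segment("Is it 2.5 km?", "Son 2,5 km?")` flags only the missing "¿", because the digit runs match despite the different decimal separator. The opening-mark rules are deliberately simple and only catch the first unmatched question or exclamation in a segment.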

Utility Demo/Training Video
Lastly, as the semester came to an end, we each learned one new management, organization, or planning tool and created an instructional video explaining its importance in the world of localization. Since I love photography and am very interested in desktop publishing (DTP), I chose to make my instructional video on XnView: Image View, Photo Resizer & Graphic Converter, a software tool developed by XnSoft. The tool’s main purpose is to facilitate file management, and it also offers a wide array of useful tools such as built-in hex viewing, batch renaming, and screen capture. For more information, please feel free to check out my instructional video below!
