Throughout the course Advanced Computer-assisted Translation, I had the opportunity to explore a wide range of technologies and tools used in the world of localization, including Machine Translation, advanced QA settings in Trados, and project management platforms such as monday.com and Podio. In this article, I will share what I learned from building a Statistical Machine Translation (SMT) model with Microsoft Custom Translator, applying regular expressions in QA, and filming an introductory video for a project management tool called Podio.
As machine translation, especially Neural Machine Translation, has become one of the most important trends in the localization industry, it is of paramount importance for localizers to learn more about how MT is developed and applied. To gain hands-on experience in managing MT projects and building MT models, we formed a group of five and started working on a proposal for a client who wanted to incorporate customized MT into their translation projects. We chose China Academy of Translation as our client, since we wanted to explore how machine translation performs on political speeches given by Chinese officials. To keep the project in scope, we narrowed it down to the Chinese-English translation of Premier Li Keqiang’s speeches.
In the first stage of the project, we drafted a preliminary proposal for our client, which laid out how we planned to build the customized SMT model with the datasets we found on China Academy of Translation’s website, the criteria we would use, the budget and timeline for the project, and the projected outcome. You can find our preliminary proposal right here.
After our client approved our initial proposal, we started working on the pilot project by building the SMT model in Microsoft Custom Translator. We used the bilingual transcripts of speeches given by Premier Li as our training and tuning data. For testing data, we used two Chinese transcripts of Premier Li’s speeches. However, we had a hard time putting together the datasets used in model training. The materials we used were already aligned by paragraph, but an SMT model performs better when the datasets are aligned by sentence. Hence, we spent the bulk of our time aligning the bilingual materials manually, which was a major pain point for our team. Another problem with our project was that an SMT model works best when the training data consists of short sentences with simple structures, whereas the sentences in political speeches are long and complicated. This makes it more difficult for the engine to identify and learn the patterns of the sentences. As a result, although the translation produced by the SMT model we built was better than we expected, it didn’t reach the minimum QA standard we had set beforehand. In addition, training the customized model turned out to be entirely out of scope in terms of budget and time. Hence, in our updated proposal and presentation for the client, we advised against using customized SMT to translate political speeches.
During the second half of the semester, we explored more of the QA settings in Trados. By using regular expressions, we can create rules that help us identify language-specific formatting or grammatical mistakes. For instance, when we are performing QA on Chinese translations, we can create a regular expression that captures dates formatted as MM/DD/YYYY and changes them into YYYY/MM/DD, which is the standard date format in Chinese. The ability to use regular expressions to create customized QA settings can significantly improve QA testers’ efficiency. Furthermore, using language-specific rules in the QA process can greatly improve the overall quality of translation projects, especially when the reviewers are proofreading texts written in a language they can’t read. In short, knowledge of regular expressions is definitely a plus for localizers.
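To make the date example concrete, here is a minimal sketch of that kind of rule written in Python rather than in the Trados QA settings themselves. The pattern and the function names `find_us_dates` and `to_year_first` are my own illustrations, not something taken from Trados, and the pattern does not check that the month and day values are valid.

```python
import re

# Illustrative pattern (an assumption, not a Trados built-in):
# one or two digits, a slash, one or two digits, a slash, four digits.
# The lookarounds stop the pattern from matching inside a longer run of
# digits or slashes. They are used instead of \b because Chinese
# characters count as word characters, so \b would not fire between a
# Chinese character and a digit.
US_DATE = re.compile(r"(?<![\d/])(\d{1,2})/(\d{1,2})/(\d{4})(?![\d/])")


def find_us_dates(text):
    """Return every MM/DD/YYYY-style date found in the text (for flagging in QA)."""
    return [match.group(0) for match in US_DATE.finditer(text)]


def to_year_first(text):
    """Rewrite MM/DD/YYYY dates as YYYY/MM/DD, zero-padding month and day."""
    def swap(match):
        month, day, year = match.group(1), match.group(2), match.group(3)
        return f"{year}/{int(month):02d}/{int(day):02d}"
    return US_DATE.sub(swap, text)


if __name__ == "__main__":
    segment = "会议定于3/5/2021举行"
    print(find_us_dates(segment))
    print(to_year_first(segment))
```

In a QA workflow, the first function corresponds to flagging a segment for review, while the second corresponds to a search-and-replace fix. Dates that are already in YYYY/MM/DD form are left untouched because they do not end in a four-digit year.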
To close this article, I would like to briefly talk about a great project management tool called Podio. It is well suited to project management in localization, as it makes it easy for PMs to create projects using templates and assign tasks with different deadlines to team members. First of all, Podio has lots of built-in templates for different types of projects, and the templates are highly customizable. PMs can quickly modify a basic template and save it for future use, which makes creating a new project extremely easy. Secondly, projects can be broken down into assignable pieces for different team members. This is extremely convenient because team members can easily view the parts of the project they are assigned to and the deadlines for those mini projects. In addition, team members can schedule and launch meetings on Podio. If you are interested in learning more about the basic functions of Podio, here is an introductory video I made for this powerful tool.