Intro
After taking the Intro to Computer-Assisted Translation (CAT) class, I took the Advanced CAT class this semester. As the course title indicates, it was not easy; it was quite challenging from the very beginning. Professor Wooten's expectations were high, and he pushed us to reach a higher level of mastery of CAT tools. The intro class was mostly about learning the basic skills needed to use CAT tools, mainly Trados. This advanced course had us explore Trados at a deeper level, such as setting up filters that mark segments as ‘translatable’ or ‘untranslatable’ so that code and other elements that are not supposed to be translated are left alone.
To be honest, the whole process was painful and harsh. Professor Wooten didn’t tell us which filters we had to set as ‘untranslatable’ to keep the code in an .xml file from being translated. It was truly ‘No pain, no gain.’ However, after countless trials we finally found the right filters on our own, and I can now confidently say that I am comfortable setting up filters that mark elements as ‘untranslatable.’
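To give a rough idea of what an ‘untranslatable’ filter does, here is a minimal Python sketch. It is not how Trados works internally, and the element names (`code`, `id`) and the sample XML are made up for illustration. It simply collects the text of every element except the ones listed as untranslatable.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names that should never be sent to the translator.
UNTRANSLATABLE = {"code", "id"}


def translatable_segments(xml_text, untranslatable=UNTRANSLATABLE):
    """Return the text of every element not marked as untranslatable."""
    segments = []

    def walk(element):
        # Skip an untranslatable element together with everything inside it.
        if element.tag in untranslatable:
            return
        # Only the element's leading text is collected; tail text (text that
        # follows a closing tag) is ignored to keep the sketch short.
        text = (element.text or "").strip()
        if text:
            segments.append(text)
        for child in element:
            walk(child)

    walk(ET.fromstring(xml_text))
    return segments


# Made-up sample file: <id> and <code> hold content that must stay as-is.
SAMPLE = """
<entry>
  <id>msg_001</id>
  <title>Welcome back</title>
  <body>Please sign in to continue.</body>
  <code>printf("%s", user);</code>
</entry>
"""

print(translatable_segments(SAMPLE))
```

In a real CAT tool the same decision is made in the file type settings rather than in code, but the idea is the same: decide per element whether its content goes to the translator.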
Training a Machine Translation Engine with Microsoft Translator Hub
Around the middle of the semester, we started a group project to train a statistical machine translation (SMT) engine. For a couple of weeks, we learned how we were going to carry out the project and how the machine translation engine would be trained. We also had to choose a topic and document type for training the engine. Michelle Chae, Emily Skalovsky, and I decided on ‘Rom-Com movie’ subtitles. We figured it would be a fun and interesting project. We also believed that it could help an MT engine get better at translating cultural context, since rom-com movies contain so many culture-specific situations. To move forward, we scheduled a client meeting to present our proposal for training the SMT engine. If you would like to take a look at the proposal, please find the attached file right below.
After the meeting, we kicked off the project. I would like to take a moment to talk about the deviations we encountered along the way. First of all, we thought our topic, rom-com movies, would be interesting and easygoing. Our expectations were simply wrong: we had not understood the true nature of subtitles. When we tried to align the source files (‘English’ transcriptions of what the actors say) with the target subtitle files (which were more like ‘Korean’ summary lines), most of the lines didn’t match up, and most of them had been translated liberally. We found aligning those files extremely troublesome, and the resulting data nearly impossible to train on. However, we didn’t give up; we tenaciously carried the project through to the end, aligning the files one by one. We got some results, improved our BLEU scores, and reached some conclusions. We also created an updated project summary.
As the last step of the project, we gave a presentation on the lessons we learned. The major lesson was ‘DO NOT ATTEMPT TO TRAIN SUBTITLES’. This is mainly because the characteristics of the source and target files are so different. As mentioned before, the source subtitles are transcriptions of every word and line, while the target subtitles are more like summaries of those transcriptions. Besides, subtitles are supposed to be quite colloquial, which leads translators to translate liberally. Below are the PPT slides on the lessons we learned.
Utility Demo
Besides training a machine translation engine, we did a fun activity as well. Professor Wooten gave us a list of software options and let us explore one of them. At the end of the class, he gave us an assignment to try a new piece of software and make a demo video showing how to use it. I chose ‘Practicount & Invoice’ for my demo video. The software basically lets people count the words in a document for translation and create a quote based on the word count. If you would like to watch the demo video, please click here.
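The basic idea of counting words and turning the count into a quote can be sketched in a few lines of Python. This is only an illustration, not how Practicount & Invoice actually works; the function names, the per-word rate, and the minimum fee are all assumptions made up for the example.

```python
import re


def count_words(text):
    """Count whitespace-separated tokens with at least one word character
    (letter, digit, or underscore), so stray punctuation is not billed."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def make_quote(text, rate_per_word, minimum_fee=0.0):
    """Return (word_count, price), never charging less than minimum_fee."""
    words = count_words(text)
    price = max(round(words * rate_per_word, 2), minimum_fee)
    return words, price


# Hypothetical job: the rate and the minimum fee are invented numbers.
sample = "Please sign in to continue. Thank you!"
words, price = make_quote(sample, rate_per_word=0.12, minimum_fee=0.50)
print(f"{words} words, quote: ${price:.2f}")
```

Real tools also have to deal with different file formats and with languages that do not separate words with spaces, which this sketch ignores.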