Custom machine translation is the adaptation of a machine translation (MT) system to a specific domain or topic. When done correctly, custom machine translation delivers notably higher-quality output than generic machine translation. With a successfully trained MT engine, human translators and editors can work far more efficiently and significantly cut go-to-market time. In this blog, I will show how to use Microsoft Custom Translator, built on Azure AI, to train an MT engine for subtitle translation of the famous TV series Friends.
So, Why Friends?
American TV series and movies draw large audiences all over the world, so the demand for subtitle translation is huge. Since subtitles are colloquial, short, and simple, our team believed a custom MT engine could significantly improve translation efficiency and reduce cost. We therefore launched a pilot project on the famous TV series Friends to measure how much efficiency the MT engine could deliver at an acceptable level of quality.
MT Pilot Project
Pilot project: Train a machine translation (MT) engine to translate the subtitles of the FRIENDS series, EN>zh-CN
Pilot project objectives: PEMT 30% faster (efficiency) and 30% cost savings; quality threshold of at most 10 penalty points per 1,000 words in PEMT, with no critical errors
Pilot proposal 1st edition:
Methodology
- Set up contrast groups
- Train for 10 rounds
- Deploy the model with the highest BLEU score
- Run 2 rounds of human PEMT to test MT quality and the efficiency improvement rate
- Hold a post-mortem and redesign the proposal to improve the machine training
Workflow
Data collection (training data: seasons 1–9 EN-ZH human-translated subtitles; tuning data: season 10 subtitles) >> Data cleaning (build bilingual XML files with an alignment and segmentation tool; build a term list) >> Data training >> Deploy the model from the first round >> First round of PEMT >> Data training >> Deploy the model with the highest BLEU score >> Second round of PEMT >> Compare the results of the two PEMT rounds >> Update the pilot project proposal
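To make the deployment step concrete, here is a minimal sketch of how a deployed Custom Translator model can be called through the Microsoft Translator Text API v3. The subscription key, region, and category ID below are placeholders for your own Azure resource values; the category parameter is what routes a request to the custom model instead of the generic one.

```python
import requests

# Placeholder values: substitute your own Azure Translator key, resource
# region, and the category ID of your deployed Custom Translator model.
SUBSCRIPTION_KEY = "<your-translator-key>"
REGION = "<your-resource-region>"
CATEGORY_ID = "<your-custom-model-category-id>"

ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"

def translate_subtitle(text: str) -> str:
    """Send one subtitle segment to the deployed custom model."""
    params = {
        "api-version": "3.0",
        "from": "en",
        "to": "zh-Hans",
        "category": CATEGORY_ID,  # routes the request to the custom model
    }
    headers = {
        "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
        "Ocp-Apim-Subscription-Region": REGION,
        "Content-Type": "application/json",
    }
    body = [{"Text": text}]
    response = requests.post(ENDPOINT, params=params, headers=headers, json=body)
    response.raise_for_status()
    return response.json()[0]["translations"][0]["text"]

print(translate_subtitle("We were on a break!"))
```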
Outcomes
* All “time used” data is the average time taken by two editors to edit 100 segments.
* The baseline BLEU score (indicating the performance of the generic MT engine) is 17.81.

The customized MT engine improved the translation rate by 49.37% and 53.70% in the two rounds, respectively.
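For reference, BLEU can also be computed offline to sanity-check the scores Custom Translator reports. Below is a minimal sketch using the sacrebleu package; the sentence pairs are made-up examples, and tokenize="zh" is needed because Chinese text has no whitespace word boundaries.

```python
import sacrebleu  # pip install sacrebleu

# Made-up examples: MT output vs. the human reference translation.
hypotheses = ["我们当时在休息！", "这不是约会。"]
references = [["我们当时是分手状态！", "这不是一次约会。"]]

# tokenize="zh" applies sacrebleu's Chinese tokenizer before scoring.
bleu = sacrebleu.corpus_bleu(hypotheses, references, tokenize="zh")
print(f"BLEU: {bleu.score:.2f}")
```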
Anticipated project outcomes for the updated pilot project proposal:
- Efficiency: 50% faster than human translation
- Cost: 40% savings over human translation
- Quality: remaining error penalty points should be limited to 35 per 1,000 words
Updated Pilot Proposal:
Lessons Learned and Afterthoughts
Our BLEU score jumped from 15 to 30. The two vital factors behind this change were the increase in segment volume and clean data (correctly aligned bilingual segments). Data processing should therefore follow the guidelines below:
- Collect as much data as possible to ensure there is enough for training, testing, and tuning. Cleaning the data before training prevents garbage data from dragging down the BLEU score. We also deleted all scene descriptions and speaker names, and harmonized the names within the dialog; building a glossary for all the names is another way to solve this problem. Colloquial words varied widely in spelling, which noticeably hurt the final score. (A cleaning sketch follows this list.)
- Split long sentences and paragraphs into short segments. It is hard for the audience to read a long sentence in a few seconds, so we also include segment length in our QA check metrics.
- Align the data. Aligning data before training leads to higher BLEU scores and more accurate machine translation afterward.
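As referenced in the first item above, here is a minimal sketch of the kind of cleaning pass we describe: dropping scene descriptions, stripping speaker labels, and harmonizing name spellings through a small glossary. The regular expressions and glossary entries are illustrative assumptions, not our production rules.

```python
import re

# Hypothetical name glossary used to harmonize spelling variants
# (e.g. "Pheebs" vs. "Phoebe") across the whole corpus.
NAME_GLOSSARY = {"Pheebs": "Phoebe", "Rach": "Rachel"}

SCENE_RE = re.compile(r"^\[.*\]$")                  # e.g. "[Scene: Central Perk]"
SPEAKER_RE = re.compile(r"^[A-Z][\w. ]{0,20}:\s*")  # e.g. "Ross: "

def clean_segment(line: str) -> str | None:
    """Return a cleaned subtitle segment, or None if the line is scene text."""
    line = line.strip()
    if not line or SCENE_RE.match(line):
        return None                          # drop scene descriptions entirely
    line = SPEAKER_RE.sub("", line)          # strip the speaker label
    for variant, canonical in NAME_GLOSSARY.items():
        line = re.sub(rf"\b{variant}\b", canonical, line)
    return line

raw = ["[Scene: Central Perk]", "Rach: Pheebs, over here!"]
print([s for s in (clean_segment(l) for l in raw) if s])
# ['Phoebe, over here!']
```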
In addition, although the rising BLEU score brought a huge increase in MT quality, the efficiency improvement (time saved compared with human translation) was not as obvious, which made me question the necessity of customizing an MT engine for subtitle translation. After all, subtitles are short, simple, and colloquial; a generic MT engine like Google Translate may be enough for this type of translation. However, no definite answer can be reached without further trials, so I look forward to the next experiment: increasing the segment volume by adding subtitles from other similar TV series.