Introduction: This blog post describes an SMT training project using Microsoft Translator Hub. In this portfolio, you will find a description of the project, lessons learned, my ideas on how to improve translation quality by leveraging the power of MT, and my experiments with other techniques to improve the overall localization workflow.

SMT Training Project: Developing an MT engine for translating the Chinese Government Work Report

Project Overview: This pilot project was designed to build a Statistical Machine Translation (SMT) engine from English to Chinese in order to support the Chinese government initiative to provide public documentation in both Chinese and English. Our data was extracted from Chinese government public report websites.

In order to be considered “fully trained”, the post-edited machine translations (PEMT) from this engine must meet the following target criteria for efficiency, cost savings and quality:

  • Efficiency: PEMT 20% faster than human translation
  • Cost: PEMT 25% savings over human translation
  • Quality: PEMT with an acceptable score below 30 based on the Multidimensional Quality Metrics (MQM). The acceptance threshold was raised from 10 to 30 points due to the length and complexity of the government reports compared to the product reviews.
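MQM scores a translation by counting annotated errors weighted by severity. As an illustration of how such a penalty score works against the 30-point threshold above, here is a minimal Python sketch; the severity weights and the example error annotations are assumptions for illustration, not the project's actual MQM configuration:

```python
# Hypothetical severity weights; the weighting scheme actually used in the
# project is not shown in this post.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5, "critical": 10}

def mqm_penalty(errors):
    """Total MQM penalty for a list of (category, severity) annotations."""
    return sum(SEVERITY_WEIGHTS[severity] for _category, severity in errors)

def acceptable(penalty, threshold=30):
    """A translation passes if its penalty is under the acceptance threshold."""
    return penalty < threshold

# Illustrative annotations for one post-edited passage
errors = [("grammar", "major"), ("terminology", "minor"), ("word order", "major")]
penalty = mqm_penalty(errors)  # 5 + 1 + 5 = 11, under the 30-point threshold
```

In practice the penalty is usually normalized per word count before comparison, but the pass/fail logic is the same.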

A total of 13 successful training rounds were completed, with an initial data set of approximately 56,916 training segments, 1,270 tuning segments, and 1,559 testing segments. Microsoft Translator Hub reports a BLEU score after each round of training. The initial round achieved a BLEU score of 12.6; over the next three weeks, 12 more rounds were run, with a best BLEU score of 18.61 (a roughly 48% improvement). Finally, the system with the highest BLEU score was deployed, and its translations were compared against those of the human translation team to reach our final conclusion for this project.
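BLEU measures n-gram overlap between the MT output and a reference translation. As a rough illustration of what the Hub's score is measuring, here is a minimal sentence-level BLEU sketch in Python; the Hub's own tokenization and smoothing will differ (add-one smoothing here is my own simplifying assumption), and this returns scores on a 0–1 scale whereas the Hub reports 0–100:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, hypothesis, max_n=4):
    """Sentence-level BLEU: clipped n-gram precisions with a brevity penalty."""
    ref, hyp = reference.split(), hypothesis.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = max(sum(hyp_counts.values()), 1)
        # add-one smoothing so short sentences never hit log(0)
        log_precisions.append(math.log((overlap + 1) / (total + 1)))
    # brevity penalty punishes outputs shorter than the reference
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

score = bleu("the cat sat on the mat", "the cat sat on the mat")  # 1.0
```

A perfect match scores 1.0; dropping words lowers both the precisions and the brevity penalty, which is why gains like 12.6 to 18.61 reflect real improvements in overlap with the reference.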

Data: If you feed the machine garbage, it will output garbage. To maximize the quality of the input data, we used official bilingual files from the Chinese government, and each file was carefully segmented and aligned using CAT tools.
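The aligned bilingual files were stored as TMX, an XML format for translation memories. As a sketch of what extracting segment pairs from such a file looks like with Python's standard library (the sample snippet and language codes below are illustrative, not taken from our actual data):

```python
import xml.etree.ElementTree as ET

# A tiny in-memory TMX sample; real files held thousands of aligned segments.
TMX = """<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <header srclang="zh-CN" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="zh-CN"><seg>政府工作报告</seg></tuv>
      <tuv xml:lang="en-US"><seg>Government Work Report</seg></tuv>
    </tu>
  </body>
</tmx>"""

# xml:lang lives in the XML namespace, so it needs the namespaced key
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def extract_pairs(tmx_text, src="zh-CN", tgt="en-US"):
    """Return (source, target) segment pairs from a TMX document."""
    pairs = []
    for tu in ET.fromstring(tmx_text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if src in segs and tgt in segs:
            pairs.append((segs[src], segs[tgt]))
    return pairs

pairs = extract_pairs(TMX)  # [('政府工作报告', 'Government Work Report')]
```

Pulling pairs out programmatically like this also makes it easy to spot misaligned units before they poison the training data.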

  1. Timeline

*GWR: Government Work Report; WP: White Paper

Date Task Duration
04/02 – 04/03 Project Planning 3 hours
04/04 Data Collection: 3 TMX (GWR 16-18) + 2 TMX (WP) 9 hours
04/05 Data Collection: 3 TMX (GWR 13-15) + 2 TMX (OPUS and WP) 6 hours
04/06 Realign Tuning Data: GWR 13/15/17 3 hours
04/08 Data Collection: 2 TMX (GWR 11/12) + 3 TMX (WP) 5.5 hours
04/10 Data Collection: 3 TMX 0.5 hour
04/11 Data Collection: 4 TMX 1 hour
04/12 Data Collection: 8 DOC (GWR Monolingual) 1 hour
04/13 Refine Training 1 hour
04/15 Deploy System 0.5 hour

 

  2. Conclusion

A 500-word excerpt from the Government Work Report was used to test the deployed engine. The MT output was post-edited by editors, and errors in the output were identified. In parallel, the same file was translated by the human translation team. After comparing time, quality, and cost, we reached the following conclusion.

 

Based on 500 words   Rate    Saving   Post-Edit Time   Time Saved

Goal (Price: 20%; Time: 25%)
HTEP                 $0.25   —        60 min           —
PEMT+EP              $0.20   20%      30 min           25%

Final Result
HTEP                 $0.25   —        60 min           —
PEMT+EP              $0.25   0%       60 min+          0%

(Comparison of errors found in HT and SMT using MQM as the criteria)

The outcome of our SMT did not meet any of the goals for this project. Nevertheless, the team put tremendous effort into the allotted time and completed the entire process, from initial planning to deploying the trained SMT engine.

Lessons Learned

The experience and knowledge gained from this pilot project allowed us to accurately project the requirements for fully training this SMT engine to support the government work report initiative.

Training Data Requirement:

Based on our pilot project results, we project that at least 1,000,000 segments are required to fully train the SMT engine. The major quality issues in our SMT engine's output were grammatical, more specifically sentence-structure problems. We conclude that increasing the quantity of training and tuning data would significantly improve the grammatical quality of the SMT engine.

Additionally, well-aligned source documents are key to the success of SMT training. We accepted only official government translations and conducted multiple rounds of alignment and review of our data. The result was a smaller but higher-quality data set, which was evident in our ninth training round: our BLEU score increased from 13.4 to 18.6. We believe that adding a dictionary and monolingual data would have further improved our BLEU score and the overall quality of our SMT engine.

Time and Cost:

We project that 783 hours and about $35,000 would be required to fully train the SMT engine. A team of at least four full-time members would ensure the SMT can be fully trained to meet the goals. Our team significantly increased the rate of SMT training rounds after the fourth round, which helped us quickly identify deficiencies and the direction we should focus on.

Tools:

Initially, our team used only TMXMall as our primary tool, favoring its simplicity and accessibility. However, we quickly found that the quality of our aligned documents had suffered from our tool of choice (see the first four rounds). By implementing TRADOS Studio and Okapi Olifant, we were able to align all of our documents into sentence segments, drastically improving quality.

In conclusion, we do not recommend using SMT in place of human translation for the government work report initiative at this stage. We believe that, with the proposed plan outlined above, a fully trained SMT engine would meet the established goals and could be utilized for future government work report initiatives.

Meet the future: Customizing a Neural Machine Translation engine?

When identifying error patterns in the MT output, we captured some of the causes of these errors. Word order is a typical one: the SMT system tends to translate word by word, ignoring the fact that following the source word sequence in the target language may not make any sense. This can be partly mitigated by creating a dictionary or term database linked to the system. An NMT system, however, may deliver more accurate translations, as NMT better captures the context of full sentences before translating them.
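One common way to link a term database to an MT system is to mask known terms with placeholders before translation and restore the approved renderings afterwards, so the engine cannot mistranslate them word by word. A minimal Python sketch of that idea (the term-base entries and placeholder scheme are purely illustrative, not part of our actual setup):

```python
# Hypothetical term base: approved target renderings for fixed political terms
TERM_BASE = {
    "供给侧结构性改革": "supply-side structural reform",
    "脱贫攻坚": "the fight against poverty",
}

def protect_terms(source, term_base):
    """Mask known terms with placeholders before sending text to the MT engine.

    Returns the masked text and a map used to restore approved translations.
    """
    mapping = {}
    for i, (term, translation) in enumerate(term_base.items()):
        placeholder = f"__TERM{i}__"
        if term in source:
            source = source.replace(term, placeholder)
            mapping[placeholder] = translation
    return source, mapping

def restore_terms(mt_output, mapping):
    """Swap each placeholder in the MT output for its approved translation."""
    for placeholder, translation in mapping.items():
        mt_output = mt_output.replace(placeholder, translation)
    return mt_output

masked, mapping = protect_terms("推进供给侧结构性改革", TERM_BASE)
# the MT engine would translate `masked`; we then restore the fixed term:
result = restore_terms("advance __TERM0__", mapping)
```

The placeholder approach only fixes terminology, not word order, which is why it can mitigate but not solve the structural errors we observed.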

By the time we finished this project, Microsoft had not yet opened its NMT system for customized training. But the good news is, THEY JUST DID, a week ago. Click here to learn more.

Personally, I’m quite interested in training an NMT engine and can’t wait to see the results. What I do believe is that, no matter how unsatisfied we are with the output of existing MT, being able to utilize these techniques is a basic skill for any ambitious translator.

Technology is not always about unreadable code or algorithms; sometimes utilizing a simple tool like this will make your translation work much easier.

https://youtu.be/hau8DHpKoVY

There’s much more to be explored. Staying curious, eager to try, and open to the latest technology is perhaps the most valuable thing I gained from the CAT course at the Middlebury Institute. Thank you, Professor Adam Wooten.

(You can access the files of our SMT project here.)