Source:
Video “2013-07 QTLaunchPad: MQM Version 2 and Software Infrastructure” by Arle Lommel: https://vimeo.com/71836764
The video illustrates the existing problems of human evaluation metrics and machine translation metrics. Human metrics are inconsistent, covering more than 180 different issues to be checked. Because of their one-size-fits-all approach, they are not flexible enough to handle the needs of different projects: perfect grammar, for example, may not matter much in gist translation, while legal translation requires both accuracy and fluency. Human metrics are also completely disconnected from MT evaluation, which does nothing to help improve MT quality.
As for machine translation metrics, scores like BLEU indicate only how much the output deviates from the reference translations, not what kinds of problems it has, such as inconsistent terminology or mistranslation. They also conflate distinct things: product quality (fluency, accuracy, and verity), process, and project.
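To make that limitation concrete, here is a minimal Python sketch (my own illustration, not the official BLEU implementation) of the n-gram overlap idea behind BLEU: two outputs with very different kinds of errors, a terminology change versus a mistranslated modal verb, can end up with the same overlap score, which is why the number alone says nothing about issue types.

```python
# Minimal sketch of BLEU-style modified n-gram precision: it only measures
# how many word n-grams of the MT output also occur in the reference, and
# says nothing about *why* the mismatching n-grams are wrong.
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped candidate n-gram matches divided by total candidate n-grams."""
    cand_ngrams = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref_ngrams = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, ref_ngrams[ng]) for ng, count in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return overlap / total if total else 0.0

reference = "the contract must be signed by both parties".split()
# Two very different error types (wrong term vs. wrong modal verb) score the same:
mt_terminology_error = "the agreement must be signed by both parties".split()
mt_mistranslation    = "the contract can be signed by both parties".split()

for hyp in (mt_terminology_error, mt_mistranslation):
    print(" ".join(hyp), "->", round(ngram_precision(hyp, reference, 2), 3))
```

Both hypothetical outputs get the same bigram precision (5/7), even though a human reviewer would flag them under different issue types.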
MQM has multiple benefits. First, its issue types are organized hierarchically so that different levels can serve different tasks. It is currently divided into four branches, namely accuracy, fluency, verity, and design, and follows a core-and-extensions system in which the core covers common issues while the extensions address additional, more specific needs. In addition, users can select task-specific metrics to check only what is needed, so even though the model is quite complex, people only have to use the parts that apply to them.
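As a thought experiment, the core-and-extensions idea can be pictured as a simple data structure from which each task picks only what it needs. The four top-level branches below come from the video; the individual issue names and the gist/legal profiles are my own illustrative assumptions, not the official MQM issue list.

```python
# A sketch of MQM's core-and-extensions idea as a data structure.
# Branch names are from the video; leaf issues and task profiles are
# illustrative assumptions only.
MQM_ISSUES = {
    "accuracy": ["mistranslation", "omission", "addition", "terminology"],
    "fluency":  ["grammar", "spelling", "typography", "style"],
    "verity":   ["legal-requirements", "locale-applicability"],
    "design":   ["layout", "truncation"],
}

def build_metric(selected):
    """Return a task-specific metric; None means 'take every issue in that branch'."""
    return {branch: (wanted if wanted is not None else MQM_ISSUES[branch])
            for branch, wanted in selected.items()}

# Gist translation: only the accuracy issues that matter for getting the gist.
gist_metric = build_metric({"accuracy": ["mistranslation", "omission"]})

# Legal translation: full accuracy and fluency branches plus a verity check.
legal_metric = build_metric({"accuracy": None, "fluency": None,
                             "verity": ["legal-requirements"]})
print(gist_metric)
print(legal_metric)
```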
Personally, I think MQM will influence the design of the QA section in a TMS. For instance, based on the multiple ways to use MQM illustrated by the lecturer, the QA function in a TMS could adopt the predefined MQM model but should also allow users to add core issues and extensions for different scenarios. One great use of the metrics would be specifying which aspects to check in particular subject fields: a TMS could connect to the translation conventions of specific industries and be kept up to date, so that when a user selects the subject field for a project, the common issues to check for that industry appear automatically, while still letting the user make changes. In this way, people get the big picture of how others evaluate translations in a given industry and can still personalize the metrics for individual projects.
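Here is a hypothetical sketch of that TMS behaviour: selecting a subject field pre-loads a suggested sub-model for that industry, which the user can then override. The field names and default issue lists are invented for illustration, and the resulting selection could feed a metric builder like the one sketched above.

```python
# Hypothetical TMS QA behaviour: a subject field pre-loads a suggested MQM
# sub-model; the user can add, drop, or narrow branches afterwards.
# Field names and defaults are invented for illustration only.
SUBJECT_FIELD_DEFAULTS = {
    # None means "check every issue in that branch"
    "legal":     {"accuracy": None, "fluency": None, "verity": ["legal-requirements"]},
    "marketing": {"accuracy": ["mistranslation"], "fluency": ["style"], "design": None},
}

def qa_profile_for(subject_field, user_overrides=None):
    """Start from the industry default, then apply the user's changes on top."""
    profile = dict(SUBJECT_FIELD_DEFAULTS.get(subject_field, {}))
    profile.update(user_overrides or {})
    return profile

# Example: a legal project where the client also wants layout checked.
print(qa_profile_for("legal", {"design": ["layout"]}))
```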
That said, I wonder whether the creators of MQM have a platform where people can share which aspects they choose to check for different projects, and whether they have a way to collect all of that information, because such shared experience could really guide startup vendors, and even translation management systems, in creating default or suggested QA sub-models based on the full MQM model.