Machines – Translation – Metrics – Quality – Art

Until fairly recently, evaluating translation has been like evaluating art: most people agree that some art is better than other art, but there are few hard-and-fast criteria for judging quality objectively. The same is true of machine translation: how can we consistently and objectively evaluate machine translation output across different levels of quality and style? It's not enough to say that one output is simply "better" or "worse", because that gives us no useful information about how to improve quality.

Here are some examples from art. On the left is a mural of two hands on the side of a building in Berlin. On the right is a pair of photographs of a hand, hanging in the Münchner Stadtmuseum. Which of these two images is better art, and why? You likely have a preference, but what are the determining factors? Is black and white better than color? Is photography better than paint? Is a small print better than a large mural?

(Left: Berlin. Right: Münchner Stadtmuseum.)
