Kimi Li's Localization Wonderland

Never stop creating.

Author: Wanxin Li

Website Design and Localization – General Stilwell Scholarship Project

Project Introduction

Over the course of our fourth semester, our team, Xiu, completed a major website design and localization project for the General Stilwell Scholarship (the website we built can be accessed here), a scholarship aimed at Chinese students at MIIS.

Why did we choose this client?

General Stilwell was a US Army general who served in the China Burma India Theater during WWII. To many Chinese people, he is a legendary general who helped China defend its territory during the war. The Stilwell Foundation was founded in 1982 to provide scholarships to students from China for studies toward a master’s degree at MIIS. Its mission is to promote Sino-American understanding, appreciation, and goodwill while honoring the legacy of a man with deep ties to, and great affection for, the Chinese people.

The person in charge of fundraising in China reached out to two of our team members, who are current recipients of the scholarship, expressing her wish to build a Chinese website for the scholarship. After discussing all the possibilities, we agreed to build an English website first and then localize it into Simplified Chinese.

Our goal

Our goal for this project was to help the scholarship foundation reach more potential donors in China and to increase our school’s visibility so as to attract more Chinese students.

Project scope

After meeting with our client, we finalized the project scope: shooting an English video featuring the current recipients’ life at MIIS, so that prospective students and donors can get to know some of them, and localizing it with subtitles; importing all the text and images provided by our client; designing the whole website; and localizing the website into Simplified Chinese with a proper language switcher.

 

Processes

Video production and localization

Our first step was to develop a list of questions for our scholarship recipients. These included questions like “What are you studying?”, “Why did you want to come to MIIS, and what do you like most about the school?”, and “How is the scholarship helping you achieve your goals?”. Next, we recruited a volunteer engineer from our team to film each recipient at various spots on campus, using a separate audio recording device so the footage could be edited later.

Shooting the interview video

After our engineer edited the video and sent us the .srt subtitle file, we recruited a volunteer translator to translate the subtitles in Google Translator Toolkit.

Translating in Google Translator Toolkit

After editing and proofreading, our engineer burned the subtitles into the video and made some final edits. This is the localized video posted on YouTube.

Website design and localization

Website design and implementation

After shooting the video, we started building the website from scratch. Based on the materials we collected from our client, we decided on four pages and sketched the layout of each page on paper, which became the foundation of our website design. Meanwhile, we communicated with our client to gather design suggestions and content support.

Then it was time to turn the design into reality. We created a WordPress website under our own account as a testing site, because at that time we had not yet dealt with the domain and hosting, and we wanted a stable, clean website before making it visible to the public. We found a suitable theme for the website and used Elementor, a localization-friendly page builder, to help build it.

Build the website with Elementor

We also inserted some templates from the Internet to make our website more appealing. Below are some examples:

Picture slider on the home page

 

Profile pictures and bios of recipients

 

Donate page

After finishing the design work, we created an email account through the website’s cPanel and used it to create a YouTube channel. Then we uploaded the localized interview video and displayed it on our web page.

Website migration and localization

After obtaining an official domain and hosting server for the scholarship website, we installed WordPress through the site’s cPanel and migrated all the content from our testing website to the official website using a plugin called All-in-One WP Migration.

For localization, we installed WPML to create the language switcher and recruited translators to translate all the content, including the pages, menu, website title and tagline. Due to time constraints, we did not connect WPML to any CAT tools or translation management systems; instead, we asked the translators to translate the content directly in WordPress, which was tricky and is not something that should be done in a real-world project. Later, we synced the English and Chinese menus, and everything was good to go.

 

Suggestions for future tasks

  • First, expand the service package. The website will likely need updated information, so continuous design and localization work will be necessary. Also, even though we only built the website for the scholarship, we think it would be a great idea to help them create a logo and business cards to complete the service package we provide. And since some former scholars now in China are responsible for fundraising and regularly attend charity meetings, localized business cards may also be needed.
  • Second, all the videos are currently on YouTube, which cannot be accessed in China, so they also need to be uploaded to a Chinese video platform.
  • Third, improve SEO (search engine optimization) in major Chinese search engines such as Baidu, 360 and Sogou. Since the goal of the website is not only to attract Chinese donors for the scholarship but also to bring more Chinese students to MIIS, we think SEO is very important.

 

Lessons learned

From this project, I learned many things about website localization. These may not be directly related to project management, but they are very useful when providing localization consultancy to clients:

  • Always build a testing website that is not visible to the public first, and then migrate all the content to the live website. If something gets messed up on the live website during the localization process, it can be a disaster.
  • Not all page builders are localization-friendly. Page builders make website design much easier and make everything look polished and chic. However, WordPress and other platforms cannot recognize some page builders and therefore cannot extract the strings entered in them for translation. That is why, after doing some research, we chose Elementor for this project.
  • If you are going to display videos on the website from a media platform and the client asks you to upload the localized videos to social media for them, do not use your own account; it is unprofessional. Either ask the client for their login information or register a new account for them.
  • When localizing videos and websites, make sure to take into account whether the audience can actually access them. Do enough research before choosing the platforms. For example, YouTube and many social media platforms such as Facebook and Instagram are blocked in mainland China.
  • Always look for creative solutions. When we designed our website, something went wrong with the page builder and we could not get the three-column text to display in the right format. Instead of putting everything in one column, which would take up too much space, we created a PDF file with all the text properly formatted and inserted a text link on the web page, so that when viewers click the link, they can read the text in an online PDF reader.

    Insert a text link at the bottom

Localization of a JavaScript game – “Swap”

This portfolio is based on a project to localize a JavaScript game called Swap, completed by my teammates Roxy Ma, Michelle Huang and me in the course Website Localization during the 2018 spring term at MIIS. The portfolio is divided into three parts: a brief introduction to the project, the basic workflow of the project, and thoughts on the project from a project-management perspective.

Brief Introduction of the project

Our project was to localize a JavaScript game called Swap from English into Simplified Chinese using a method called 24 Ways internationalization. Swap is an online tile-based puzzle game written in JavaScript, using the HTML5 canvas for rendering. You control a ball and progress through various tricky and challenging levels to reach your goal; the death count piles up as you move through the levels. The following link takes you to a short 20-second demo of the original game: Demo of the original game. Our goal was to localize all the text on the web page and add a dynamic language picker that looks natural in the interface.

 

Basic workflow of localizing the game

Our process of localizing the game included preparation, translation, localizing the victory tiles, and adding a language switcher.

Preparation

The first thing we did was figure out what needed translation. We went through all the levels and identified the texts that needed translating. Below is the interface at the beginning of the game. All the strings circled below, such as “Deaths”, “Level”, “Reset” and “Skip”, should be translated. The tips at the bottom, like “Use arrows or WASD to move” in Level 1, vary from level to level, and we needed to make sure every tip was translated.

Interface of the web page at the beginning

Finally, if users successfully go through all the levels and win, they will see “SWAP” at the end, which is something we needed to localize in a rather unique way.

Victory tiles that need localization

For the strings in HTML, we duplicated the index.html file and renamed it index_zh.html. For the strings in JavaScript, we adopted the 24 Ways internationalization method. As seen in the code below, we wrapped the strings “Deaths” and “Level” with an underscore and parentheses to externalize them.

Wrapping the strings in JavaScript

After wrapping all the strings, we put them in a Strings.js file in the game’s js folder, ready for translation.

Strings.js file for translation
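The wrapped strings and the lookup helper might look roughly like this. This is a minimal sketch of the 24 Ways approach, not the game’s actual code; the variable names and translations are illustrative.

```javascript
// Strings.js (illustrative): one dictionary per locale;
// translators only ever touch the values on the right.
var localizedStrings = {
  "Deaths": "死亡次数",
  "Level": "关卡",
  "Reset": "重置",
  "Skip": "跳过"
};

// The _() helper returns the translation if one exists,
// and falls back to the English source string otherwise.
function _(str) {
  return localizedStrings[str] || str;
}

// Wrapped call sites in the game code would then look like:
var deathsLabel = _("Deaths");   // → "死亡次数"
var unknownLabel = _("Pause");   // no entry, so it falls back to "Pause"
```

The fallback behavior is what makes the wrapping safe to apply everywhere: any string missing from the dictionary simply shows up in English instead of breaking the page.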

A challenge we encountered was that the tip strings are stored in an array. We tried to wrap them using the 24 Ways method, but it didn’t work.

Tips in JavaScript

So after brainstorming, we came up with a direct and simple solution: we copied the level.js file, translated the tips directly in the resulting Chinese level_zh.js file, and linked it back to the index_zh.html file.
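As a rough illustration, the duplicated level file might carry the translated tips directly in its array, along these lines. The structure and the second tip are our own assumptions, not the game’s real data; only the Level 1 tip comes from the original.

```javascript
// Hypothetical excerpt from level_zh.js: the tips live inside the level
// array, so they are translated in place instead of being wrapped with _().
var levels = [
  { tip: "使用方向键或 WASD 移动" },  // "Use arrows or WASD to move" (Level 1)
  { tip: "按空格键交换" },            // invented example tip for illustration
  { tip: "" }                         // levels without a tip stay empty
];
```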

Translation

Now that all the strings were internationalized, we were ready to translate. For the HTML file, we simply imported the .html file into Memsource to translate. Then we copied all the translations into the index_zh.html file like this:

Copying the translations into the .html file

The strings in JavaScript needed some extra work. First we copied Strings.js into a Word document and hid all the unnecessary text. Then we saved it as an .rtf file and imported it into Memsource for translation. After the translation was done, we put the translated strings into a Strings_zh.js file and linked it back to the index_zh.html file.

Localizing the victory tiles

At the end of this little game, when the player manages to win all rounds, a tile visualization of the English word “SWAP” is shown. To properly localize this final screen, we needed to figure out a way to transform the word “SWAP” into Chinese characters.

Since the final credit, “Swap, by Noah Moroze and Michael Yang. Thanks for playing!”, is shown together with the “SWAP” tiles, they must be created in the same place, so we searched for this line across all the .js files and confirmed it. The tiles are mapped out using coordinates: 1 draws a grey tile, 0 draws a white tile, and -1 places a blue square inside a white tile.

Tiles in the JS file
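The encoding described above can be pictured with a small grid. The grid below is illustrative, not the game’s real data, and the helper function is our own sanity check, not part of the game.

```javascript
// Illustrative 5x5 grid using the encoding described above:
// 1 = grey tile, 0 = white tile, -1 = white tile with a blue square inside.
var letterGrid = [
  [1, 1, 1, 1, 1],
  [1, 0, 0, 0, 1],
  [1, 0, -1, 0, 1],
  [1, 0, 0, 0, 1],
  [1, 1, 1, 1, 1]
];

// Quick sanity check: count how many tiles of each kind the grid contains.
function countTiles(grid, value) {
  return grid.reduce(function (sum, row) {
    return sum + row.filter(function (t) { return t === value; }).length;
  }, 0);
}

countTiles(letterGrid, 1);   // → 16 grey border tiles
countTiles(letterGrid, -1);  // → 1 blue tile in the center
```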

Chinese characters are complicated to visualize with tiles, and we could easily lose the aesthetics of the characters by drowning in the 1s and 0s. So we came up with a simple workflow: visualize the character design first in PowerPoint, then convert the design into coordinates, and finally copy the coordinates into the .js file.

To test our approach, we converted our characters into coordinate bundles, and the new tiles were successfully mapped out at the end of the game.

Localized victory tiles
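The design-to-coordinates conversion step could also be scripted with a few lines of JavaScript. The sketch below is our own illustration, assuming a simple symbol convention; the function name and symbols are invented, not part of the game.

```javascript
// Rough sketch: draw the character with text symbols first, then map
// each symbol to the game's tile codes.
// '#' → 1 (grey), '.' → 0 (white), '*' → -1 (blue inside white).
function sketchToCoordinates(rows) {
  var map = { "#": 1, ".": 0, "*": -1 };
  return rows.map(function (row) {
    return row.split("").map(function (ch) { return map[ch]; });
  });
}

// Example: a tiny 3x3 design.
var design = [
  "###",
  "#*#",
  "###"
];
var coords = sketchToCoordinates(design);
// coords[1] is [1, -1, 1]
```

Working from a text sketch keeps the character’s shape readable while editing, which is exactly the problem with hand-editing raw 1s and 0s.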

Adding the language picker

After all the translations were done, it was time to add the language picker and link it to the pages using HTML and CSS.

Adding the code for language picker

First, we created the language selector in both the English and the Chinese HTML pages using the code above. It looked like this on the page: the link worked, but it just didn’t look pretty.

The language picker without CSS

So we inspected the existing code inside the <body> tag, moved the language <span> tag to the right place, and used CSS to make the language switcher look natural on the page. We set the font family to match the rest of the text, found the color of the grey tiles from the start page in a .js file and set the background to the same color, adjusted the border radius to make the picker look like a button, and finally removed the underline and changed the font color to white. Our language picker now looks like this, and everything was well translated.
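Put together, the markup and styles might look something like this. This is only a sketch under our own assumptions: the class name, grey color value, and spacing are placeholders we invented, and only the index_zh.html file name comes from the project.

```html
<!-- Hypothetical language picker: class name and colors are placeholders. -->
<span class="lang-picker">
  <a href="index_zh.html">中文</a>
</span>

<style>
  .lang-picker a {
    font-family: inherit;          /* match the page's existing font */
    background-color: #9e9e9e;     /* placeholder for the grey tile color */
    border-radius: 6px;            /* rounded corners for a button look */
    padding: 4px 10px;
    color: #fff;                   /* white text */
    text-decoration: none;         /* remove the link underline */
  }
</style>
```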

We eventually localized the game successfully. The final version can be seen here: Final version of the localized game.

 

Thoughts from the perspective of project management

Even though all of our team members played the roles of localization engineers and translators during the project and no dedicated project management was involved, project management in a case like this can involve complicated steps and is never easy. Project managers handling similar JavaScript games should take the following factors into account:

  • When a game is being localized, text truncation will be a major issue if the text boxes are not expandable. Engineers may want the translators to make some tweaks and produce a shorter version of the translations, which requires project managers to communicate this need to the translators. This problem didn’t occur in our project, since no piece of dialogue was too long. But one of our challenges was that when importing HTML files into Memsource, translators had to manually insert the tags in the right places. In the real world this can cause a lot of trouble, since a missing tag may break the file. Project managers should make sure that either the engineers filter the tags out before handover to the translators, or MQA checks the tags carefully.

Translating HTML file in Memsource

  • Therefore, localization testing is a major step that project management should be involved in as much as possible. Many potential issues will show up in testing, and project managers need to identify in which step things went wrong and go back to the people responsible for that step for fixes.
  • Last but not least, with the 24 Ways internationalization method, because the translator worked on the .rtf file and the engineer was responsible for copying every translated string back into the JavaScript, there was a high risk of pasting the wrong string, especially for an engineer who doesn’t know the language. So in a project like this, a strict LQA process is required, and the project manager should closely monitor it.

Localization of a Fungus game—The Hunter

This portfolio is based on a project to localize a game called The Hunter, completed by my teammates Roxy Ma, Michelle Huang and me in the course Software & Games Localization during the 2017 fall term. The portfolio is divided into three parts: a brief introduction to the project, the basic workflow of the project, and thoughts on the project from a project-management perspective. The other two team members focused on flowchart localization and voiceover localization of the game respectively, and their blog posts can be found through these links:

Brief introduction of the project

Our project was to localize a Fungus game called The Hunter in Unity from English into Chinese, Spanish and Russian. Fungus is an interactive storytelling extension for Unity3D. People all over the world have used it to create visual novels and role-playing games. Fungus has many features for creating character dialogues, handling localization, and interacting with animated scene objects. In this particular game, the user selects options that trigger different dialogues along different storylines. The following link takes you to a short demo of the original game: Demo of the original game. Our goal was to localize all the options, dialogues and character names into the three languages and add a language menu at the beginning of the game.

Basic workflow of localizing the game

Our process of localizing the game included preparation, translation and adding the language picker.

Preparation

During the preparation stage, we first imported the I2 package into Unity and created the localization function via Tools -> Fungus -> Create -> Localization. At that point, internationalization was already implemented quite well: when we exported the localization file through the button on the right, the exported file was a CSV file containing all the texts that needed translation, ready for handoff.

“Export Localization File” button

The original CSV file
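For a sense of its shape, the exported file is roughly of the following form. The key name and strings below are invented purely for illustration; only the “Key”, “Description” and “Standard” column names come from the actual export, plus one column per target language.

```
Key,Description,Standard,ZH,ES,RU
"SAY.Intro1","","Welcome, hunter.","欢迎你，猎人。","Bienvenido, cazador.","Добро пожаловать, охотник."
```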

Translation

During the translation stage, we imported the CSV file into Memsource and translated the texts there. Then we exported the completed files and copied the translations back into the CSV file like this:

CSV file with translations

Adding language picker

This was the most important and challenging stage for our team. Since the entire game is configured with flowchart logic, we built the language-setting menu on the same flowchart.

The original flowchart

We created four language blocks in the chart. Within each language block, we used Fungus’s Set Language command to activate a language by setting the language code we had previously defined in the CSV file, so that the game would use the translation matching the language code written there.

Language blocks

Set language

Set call

In spite of all the challenges we had, including failing to find scripts, messed-up texts for translation, and logic errors in the flowchart design, we eventually localized the game into three languages successfully. The final version can be seen here: Final version of the localized game.

 

Thoughts from the perspective of project management

Even though all of our team members played the roles of localization engineers and translators during the project and no dedicated project management was involved, project management in a case like this can involve complicated steps and is never easy, for three main reasons:

  • In this project, engineers have to export the localization CSV file and import it into a CAT tool for translation. However, compared to the original CSV file above, the file in the CAT tool shows that the flowchart step names and character names are extracted as well, and the header row with “Key”, “Description” and “Standard” is also listed for translation, which means engineers have to figure out how to hide those items before importing into the CAT tool. Project managers should check the file prepared by the engineers to confirm it is clean for translation. If this step is skipped, translators may translate those unnecessary items and damage the whole localization file.

CSV file in Memsource

  • When a game is being localized, text truncation will be a major issue if the text boxes are not expandable. Engineers may want the translators to make some tweaks and produce a shorter version of the translations, which requires project managers to communicate this need to the translators. This problem didn’t occur in our project, since no piece of dialogue was too long. But one of our challenges was that some tags could not be filtered out in Memsource, the CAT tool we used for the project. To keep things simple, we just copied and pasted the tags while translating. But in the real world this can cause a lot of trouble, since even a stray parenthesis may break the file. Project managers should make sure that either the engineers filter the tags out before handover to the translators, or MQA checks the tags carefully.

Tags

  • Last but not least, localization testing is a major step that project management should be involved in as much as possible. Many potential issues will show up in testing, and project managers need to identify in which step things went wrong and go back to the people responsible for that step for fixes.

TMS Final Mini-Portfolio

Introduction

This portfolio is a compilation of a translation project completed in a translation management system called SDL WorldServer by my team, TMS Spice Girls, in the course Translation Management Systems during the 2017 fall term, along with three blog posts titled “Quality Models & QA”, “Translation Management Systems for Crowdsourcing” and “Tips for selecting the best TMS”. Together they reflect my understanding of how to implement a translation project in a TMS, what features and functions a TMS should have to support a translation project, and my evaluation of how to select and adopt a suitable TMS for a company.

Translation Project in SDL WorldServer

The translation project our team completed in SDL WorldServer this semester was to translate two content pages of marketing materials, in XML format, from English to Simplified Chinese for our client, the law firm King & Wood Mallesons; the process included pre-translation setup, translation and QA. The final project files our team created include a proposal, the source and target texts, pseudo-translations and the presentation slides on lessons learned, and the project folder can be accessed via the hyperlink above. The project was based on what we learned over the semester, including how a TMS like SDL WorldServer operates in areas such as finance, user accounts, QA models and translation. The presentation slides also cover things our team thinks WorldServer should improve, based on the challenges and problems we met during the project. This project gave me a clearer, more detailed picture of how to manage a translation project inside a TMS and what the workflow looks like.

Blog Posts

Each of the three blog posts explores a different aspect of translation management systems and refers to either a video conference or articles posted by experts in the translation and localization industry.

  • “Quality Models & QA”

Based on a video introducing MQM (Multidimensional Quality Metrics), the article explores the benefits of MQM, the reasons why it’s needed and how it might influence the use of TMSs.

  • “Translation Management Systems for Crowdsourcing”

With the reference of three articles, the blog post illustrates the definition of translation crowdsourcing, the differences between crowdsourced translation and traditional translation, the features a TMS might have to implement crowdsourcing and how crowdsourced translation should be managed.

  • “Tips for selecting the best TMS”

Based on four online articles, this blog post explains what a company should take into consideration when selecting the best TMS, from the points of view of both the client side and the vendor side. In addition, it elaborates on what I think is the most important factor a company should focus on when choosing a TMS: the translators.

Tips for selecting the best TMS

Source:

  1. “Eight Steps to a Successful TMS Roll-Out” by Andrew Lawless (Rockant/DigIT) http://rockant.com/wp-content/uploads/ubpfattach/eight-steps-to-a-successfull-tms-roll-out.pdf
  2. “Implementing a Translation Management System: 4 Things to Keep in Mind” by Camille Poudens (Freedman International) https://www.freedmaninternational.com/blog/blog/category/implementing-a-translation-management-tool
  3. “Choosing a TMS: Getting Started” by Yee Lam Cook and Viviana Bertinetto (Global Language Solutions) https://www.gala-global.org/publications/choosing-tms-getting-started
  4. “Shopping for a Translation Management System: How to Choose the Best for Your Company” (Sajan) http://www.sajan.com/shopping-for-a-translation-management-system-how-to-choose-the-best-for-your-company/

 

Selecting a suitable and effective translation management system is important for both client companies and LSPs. A company on the client side that wishes to find the best translation tools has several steps to go through before making the decision. First, according to Brian McConnell in his article “Choosing The Best Translation Technology for Your Company”, the most valuable thing to do is research: get familiar with the translation and localization industry to understand how it operates and what categories of products and services exist. Second, evaluate the tools based on the company’s own technology and processes rather than on the needs of its translation agency. Third, involve bilingual staff in the localization and translation efforts, since they can assist with translation and market-expansion projects. Finally, after filtering out some of the options, all that remains is to test the candidates and evaluate them based on the staff’s opinions of their usability, workload reduction and output quality. Resources can then be shifted to the winning solution.

For the vendor side, there are much more factors to be taken into account during the selection of the best TMS.

The first thing to do before the search is to understand the company’s current and future translation needs. This includes evaluating the actual necessity of owning a TMS based on the projects to be managed, the project management required (how projects will be handled and the desired level of automation), the finance tracking system, and the requirements for integration and interoperability between systems. After getting a big picture of the company’s blueprint, the next step is to involve all the relevant people in the selection process. All the internal departments that will be affected by the decision should be consulted, since they will be the ones implementing the product. Vendors should also be involved as early as possible, and they deserve to be notified of the change in advance. Research into the vendors’ major concerns and opinions is necessary to build constructive suggestions for choosing the right tool. Before implementing the tool, the company should think about allocating resources for setup, training, customization, login creation and so on. Last but not least, test, test, and test again. But testing can be tricky: the key is to identify a team that works on the same project type, ask them to launch a small internal project to try the tool out, and gather feedback.

For me, I couldn’t agree more that it is necessary to consider the translators when selecting a new tool. I strongly agree that “Some tools are far better for managers than for translators. The different editor interfaces and features might be great for PMs, but not for the people actually working with the copy”, especially for an external vendor. A good LSP is one that also thinks about its vendors, asking whether they are willing to learn new tools and delving into how a tool works to make sure their lives won’t be made harder by it. Since vendors constitute a crucial part of the translation and localization process, they are key to the company’s success, and whether they are happy with the chosen TMS matters. That is why, when we started looking for volunteer vendors for our Localization Practicum projects, the first thing we asked on our survey form was: Are you interested in learning a new tool for your translation?

Translation Management Systems for Crowdsourcing

Source:

  1. “Facebook Taps Users to Create Translated Versions of Site” by Michael Arrington, TechCrunch, January 21, 2008
  2. “Can Companies Obtain Free Professional Services through Crowdsourcing?” by Adam Wooten, DeseretNews.com, February 18, 2011
  3. “People-powered Translation at Machine Speed” by Jessica Roland, MultiLingual, January 2014

 

With the development of technology and the increased flow of information, crowdsourcing is gaining popularity and has been adopted by many companies, allowing them to improve their production capacity while lowering costs. Crowdsourcing is happening in the translation industry as well.

Translation crowdsourcing is the practice of obtaining translation services from a large group of people, usually multilingual users, especially from an online community. For example, Facebook has invited its multilingual users to translate phrases from its site pages. To me, there is only a tiny difference between crowdsourced translation and volunteer translation: crowdsourced translation includes both paid and unpaid translation, while volunteer translation is entirely unpaid, even though “unpaid” is not strictly accurate, because it still costs money to manage the crowdsourcing platform and the quality of the translations.

Because crowdsourced translation and traditional translation take different forms, the two practices have to be managed differently. First of all, translators in a traditional translation environment often use complicated CAT tools, which can be expensive and hard to learn; in a crowdsourced environment, users should be able to understand what to do and how to do it the right way, to ensure speed. In addition, traditional translation commonly has individual rates for each language, while crowdsourced translation might have a single rate across languages per target quality level.

According to Jessica Roland in her article “People-powered translation at machine speed”, a TMS might have several certain features to facilitate crowdsourcing, including:

  • Streamlined and intuitive systems for ordering and translating
  • Reduced operations for translators and clients

And the technology should meet certain special needs, such as:

  • Having an automated quality checking process
  • Having a strong and flexible system so that it can handle a large crowd of translators working at the same time and the crowd can grow and shrink to accommodate various order volumes in different sizes
  • Having a highly automated translator acquisition system including online feeder channels and automated online testing

After reading the three articles, I think paid translation crowdsourcing can also be counted as crowdsourcing, since it still reaches out to a crowd of translators and creates the product through everyone’s efforts. To manage this kind of crowdsourcing, a TMS should also handle payment: it should support multiple currencies and regular translator payments via international payment methods such as PayPal, and should have simple, transparent pricing and efficient payment systems for quotes and invoices.

I think one of the best things about translation crowdsourcing is that the crowd will often self-correct and manage QA on its own. Users tend to let us know which translations or translators they disapprove of. Facebook used this idea quite well: rather than leaving the job to its internal QA people, it let users vote on other translators' submissions or submit their own versions. Therefore, a good TMS for managing crowdsourced translation would also provide a platform for the crowd to evaluate each other's work.
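The voting mechanism described above can be sketched in a few lines. This is a minimal illustration, not how Facebook actually implements it; the candidate phrases and vote counts are made up.

```python
from collections import Counter

def pick_winning_translation(votes):
    """Return the candidate translation with the most community votes.

    `votes` maps each submitted translation of a source phrase to its
    vote count; the crowd's favorite wins without internal QA review.
    """
    return Counter(votes).most_common(1)[0][0]

# Hypothetical crowd votes on translations of the same source phrase
votes = {
    "Crear una cuenta": 42,
    "Abrir una cuenta": 17,
    "Registrar cuenta": 5,
}
best = pick_winning_translation(votes)
print(best)  # → Crear una cuenta
```

A real platform would also track who voted, throttle abuse, and let users submit new candidates, but the core idea is just this tally.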

Quality Models & QA

Source:

Video “2013-07 QTLaunchPad: MQM Version 2 and Software Infrastructure” by Arle Lommel: https://vimeo.com/71836764


The video illustrates the existing problems of human metrics and machine translation metrics respectively. Human metrics are inconsistent, covering more than 180 different issues to check. Because of their one-size-fits-all approach, human metrics are not flexible enough to handle the needs of different projects: perfect grammar, for example, may not matter much in gist translation, while legal translation requires both accuracy and fluency. Human metrics are also completely disconnected from MT evaluation, which does nothing to help improve MT quality.

As for machine translation metrics, scores like BLEU indicate only how much the output deviates from the reference material, not what kinds of problems the translation has, such as terminology inconsistency or mistranslation. They also conflate different things, including product quality (fluency, accuracy and verity), process and project.

MQM has multiple benefits. First of all, different hierarchical levels of issue types are defined for different tasks. MQM is currently divided into four branches, namely accuracy, fluency, verity and design, and follows a core-and-extensions system in which the core covers common issues while the extensions support additional needs for specific purposes. In addition, users can select task-specific metrics to check only what is needed, so even though this is a quite complex model, people only have to use the parts they need.

I think MQM will influence the design of the QA section of a TMS. For instance, based on the multiple ways to use MQM illustrated by the lecturer, the QA function in a TMS could adopt the predefined MQM model but also allow users to add cores and extensions for different scenarios. A great addition would be metrics tailored to specific subject fields: a TMS could connect to the translation principles of certain industries and stay up to date, so that when users select the subject field for a project, the common issues to check for that industry pop up automatically, while still allowing users to make changes. In this way, people get a big picture of how others evaluate translation in a given industry and can personalize the metrics for individual projects.
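The core-and-extensions idea can be sketched as a small data structure. The branch names come from the video, but the specific issue lists and the helper below are simplified assumptions, not the real MQM catalog.

```python
# A hypothetical, much-simplified slice of the MQM hierarchy:
# each top-level branch holds core issues plus optional extensions.
MQM = {
    "accuracy": {"core": ["mistranslation", "omission"],
                 "extensions": ["untranslated", "addition"]},
    "fluency":  {"core": ["grammar", "spelling"],
                 "extensions": ["register", "inconsistency"]},
    "verity":   {"core": ["completeness"],
                 "extensions": ["legal-requirements"]},
    "design":   {"core": ["layout"],
                 "extensions": ["truncation"]},
}

def build_metric(branches, with_extensions=()):
    """Assemble a task-specific metric: core issues for the chosen
    branches, plus extensions only where explicitly requested."""
    issues = []
    for branch in branches:
        issues += MQM[branch]["core"]
        if branch in with_extensions:
            issues += MQM[branch]["extensions"]
    return issues

# Gist translation: accuracy only, so grammar goes unchecked.
gist_metric = build_metric(["accuracy"])
# Legal translation: accuracy and fluency, with accuracy extensions.
legal_metric = build_metric(["accuracy", "fluency"],
                            with_extensions=["accuracy"])
```

A TMS could ship industry defaults as preselected `branches`/`with_extensions` pairs and still let users edit the result, which is exactly the pop-up-then-customize flow described above.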

That said, I wonder whether the creators of MQM have a platform where people can share which aspects they choose to check for different projects, and a way to collect all that information, because that experience could really guide startup vendors, and even translation management systems, in creating their default or suggested QA sub-models based on the big MQM model.

Workflow of the DTP final project: Localizing a movie trailer

Introduction

For the DTP final project during the 2017 spring term, I localized a one-minute trailer of a Disney movie called Moana into Chinese. I did dubbing and recreated the audio with Chinese lines, similar sound effects to the original audio file and the Chinese version of the background song using Adobe Audition CC 2017. Finally, I added subtitles for the on-screen texts in the video and integrated the video and the audio file I created using Adobe Premiere CC 2017.

The original video can be watched here: https://www.youtube.com/watch?v=mxhI4sh85wc

The video that I have localized into Chinese can be watched here: https://www.youtube.com/watch?v=EzSLQLRaqEw

Process of the project

The process of this project consists of five steps: preparation, dubbing, audio editing, integration and subtitling for on-screen texts. Details of each step are as follows:

Preparation

  1. First of all, I created a spreadsheet, shown in the screenshot below, listing the background music, all the sound effects, the English lines and their Chinese translations in time order according to the video. Because recreating the audio meant making sure every sound matched the video perfectly, I recorded each one's duration in the third column, where “V” marks its starting and ending time in the video. This made the audio editing process much easier and faster.
  2. Then I searched for all the needed sound effects and the Chinese version of the background song on YouTube. I first tried free online sound effect libraries, but soon realized the video contained many special sound effects, like the sound of a lightsaber, the sound when the spotlight comes on, and the sound of a monster roaring, that I couldn't find there, so I ended up searching for them on YouTube.
  3. After that, I converted all the sound-effect YouTube videos into .mp3 files with the help of a website named OnlineVideoConverter: I pasted the YouTube links into the site, converted them into 320 kbps mp3 audio files and downloaded them. I then listened to the files again and again and noted which parts of each sound-effect audio were needed, recorded under “S” in the “Time Duration” column of the screenshot above.
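The cue sheet described in step 1 can be modeled as plain data, which makes it easy to sanity-check before editing begins. This is a sketch with invented cue names and timings; "V" and "S" match the spreadsheet's meaning (position in the video vs. usable span in the source file).

```python
# Hypothetical cue sheet mirroring the preparation spreadsheet:
# "v_start"/"v_end" are the sound's place in the video ("V"),
# "s_start"/"s_end" the usable span inside the source file ("S").
cues = [
    {"name": "whoosh",       "v_start": 3.2,  "v_end": 3.6,
     "s_start": 1.0, "s_end": 1.4},
    {"name": "monster_roar", "v_start": 12.5, "v_end": 14.0,
     "s_start": 0.5, "s_end": 2.0},
]

def check_cue(cue, tolerance=0.05):
    """A clip can only stay in sync if the trimmed source span is
    (nearly) as long as the slot it must fill in the video."""
    video_len = cue["v_end"] - cue["v_start"]
    source_len = cue["s_end"] - cue["s_start"]
    return abs(video_len - source_len) <= tolerance

for cue in cues:
    status = "ok" if check_cue(cue) else "needs retrimming"
    print(f'{cue["name"]}: place at {cue["v_start"]}s ({status})')
```

Catching a too-short or too-long trim here saves a round of re-listening in Audition later.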

Dubbing

Because there's a male character in the video, I asked a classmate to dub him. We recorded our voices in the school DLC recording booth using Camtasia, one Chinese line after another, while watching the original video to make sure we spoke at the same speed as the original voices. Everything, including the lines and our discussions, was recorded into one audio file, but we left a few seconds' pause before each line, which made the audio editing less painful.

Audio Editing

After all the audio resources had been found or recorded, I created a new project in Adobe Audition CC 2017 and imported all the audio files. I used the Razor Selected Clips tool to cut the audio clips to the lengths I wanted and placed them on multiple tracks wherever sounds had to be overlaid. With the spreadsheet from the preparation step as a reference, placing the audio clips accurately was much easier. This step was not difficult, but it was time-consuming and required listening to the tracks while watching the video over and over again. Finally, I adjusted the volume of some clips, because some sound effects were too loud and I wanted to lower the background music whenever there was a human voice. I mainly used the two approaches illustrated in the screenshot below:

To adjust the volume of a whole track, I simply changed the dB number under the track. To adjust the volume of a single clip when there was more than one clip on the same track, I set up keyframes in that clip, very much like adjusting opacity in Adobe After Effects, and raised or lowered the yellow bar to change the volume. This is the final look of all the audio clips placed correctly in Audition:

Integration

After the audio editing was done and everything sounded good, I exported the entire multitrack session into a single mp3 audio file. Then I imported both the video file and the audio file I had created into Adobe Premiere CC 2017 and placed them on different tracks. As you can see here, the second track is the original audio that comes with the video. I muted it because it couldn't be deleted, but this didn't affect how the final product sounds.

Subtitling for on-screen texts

After the integration, I added still subtitles for the on-screen texts, also in Adobe Premiere. First I moved the playhead to a point where an on-screen text appeared and, under the “Title” tab, chose “New Title-Default Still”. In the window that popped up, I created a new text box, typed in the Chinese translations, chose a suitable gradient font, adjusted the font size, placed the text box under the original texts and used the eyedropper to match the color of the texts.

After all the subtitles were created, they would appear on the right of the interface.

I dragged each of them to another track. As in the audio editing step, I adjusted the length of each subtitle clip to make sure it appeared and disappeared in sync with the original on-screen text. Last but not least, I exported the media.

Challenges

There were two challenges during this project:

  1. It was very hard to find sound-effect resources because I didn't know how to describe the sounds, such as the sound when the male character transforms from an eagle back to his human form, or the sound when his weapon glows bit by bit. I ended up searching YouTube with every word I could think of to describe them and, luckily, found similar ones for all of them in the end.
  2. Editing the audio clips was very time-consuming. Since most of the sound effects appear for only a very short time in the video, like a sword unsheathing, a head hitting wood, or a whoosh, it was hard to edit the clips and place them precisely so that the sound effects matched the video exactly.

Lessons Learned

I managed to stay passionate about perfecting my product throughout the whole project because, first, I love working with sound and music, and second, I love the movie I chose. For a self-selected topic, I strongly suggest choosing something you really enjoy.

There are two things I’ve learned from this project:

  1. Writing a spreadsheet and pinning down the time of every sound that appears in the video really helps. It makes the audio editing step faster and much easier.
  2. If possible, it's better to record the lines and save each one as a separate audio file. This improves the efficiency of audio editing, because the sounds that aren't needed in the video don't have to be reviewed and edited.

Conclusion

In a word, I've really learned a lot from this project, especially from the classmates who also chose to do dubbing. I've gained new skills in Adobe Audition and Adobe Premiere, and most importantly, I now have a clearer picture of the DTP localization process.

Comparisons Between KantanMT And Microsoft Translator Hub

The topic of my group's final MT project is IMF world economic outlooks. To figure out which engine, KantanMT or Microsoft Translator Hub, is better for training machine translation on a topic like this, I compared the two from several angles. The results are as follows:

Comparison of the advantages and disadvantages of KantanMT and Microsoft Translator Hub

Comparison of BLEU scores

To make an objective comparison between KantanMT and Microsoft Translator Hub, I fed the same training, tuning and testing data to both engines. There is a huge gap between their BLEU scores: Microsoft Translator Hub scored 44.03, while KantanMT scored only 21. This means the machine translation generated by Translator Hub is closer to the testing reference than KantanMT's, though there is no guarantee that its quality is definitely higher. According to the instructions on the KantanMT website, a BLEU score lower than 39 indicates no fluency: “Absolutely ungrammatical and for the most part doesn't make any sense. Translation has to be re-written from scratch.”
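To make concrete what a BLEU score actually measures (n-gram overlap with the reference, not real quality), here is a toy implementation. Real BLEU uses n-grams up to 4 with smoothing over a whole corpus; this stripped-down sentence-level version with invented example sentences only illustrates the mechanics.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=2):
    """Toy BLEU: geometric mean of clipped n-gram precisions
    (up to max_n) times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        c_counts = Counter(ngrams(cand, n))
        r_counts = Counter(ngrams(ref, n))
        # clip: a candidate n-gram gets credit at most as often
        # as it occurs in the reference
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        precisions.append(clipped / max(1, sum(c_counts.values())))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty punishes translations shorter than the reference
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

ref = "global growth is projected to rise"
print(round(100 * bleu("global growth is projected to rise", ref), 1))  # → 100.0
print(round(100 * bleu("growth global rise", ref), 1))  # → 0.0
```

The second example shows why BLEU says nothing about the kind of error: all the right words are present, but no bigram matches, so the score collapses.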

Comparison of human evaluated machine translations

To compare the quality of the machine translations, I conducted a human evaluation of the first 30 segments of the testing data, using the six error types our group agreed on (omission, mistranslation, untranslated, inconsistent with termbase, inconsistent with terminology, and grammar).

There is also a huge gap between these results. The MT generated by Microsoft Translator Hub has 6 minor and 5 major grammar errors, 4 critical and 1 minor mistranslation errors, and 1 critical omission error. The MT generated by KantanMT, however, has 6 critical, 2 major and 1 minor grammar errors, 19 critical untranslated errors, 1 critical omission error, and 1 minor, 1 major and 3 critical mistranslation errors. The contrast is obvious: the machine translation from KantanMT is indeed unreadable and not fluent, just as its own instructions indicated.
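An evaluation like this is easy to tally programmatically once each finding is logged as an (error type, severity) pair. The findings and severity weights below are invented for illustration, not our project's actual numbers.

```python
from collections import Counter

# Hypothetical evaluation log: one (error_type, severity) pair per
# finding, using the six error types our group agreed on.
findings = [
    ("grammar", "minor"), ("grammar", "major"),
    ("mistranslation", "critical"), ("omission", "critical"),
    ("untranslated", "critical"), ("untranslated", "critical"),
]

def tally(findings):
    """Count findings per (error type, severity) pair."""
    return Counter(findings)

counts = tally(findings)
print(counts[("untranslated", "critical")])  # → 2

# An assumed severity weighting turns the tally into one comparable
# penalty score per engine.
weights = {"minor": 1, "major": 3, "critical": 5}
penalty = sum(weights[sev] for _, sev in findings)
print(penalty)  # → 24
```

Running the same tally over both engines' logs gives a single number per engine to set beside the BLEU scores.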

Comparison of features and benefits

However, even though the machine translation generated by KantanMT is not ideal, the engine actually has many advantageous features that Microsoft Translator Hub lacks:

  • KantanMT gives more visibility into the backstage, letting users know what stage the machine training process has reached at any given point. This takes the mystery out of statistics-based machine training.

  • It is possible to delete files that have already been uploaded. When adding certain files makes the system's BLEU score go down, for example, we may want to duplicate the system and delete the files causing the lower score. In Microsoft Translator Hub, however, there's no way to delete files; we could only uncheck them, which is much less convenient.

  • It offers many metrics beyond BLEU scores for evaluating machine translation quality, such as F-Measure scores, TER scores and Gap analysis. These give users multiple dimensions along which to evaluate the output.

  • Even for BLEU scores alone, KantanMT provides really deep analyses. The BLEU scores are applied not only to the whole translation, as in Microsoft Translator Hub, but also to every individual segment. This provides a much clearer and more detailed view of each segment's quality.

  • In KantanMT, users can also set up their own KPI indexes for specific projects, which is highly customizable. Reviewers can then evaluate the machine translation online against those KPIs. Everything is organized and convenient.

But KantanMT also has some disadvantages that Microsoft Translator Hub can compensate for:

  • There are no “Tuning” and “Testing” tabs on the system training page. All the data files, including tuning, testing and training files, have to go under the “Training” tab, which may cause confusion.
  • KantanMT accepts only Excel files and aligned UTF-8 encoded text files as tuning or testing data; TMX, PDF and other document types are not accepted, which is inconvenient for aligning segments and storing translation memories.

In our group's case, because all the files had already been aligned into TMX files, we used Olifant to convert the TMX files into Excel files. The converted file looks awful: many of the sentences seem to disappear or be replaced by unreadable codes. For example, the original English sentence should look like this:

But it looks like this instead in Olifant:

  • As the previous point shows, the names of the testing and tuning data files must exactly match the instructions, i.e., “test.reference.set.xlsx”. Our group tried several times before figuring out that if the file names don't follow the naming rules, the engine won't recognize the files as testing or tuning data and will fall back to its automatic tuning and testing data instead. This is extremely inconvenient for group projects, since extra integration work has to be done after all the team members finish their own parts.
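Since KantanMT does accept aligned UTF-8 text files, one way to avoid the messy Olifant conversion would be a small script that parses the TMX directly with the standard library and writes two line-aligned text files. This is a sketch under assumptions: the language codes and output file names are placeholders, and real TMX files may need extra cleanup of inline markup.

```python
import xml.etree.ElementTree as ET

# ElementTree expands xml:lang to this fully qualified attribute name
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def tmx_to_aligned_txt(tmx_path, src_lang, tgt_lang, src_out, tgt_out):
    """Write one source and one target UTF-8 text file in which
    line N of each file holds the two sides of the same TMX <tu>."""
    tree = ET.parse(tmx_path)
    src_lines, tgt_lines = [], []
    for tu in tree.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            lang = tuv.get(XML_LANG, tuv.get("lang", ""))
            seg = tuv.find("seg")
            if seg is not None:
                segs[lang.lower()] = "".join(seg.itertext()).strip()
        # keep only fully aligned pairs so the line numbering matches
        if src_lang in segs and tgt_lang in segs:
            src_lines.append(segs[src_lang])
            tgt_lines.append(segs[tgt_lang])
    for path, lines in ((src_out, src_lines), (tgt_out, tgt_lines)):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(lines) + "\n")

# Hypothetical usage (file names are assumptions):
# tmx_to_aligned_txt("imf.tmx", "en-us", "zh-cn",
#                    "test.source.en.txt", "test.reference.zh.txt")
```

Skipping the Excel round-trip also sidesteps the garbled characters Olifant produced for us.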

Recommendation for the engine suitable for training

Weighing all the pros and cons of KantanMT and Microsoft Translator Hub, I think Microsoft Translator Hub is better for training machine translation on the topic of IMF economic reports.

There are two reasons:

  1. The quality of the MT from Microsoft Translator Hub is clearly much higher than KantanMT's, according to both the human evaluation and the BLEU scores.
  2. Since there is a large number of tuning and testing files to upload, it's extremely inconvenient to integrate all of them into one Excel file.

But there are still steps needed to confirm that Microsoft Translator Hub is better for our machine training than KantanMT. As mentioned above, the tuning and testing data we fed to KantanMT were Excel files converted from TMX files, and they were quite messy. This may be the main reason the BLEU scores and the translation quality in KantanMT are so low. So we still have to clean up the Excel files, making sure the materials are identical to the ones we imported into the Microsoft Translator Hub system. Then the KantanMT system has to be retrained with the clean Excel files and the results re-evaluated. If the BLEU scores and the quality are still lower than Microsoft Translator Hub's, then we can confidently say that Translator Hub is the better engine for continuing our project.

Advanced CAT Final Mini-Portfolio Introduction

This portfolio is a compilation of the Machine Translation final project files my team Machina completed in the course Advanced Computer-Assisted Translation (CAT) during the 2017 spring term, along with a blog post titled “Comparisons Between KantanMT and Microsoft Translator Hub” based on our final project. Together, these support my understanding of machine translation training and my evaluation of how suitable two different MT engines are for training projects on the IMF world economic outlooks.

Using the final project files (essentially the files our team created while doing a machine translation project with Microsoft Translator Hub, including a pilot proposal, the presentation slides on lessons learned and an updated proposal), my portfolio chiefly explores the comparison between two machine translation engines, Microsoft Translator Hub and KantanMT, and recommends which engine to use when training on topics like the one our final project covers.
