Translation Crowdsourcing for Fog of World

A little bit of background

For most companies with budgetary constraints, lowering the cost of translation projects is appealing. There are a number of ways to achieve that end; perhaps the most widely known, with many successful cases, is crowdsourcing. At the Middlebury Institute of International Studies at Monterey (MIIS), we examined the factors behind successful translation crowdsourcing projects and drew up a tailored plan, in the form of a proposal, for a selected organization or company. This blog post demonstrates what I learned from notable translation crowdsourcing cases and how I applied it to design such a project for a selected company.

Crowdsourcing in general

After examining successful translation crowdsourcing cases, such as those of Facebook, Adobe, and Translators without Borders, I identified some crucial characteristics. This blog post presents some of the best practices a translation crowdsourcing project should follow from the quality and quantity perspectives.

The proposal

After learning about the best practices, my group and I chose one of our favorite apps and designed a translation crowdsourcing initiative for them as if we had been tasked to do so.

The app we chose is called Fog of World, an app that allows users to “clear out fog on the map” as they walk around in the real world. This presentation includes an overview of the app’s development team, why crowdsourcing suits them, and the steps we designed for them, as if we were presenting the proposal to the team’s representatives.

Here is the proposal itself and the supporting documents in which we further explain how quantity and quality should be taken care of.

Best Practices of Translation Crowdsourcing

The idea of crowdsourcing has been around for a long time. A quick Google search shows that crowdsourcing generally means the practice of effectively harnessing a crowd’s input to accomplish a certain goal. Nowadays, this is usually done unpaid over the internet, because it is much easier to reach a big crowd online, and paying such a huge crowd rarely makes business sense. Traditional translation projects, by contrast, have been done in a paid format by engaging only a small group of professional translators. When presented with these two options, it is only natural that people want to carry out translation projects in a crowdsourcing manner, hence the term: translation crowdsourcing.

The idea is feasible. Many international companies, such as Adobe and Facebook, have put it into practice and obtained significantly positive results, and their success is not accidental. To replace the traditional translation project workflow with translation crowdsourcing, we first need to identify what is good about the traditional way. Among its many advantages, this article focuses on two, assured quantity and assured quality, and identifies best practices for a translation crowdsourcing project to achieve the same, or even higher, levels of both. Besides framing the discussion around quality and quantity, this article also looks only at translation projects that would not have been done at all if the companies had not opted for crowdsourcing.

Increase quantity by marketing and motivating

Paid professional translators are bound by contract terms, so a certain throughput can be expected from them over a given period. The same does not apply to unpaid crowds. There are two main ways to ensure the quantity of work an unpaid crowd produces.

The first way is through “marketing.” The crowd cannot start translating for you if they do not even know about the project. Marketing your translation crowdsourcing initiative to the right crowd as if it were a product is the first and foremost step to getting it off the ground. A successful marketing strategy encompasses a solid marketing mix: price, product, promotion, and place. Identify what the crowd would otherwise have spent their time on (price), what they can get out of offering their labor for free (product), how and to which specific crowd you will promote your initiative (promotion), and where you would like your translation crowdsourcing to take place (place). Having these clearly defined gives you access to the biggest possible crowd.

Let’s take a look at an example. If a non-profit dedicated to preventing starvation has a very tight budget and would like to use crowdsourcing to translate its short flyers from zh-TW into Japanese, the marketing mix might look something like this:

  • Price: If one short flyer takes roughly an hour to translate, the price is an hour of the translator’s salary (if they translate during work hours) or an hour of their leisure time, such as watching YouTube.
  • Product: A sense of achievement that the translator has contributed to starvation prevention.
  • Promotion: Post recruiting announcements on human rights forums, since this group of people is more likely to join the initiative.
  • Place: An online platform that allows people from everywhere to join.

After listing the marketing mix out, you can then carry out your marketing plan.

After you have them on board, keeping them motivated is key to continuously high quantity. It is important to note that the crowd knew the work would be unpaid, so using monetary rewards to motivate them at this point would send a confusing message about your initiative and defeat the purpose of setting up an unpaid crowdsourcing project in the first place. People usually volunteer for a sense of giving back to the community or of accomplishment. Therefore, some form of recognition is more appropriate in these circumstances.

Another good way to motivate your crowd is to show them the impact they have made. This will give them an incomparable sense of accomplishment and is more effective than any monetary rewards.

As mentioned at the beginning, a successful translation crowdsourcing initiative needs more than a motivated crowd. Maintaining the quality of the work the crowd produces is just as crucial.

Maintain quality

Although the volunteer nature of translation crowdsourcing makes quality control harder than in traditional translation projects, there are still measures that can be put in place before, during, and after translation to maintain quality.

Before

When it comes to quality control measures at this stage, it might be tempting to vet the crowd the same way as in a traditional translation project. For an unpaid translation crowdsourcing initiative, however, this might not be the answer, since it could discourage people from taking part at all.

Instead of requiring volunteers to have certain knowledge, a more appropriate approach is to provide them with that knowledge as they translate. For example, some people might not know the importance of using terminology consistently. Instead of shutting those volunteers out with an upfront test, arrange mini-trainings inside the translation tool to familiarize them with the concept. This approach also resonates with the product concept discussed earlier: the crowd is more motivated to participate if they can learn something new by offering their labor for free.

Another aspect that might at first glance seem irrelevant to quality is making sure your technology is easy to use. For example, even if volunteers learn the concept of consistent terminology, how can they put it into practice if they cannot find the button to store a word in the terminology list? The ease of use of the technology therefore plays an important role in quality control.

During

Although linguistic tests are not recommended at the “before” stage, that is not to say there is no way to maintain linguistic quality. A voting mechanism can come in handy here. Allowing volunteers to vote on translations provided by other volunteers ensures that everyone’s work is reviewed, and it provides one more way to contribute to the project. This practice also means that their work is shared publicly, so they need to take it seriously in order to pass each other’s scrutiny.
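To make the idea concrete, here is a minimal sketch of such a voting mechanism (the data structures are hypothetical; a real platform would also track voter identity and guard against self-voting):

```python
def best_translation(votes):
    """Pick the candidate translation with the highest net vote count.

    `votes` maps each candidate translation to the list of +1/-1 votes
    cast on it by other volunteers.
    """
    tally = {candidate: sum(v) for candidate, v in votes.items()}
    winner = max(tally, key=tally.get)
    # Only accept a candidate that the crowd has net-approved.
    return winner if tally[winner] > 0 else None

votes = {
    "Bonjour le monde": [1, 1, 1, -1],  # net +2
    "Salut le monde":   [1, -1],        # net 0
}
print(best_translation(votes))  # Bonjour le monde
```

Returning None when no candidate has net approval lets the platform route the segment back into the translation queue instead of publishing a disputed translation.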

Besides this, the best practices of providing context and answering questions for traditional translation projects are also applicable. After all, you cannot translate what you do not understand.

After

If you think the quality could be better and would like to add one more layer of check at this stage, you can do it by either hiring a professional reviewer or enlisting the crowd again.

The amount of work produced by your crowd is likely to be huge, so it might not make business sense to hire a professional reviewer to review it all. However, you can sample a portion of it to gauge the overall quality.
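A simple way to draw such a review sample, sketched in Python (the 5% rate and the fixed seed are arbitrary choices for illustration):

```python
import random

def sample_for_review(segments, rate=0.05, seed=42):
    """Randomly select a fraction of translated segments for professional review."""
    k = max(1, round(len(segments) * rate))         # review at least one segment
    return random.Random(seed).sample(segments, k)  # seeded for reproducibility

segments = [f"segment {i}" for i in range(200)]
print(len(sample_for_review(segments)))  # 10 (5% of 200)
```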

You can also have a reporting mechanism ready for your end users to report bugs or issues they encounter: start with crowdsourcing and end with crowdsourcing. Be careful with this approach, though, because users might not appreciate a faulty release.

Summary and final thoughts

Let’s recap the important points about best practices for quality and quantity in an unpaid translation crowdsourcing project that would otherwise not have been done at all.

  • Maintain quantity
    • a clear marketing mix
    • a highly motivated crowd
  • Maintain quality
    • before
      • provide training opportunities
      • have easy to use technology in place
    • during
      • establish a voting mechanism
      • provide context
      • establish a query management system
    • after
      • hire professional reviewers
      • establish a user reporting mechanism

Translation crowdsourcing projects are not just about getting translations for free and saving money. The effort that goes into managing the projects, and the public backlash some companies have faced over their for-profit business models, show how difficult it can be to successfully launch and finish a translation crowdsourcing project. If done right, however, it can be a tremendous boost for your organization.

Pictograph: A picture is worth a thousand words

Introduction

With the advancement of technology, people can now literally say a thousand words with a picture. Pictograph is an app that lets you hide messages in a picture. All you need to do is upload a picture into Pictograph and type the message you would like to hide; Pictograph will encode the message for you. To see the hidden message, your recipient needs to decode the picture using Pictograph. Here is a demo of how the app works.
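Pictograph’s actual encoding scheme isn’t documented in this post, but the general technique, hiding message bits in the least-significant bits of pixel values, can be sketched as follows (pixels are modeled as a flat list of byte values for brevity):

```python
def encode(pixels, message):
    """Hide `message` in the least-significant bits of `pixels`.

    `pixels` is a list of 0-255 byte values (e.g. flattened RGB data).
    A 16-bit big-endian length header precedes the message bits.
    """
    data = message.encode("utf-8")
    bits = [(len(data) >> (15 - i)) & 1 for i in range(16)]
    for byte in data:
        bits += [(byte >> (7 - i)) & 1 for i in range(8)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for message")
    # Clear each pixel's lowest bit and overwrite it with a message bit.
    return [(p & ~1) | b for p, b in zip(pixels, bits)] + pixels[len(bits):]

def decode(pixels):
    """Recover a message hidden by `encode`."""
    length = 0
    for p in pixels[:16]:
        length = (length << 1) | (p & 1)
    out = bytearray()
    for i in range(length):
        byte = 0
        for p in pixels[16 + 8 * i : 24 + 8 * i]:
            byte = (byte << 1) | (p & 1)
        out.append(byte)
    return out.decode("utf-8")

pixels = list(range(256)) * 4  # stand-in for real image data
print(decode(encode(pixels, "hi there")))  # hi there
```

Because only the lowest bit of each byte changes, the carrier image looks visually identical, which is the whole point of the technique.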

Summer Zhao, Star Tang, and I worked as a team to localize the app into Simplified Chinese using Xcode 11, targeting iOS. We planned to spend two weeks, starting from downloading the app from GitHub, and see how far we could get. The following is what we accomplished.

Before Localization

We found this app on GitHub. Before we could get it to build successfully in Xcode on our laptops, however, several steps were needed to set up the environment. The error messages Xcode showed on our first build suggested our laptops were missing dependencies needed to run Pictograph. Summer figured out that this had to do with a folder called “Pods” inside the package we downloaded from GitHub: we needed to install the project’s CocoaPods dependencies. At long last, we were able to run the app in Xcode.

Localization

Our first task was to localize the strings in the app. To do that, we first needed to internationalize them, that is, wrap each string in a method so that it can be conveniently externalized for translation. Pictograph is written in Swift, and the screenshots below show how its strings are internationalized.

The first screenshot contains the string “Want to hide an image?” The second shows how it is internationalized. After confirming this method worked, the rest of the task was a matter of finding all the user-facing strings and wrapping them. Although the app appears simple, many strings are shown in different places, and it took a lot of trial and error to internationalize them all. After that, we were able to export the externalized strings and translate them in Memsource.

Background Color

After localizing the strings, we looked for what else could be changed. We did not like the glaring red background color, so we decided to change it. We located the line of code in a file called “PictographViewController.swift” where the background color is set. The original color was red, as shown in the first screenshot. We chose blue as the new color, as shown in the second.

If you are an Apple product user, Dark Mode is no stranger to you. You can also change the color for Dark Mode if you like.

Icon Color

To match the new background color, the icon needed to change too. Summer designed a new icon with blue as the main color, and we replaced the old icon with it.

Old icon
New icon

App Font

Since our localized app was going to be in Simplified Chinese, we wanted to make sure the app’s font rendered Simplified Chinese characters well, so we located and imported a custom font, “SkyGarden.” The following screenshots show the list of built-in fonts and the custom font we imported.

Built-in fonts
Custom font

The Simplified Chinese characters in the middle are a preview with the chosen font applied.

Things Unfinished

Although we were able to import and preview the custom font, we were not able to apply it to the app. Digging into the code, we found that the app’s font is set by a call to “UIFont.systemFont().” It seems that to apply our font, we would need to tell the app to look up the custom font and use it for the interface. Because none of us had prior exposure to Swift, and because of the difficulty we ran into just getting the app to build in Xcode, we ran out of the time we had set for ourselves in the beginning and did not get the custom font working in our app.

After Thoughts

We were so close to fully localizing the app into Simplified Chinese. Here is the working version, with the custom font not yet applied. With winter break coming, I am confident I will be able to familiarize myself with Swift and get the custom font to work.

TLM-related Tools Exploration

Background

Although computer-assisted translation tools are commonly used in the translation and localization industry, they are not the only technology we should be familiar with. In the Advanced CAT class, we explored other technologies that can boost our productivity. After dipping a toe into each technology, we completed a project to demonstrate our understanding.

Machine Translation: Microsoft Custom Translator

The first technology we explored was machine translation. Many people fear that machine translation will take over translators’ roles in the near future, so the class was broken into groups to put Microsoft’s neural translation system to the test in different domains.

Our Domain and Scope

We were aware that the more restricted the text, the better machine translation tends to perform. However, in order to know whether the machine is capable of doing a human’s job, my group decided to go with news commentary, a text type that is not restricted in any way. Our goal was to train an engine to translate New York Times economic and political news commentaries from English into Simplified Chinese.

Pre-training Stage

  • Training data: News Commentary Parallel Corpus
  • Tuning data: human-translated and manually aligned political and economic news commentaries from the New York Times
  • Testing data: political and economic news commentaries from the New York Times

We were not sure about the machine’s capability, so we set our initial goal to be producing understandable news commentaries. This is our initial proposal for achieving that goal; we predicted the goal could be reached with a budget of $1,040.

During Training

We found that it was easy for the engine to produce understandable articles, so we reset our goal to producing articles that resembled human translation. In other words, we were now expecting high-quality translations. This is our updated proposal.

After Training

We eventually stopped after 42 rounds of training, having fed a total of around 200 million characters into the engine. Our conclusion was that it is extremely hard for an engine to produce quality translations of such an unrestricted text type. As shown in our presentation, we had spent a total of $3,531.80 when we stopped training.

After the project, I have a deeper understanding of what machines are capable of. With proper use, machines can indeed save time and free translators up for more sophisticated tasks. Nonetheless, the replacement of humans by machines will not happen any time soon.

Trados QA Functionality: Regular Expression

QA is an important part of translation and localization projects, and there are many aspects to pay attention to. Automating some of these checks is therefore beneficial.

Our second mini project was to explore regular expressions on regex101 and come up with specific rules we could apply to our target language to speed up the QA process.

Here are the regular expressions we came up with for QAing translations from English into Simplified Chinese and Traditional Chinese. The screenshots show that the regular expressions correctly identified errors that would otherwise have required human intervention.
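For illustration (these are not the exact rules from our project), here are two hypothetical English-to-Chinese QA patterns: one flags halfwidth punctuation left after a Chinese character, and the other flags a stray space between Chinese characters:

```python
import re

# Halfwidth punctuation that should normally be fullwidth (，。？！) in Chinese.
HALFWIDTH_PUNCT = re.compile(r"[\u4e00-\u9fff][,.?!]")

# An ASCII space sandwiched between two Chinese characters.
STRAY_SPACE = re.compile(r"[\u4e00-\u9fff] [\u4e00-\u9fff]")

def qa_flags(target):
    """Return the names of QA rules whose patterns match the target segment."""
    flags = []
    if HALFWIDTH_PUNCT.search(target):
        flags.append("halfwidth punctuation after a Chinese character")
    if STRAY_SPACE.search(target):
        flags.append("stray space between Chinese characters")
    return flags

print(qa_flags("你好,世界"))   # flags the halfwidth comma
print(qa_flags("你 好"))       # flags the stray space
print(qa_flags("你好，世界"))  # clean: []
```

In Trados, the same patterns can be pasted into the QA Checker’s regular-expression rules so that every segment is checked automatically during verification.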

If we can come up with comprehensive regular expressions for the target languages, the formatting check during a localization project cycle can be integrated into the editing stage.

Powerful Time Tracking Tool: Harvest

Keeping track of how much time we spend on projects lets us produce more precise estimates for future projects. As translation and localization professionals, many of us are familiar with free time-tracking tools such as TopTracker and Toggl. My final mini project is a demo of Harvest, a powerful time-tracking tool that includes not only time tracking but also expense-tracking features that make invoicing easy. With this tool, a project manager can better organize time and expenses.

Final Thoughts

We approached the translation and localization industry from many different angles this semester, and that really broadened my horizons as to how complex this industry is. I will keep exploring new technologies so that I can keep up with the latest trends.

Subtitling Project

Background

We learned many techniques and tools over the semester for handling DTP projects, and subtitling was one of our focuses. In class, we used VisualSubSync, HandBrake, and VLC media player to perform spotting, burn in subtitles, and run QA. The guidelines we followed were Netflix’s timed text style guides. This subtitling project serves as a proof of concept.

Project Setup

Style Guides

I again followed Netflix’s timed text style guides when creating the subtitles. They can be found at the following links. Please note that SDH subtitles are not in the scope of this project.
1. Timed Text Style Guide: General Requirements
2. English Template Timed Text Style Guide
3. Simplified Chinese (PRC) Timed Text Style Guide

Tools

Transcribing and spotting: Sublime
Translating and reviewing: memoQ
Burning in subtitles: HandBrake and Adobe Premiere

The Project

This project is to subtitle a two-minute Chinese video and translate it into English.

Transcribing and Spotting

These two steps are done at the same time in Sublime.

The above is a glimpse of the Sublime interface. The lower-left panel is where subtitles are added. Dragging and dropping each subtitle onto the upper-right panel creates its timecodes. The upper-left panel is the preview window, where you can see the subtitle you just created in action. The lower-right panel is where project settings, such as reading speed, can be adjusted. Any subtitle that does not meet the requirements is flagged in red in the Errors column of the lower-left panel.

According to Netflix’s guides, a subtitle should start and end within a certain number of frames of the audio so that the audience does not perceive the subtitles and the audio as out of sync. Sublime is excellent at this job: it lets users move a subtitle one frame at a time, and together with the waveform it is extremely easy to identify where to place the subtitle. As shown in the screenshot above, it is easy to place the subtitle 3 frames before the audio, as Netflix recommends.

After transcribing and spotting, the next step is to export the subtitles for translation. Unfortunately, the Sublime version I am using is very old and can only export subtitles in the STL or SST format. Strangely enough, memoQ kept flagging the exported file with virus warnings and rejecting it, so I ended up creating an SRT file myself.
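The spotting rule above comes down to simple frame arithmetic. As a sketch (assuming a 24 fps clip; the project’s actual frame rate isn’t stated here), shifting an in-cue 3 frames ahead of the audio looks like this:

```python
def frames_to_srt_time(frames, fps=24):
    """Convert a frame count to an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(frames * 1000 / fps)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def shifted_in_cue(audio_start_frame, offset=3, fps=24):
    """Place the subtitle in-cue `offset` frames before the audio starts."""
    return frames_to_srt_time(max(audio_start_frame - offset, 0), fps)

print(shifted_in_cue(243))  # audio at frame 243 -> cue at frame 240 = 00:00:10,000
```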

Translating and Reviewing

These two steps are done in memoQ.

In memoQ, an SRT filter can be created; after entering the parameters, you can store the filter for reuse. The line-length limit and characters-per-second values above were adopted from Netflix’s guides mentioned earlier.
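These checks can also be expressed directly in code. The limits below (16 characters per line, 9 characters per second) are my reading of Netflix’s Simplified Chinese guide, so verify them against the current version:

```python
def check_subtitle(lines, duration_seconds, max_line_len=16, max_cps=9):
    """Flag line-length and reading-speed violations for one subtitle event."""
    problems = []
    for i, line in enumerate(lines, start=1):
        if len(line) > max_line_len:
            problems.append(f"line {i}: {len(line)} characters exceeds {max_line_len}")
    # Reading speed: total characters divided by on-screen duration.
    cps = sum(len(line) for line in lines) / duration_seconds
    if cps > max_cps:
        problems.append(f"reading speed {cps:.1f} cps exceeds {max_cps}")
    return problems

print(check_subtitle(["这是一条字幕"], 2.0))  # within limits: []
print(check_subtitle(["这是一条特别特别特别特别特别长的字幕"], 1.0))  # fails both checks
```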

The above is what the memoQ video preview tool looks like. After translating each subtitle, you can see a real-time preview in a separate window. memoQ also runs a real-time QA check for you (boxed in red in the lower-left corner) using the information entered in the SRT filter. The video preview tool also lets you associate the project with your TM and TB if necessary.

Burning in Subtitles

I burned in the subtitles in two ways: using HandBrake and using Adobe Premiere Pro.

Burning in subtitles with HandBrake is hassle-free: all I had to do was import the SRT file and the source video. However, you have no control over where the subtitles appear or at what size. The above is a snapshot of the video produced by HandBrake.

In Premiere, by contrast, subtitles can be moved around and resized. As shown in the screenshot above, I can move the subtitle (boxed in red) down to avoid having it cover the action on screen. The following is a snapshot of the video produced by Premiere.

The following links are the SRT file assets and the two videos with subtitles produced by HandBrake and Premiere.

Challenges

  1. There are many shot changes in the clip, and they sometimes limit how long a subtitle can stay on screen. Sublime does have a scene-detection function, but it is extremely inaccurate. Therefore, I had to manually identify each shot change and decide whether a subtitle should carry across it, in order to minimize the effect on the viewing experience.
  2. Creating an SRT file from scratch was time-consuming. Not being able to export an SRT file is the biggest problem with this version of Sublime. The virus hiccup seemed to be a bug and would have been resolved by updating Sublime, had my version not been discontinued.
  3. China has a distinct system for making appointments with doctors: doctors are classified into levels by experience, so the available slots and the fees differ by level, yet most people are still willing to pay for the highest level. In the video, the man wants to see the “best doctor” to cure his headache, which means booking an “expert”-level doctor. It was hard to explain this within the video to audiences whose countries do not have such a system.

Final Thoughts

Subtitling is an art with very specific limits on how many words you can use while preserving the original meaning to the greatest extent; omission is not uncommon. A lot of the time, there are trade-offs to be made. As a translator, finding the balance between not interrupting the viewing experience and preserving the meaning is important. This is why subtitling is fascinating.

TMS Comparison Projects

Project 1: TMSs comparison

In the Translation Management System class, I was introduced to numerous translation management systems (TMSs), including SDL WorldServer, Lingotek, and GlobalLink. While trying out these TMSs, I played different roles, such as project manager, translator, and client, in order to understand how each stakeholder is involved. Later, we formed groups and conducted a full analysis of chosen TMSs to compare their functionality.

SDL WorldServer vs. GlobalLink

Our group chose WorldServer and GlobalLink, and we used a scorecard to weigh the pros and cons of each TMS. We first listed the functions we thought a TMS would or should have, then assigned each a weight according to how important we considered it. We also noted the relevant stakeholders to remind us whom to consult when identifying TMS needs. As shown in the scorecard, twelve functions were compared.
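The scorecard method is essentially a weighted sum. A sketch with hypothetical functions and weights (not our actual twelve criteria or scores):

```python
def scorecard_total(ratings, weights):
    """Weighted total: each function's rating times its importance weight."""
    return sum(ratings[f] * weights[f] for f in weights)

weights = {"TM management": 3, "workflow automation": 2, "reporting": 1}
worldserver = {"TM management": 4, "workflow automation": 5, "reporting": 3}
globallink  = {"TM management": 5, "workflow automation": 4, "reporting": 4}

print(scorecard_total(worldserver, weights))  # 4*3 + 5*2 + 3*1 = 25
print(scorecard_total(globallink, weights))   # 5*3 + 4*2 + 4*1 = 27
```

Separating the weights from the ratings makes it easy to re-run the comparison for a client whose priorities differ from ours.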

Final Scores

Since WorldServer and GlobalLink are both major players in the localization industry, they both satisfied the functional requirements we had come up with. In the end, GlobalLink scored only two points higher than WorldServer (82 vs. 80).

Project Material

Here is the slide deck of our analysis. Feel free to check it out.

Project 2: Consulting Project

Our final project for the class was searching for a system for a real client that would work best with the CAT tool they use: Memsource. After a kick-off meeting with the client, we narrowed the scope down to four possible solutions: Plunet, XTM, XTRF, and SharePoint. The class was then divided into four groups, each responsible for assessing one solution from six aspects: client management, linguists, vendor management, project management, cost (of product and migration), and customer support.

Plunet with Memsource

My group was responsible for testing the integration of Plunet and Memsource. The following is a screenshot of the overall criteria and scoring; Plunet’s scores are boxed in red.

Project Material

Here is the scorecard and slide deck that contain a full analysis of all four possible solutions. Feel free to check them out.

Skills I gained

  1. Adapting to a new translation/business management system based on an understanding of GlobalLink, SDL WorldServer, Lingotek, and Plunet.
  2. Creating different accounts attached to various rights groups for different stakeholders of a localization project using an admin account.
  3. Navigating a translation/business management system while wearing different hats.
  4. Creating a comprehensive scorecard to assess a TMS.
  5. Identifying stakeholders with different needs when assessing a TMS.