Ziqi Zhou

Never marry your ideas, flirt with them.

Language Industry in the Machine Learning Era: main technology, changing landscape, and new roles of stakeholders

The development of machine learning has changed many industries, including the language industry. From time to time we ask questions like “Will machine translation replace human translators?” But what machine learning brings to the language industry goes beyond whether humans will be replaced. This essay examines the technologies that are reshaping the language industry, what is changing, and how industry stakeholders can better prepare themselves for the future.

Machine Learning Technology Impacting the Language Industry 

The technology that has had the most profound impact on the language industry so far is natural language processing (NLP). NLP is a vibrant interdisciplinary field whose goal is to get computers to perform useful tasks involving human language, such as enabling human-machine communication, improving human-human communication, or simply doing useful processing of text or speech. Conversational agents, grammar checkers, and machine translation (MT) are long-standing examples of these tasks. Although MT has been under development for over 60 years, only recently have humans felt a real threat, as progress in machine learning has produced growing evidence that neural machine translation (NMT) output can approach human quality.

Besides NLP, machine learning can predict users’ preferences from large amounts of data, enabling customized user experiences and simpler, more automated workflows. All of this contributes to the transformation of the language field by bringing in new players, more streamlined production processes, wider user coverage, and smarter tools.

Changing Landscape: New Players, Streamlined Workflow, and Smarter Tools 

Machine translation (MT), the most widely applied machine learning technology in the language field, has penetrated every corner of language services. Nowadays, not only are language service providers branding themselves as AI/MT solution providers, but tech companies are also leading the way in MT development. There are more MT vendors in the field: Google, Microsoft, Amazon, DeepL, Alibaba, Tencent, etc. MT engines are easier to train, and users can customize their own engines with tools like AutoML Translation from Google or Microsoft Translator Hub. MT aggregators like Inten.to and translate4eu are emerging rapidly. We have also witnessed the birth and growth of LSPs like Unbabel and Lilt, which have MT in their DNA.

In addition to the high-level changes mentioned above, rapidly progressing technology has also changed the workflow of translation/localization projects and the interaction between enterprises and language service providers (LSPs). Traditionally, the localization workflow on the client side goes like this: the localization team receives resource files to be localized from the engineering team and delivers them to vendors through a translation management system (TMS). After the translations come back, the localization project manager checks the files for quality and then sends them back to the repository for testing or build. As machine translation engines improve, more and more products go through an MT + post-editing (PE) process rather than the traditional translation > editing > proofreading (TEP) process.

Apart from the changes in workflow, the scope of MT is also growing. Companies like eBay, Adobe, and Microsoft have long relied on MT to make their products more accessible globally. Take eBay, for example. In the past, raw MT was mainly applied to user-generated content (UGC), such as search queries, item titles and descriptions, product reviews, and member-to-member communication (M2MT) tools. Now eBay is employing MT for all eBay-created content, such as help documentation and UI. Applying MT to all products has become the localization strategy for more and more companies. Combined with data analysis, it lets companies localize their products in a smarter and more scalable manner: for example, they can launch machine-translated web pages in many languages and then polish those whose page views grow significantly. In this way, companies can greatly expand their coverage of target users without much additional cost.

Machine learning is also changing the tools used in the language industry. On the one hand, CAT and TMS tools are becoming more intelligent and personalized. Role-based personalization can produce pre-defined UI layouts aligned with the needs and tasks of given user groups. On the individual level, the experience can adapt to each user based on collected usage data. On the other hand, we can see a growing trend toward integrating AI engines with these tools. Currently, most CAT tools do not come with pre-installed connectors to cloud-based AI engines, but we are witnessing the emergence of new connectors that act as middleware to multiple machine learning engines.

New Roles of Stakeholders: Machine Learning for Efficiency, Humanity for Creativity 

Like other technologies that bring both innovation and disruption, machine learning may squeeze out some narrow job titles while creating new roles such as NLP algorithm specialist and MT solution architect. Those of us who are already in the industry need to adapt to its transformation.

For linguists (translators and interpreters included), parts of our job have already been taken over by machines thanks to progress in machine learning algorithms, text-to-speech (TTS), automatic speech recognition (ASR), etc. In some fields, such as literature and transcreation, human translation still plays an important role. As translators or interpreters, instead of worrying about being replaced by machines, we should embrace the technology and find ways to benefit from it. It is also advisable to learn skills like MT post-editing, data analysis, and the training of cloud-based AI engines.

For PMs, whether we are on the client side or the LSP side, continuously progressing technology will alter our responsibilities, our workflow, and even the team we work with. For example, the workflow of an MT project differs from that of a traditional project, depending on conditions like MT resources, language resources, and client needs. Before a project is launched, scoping the work comes first. How many words need to be translated? MT projects usually come in much larger volumes than traditional translation projects, often millions of words. Will DTP, MT training, or human editing be included? Which MT engine should we employ? What are the evaluation metrics for QA? Can we simply quote by the number of words when we are not sure about the quality of the MT’s initial output?

Even the people we work with change in such projects. We may need a machine translation solution architect who has expertise in the MT project process and is familiar with MT tools, so that he or she can pick out (or train) the best engine for the specific project. We may even need to work with NLP specialists or computational linguists to further develop our tools. Engineers may want to invest more time in connecting AI resources with CAT tools. Post-editors should be experienced linguists who are familiar with the common error patterns of MT and know how to stick to the style guide.

Given these possible changes, the role of a PM may evolve toward resource management: he or she must make sure that each person on the team can access the needed resources at the right time, and must provide the right solution to clients by combining resources effectively. The PM should also have a relatively strong technical background, along with communication, organization, and coordination skills, because an effective project plan needs the cooperation of so many parties.

Developers’ tasks are also changing. By combining AI technologies with local resources such as TMs, terminology bases, and linguists, developers are able to build tools with automation plugins, MT pre- and post-processing features, customizable dashboards, etc. They must keep up with the state-of-the-art machine learning models being developed and know what is needed to apply and deploy them.

As managers, we need to think realistically about the tasks that will disappear over the next few years and start planning for the more meaningful, more valuable work that should replace them. Some easy but repetitive tasks subtly encourage people to make narrow and boring contributions. Machines do not get frustrated or annoyed, and they certainly don’t imagine, so they are more efficient at handling such tasks. But we human beings feel pain and get frustrated. And it’s when we’re most annoyed and most curious that we’re motivated to dig into a problem and create change. Our imagination is the birthplace of our new products, new services, and even new industries. So, why not bring more humanity to the language industry?

Machine learning has brought more possibilities and will definitely affect the language industry in many ways. However, it is important to note that human effort still plays an important role in many scenarios, and that in many locales there is still a long way to go in educating people about language technology. Most importantly, humanity is what makes all of this possible.

Android App Localization: Automation with Plugins

Launching your Android App with support for different languages is essential for Apps going global. The internationalization and localization of Android Apps is easy when working with Android Studio.

Pre-localization

Before we get our App ready for localization, the first thing to do is internationalization. Basically, we need to externalize all the hard-coded UI strings into an XML file and put it under the “res/values” folder of our project. Next, we can create alternative strings.xml files for our target languages, each stored in a locale-specific resource directory. What we’ve done so far can be regarded as the internationalization engineering step for an App that was developed with all its strings hard-coded. However, Android developers with an internationalization mindset would not build the App this way; they would put all the strings into the resource folder from the start, ready for future localization. More detailed information on pre-localization engineering can be found in the Android developer documentation.

Once we have finished the externalization, our App is ready for localization. Basically, there are two simple ways to do this. The first is to add new locales for the externalized strings under the res > values folder and send the resulting locale-specific strings.xml files to translators or editors. The other is to localize with plugins, which this blog post explains in detail.
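To make the resource layout concrete, here is a minimal Python sketch (not part of the Android toolchain) that builds the locale-specific directory name and lists the string names still missing from a translated strings.xml. The string names and the French sample are hypothetical and are not taken from any real App.

```python
import xml.etree.ElementTree as ET

# Hypothetical default resources, as they might appear in res/values/strings.xml.
DEFAULT_STRINGS = """<resources>
    <string name="app_name">Minimal Todo</string>
    <string name="add_todo">Add a todo</string>
    <string name="remind_me">Remind me</string>
</resources>"""

# Hypothetical, partially translated res/values-fr/strings.xml.
FRENCH_STRINGS = """<resources>
    <string name="app_name">Minimal Todo</string>
    <string name="add_todo">Ajouter une tâche</string>
</resources>"""


def resource_dir(language, region=None):
    """Build the locale-specific directory, e.g. res/values-fr or res/values-zh-rCN."""
    qualifier = language if region is None else f"{language}-r{region}"
    return f"res/values-{qualifier}"


def string_names(xml_text):
    """Collect the name attribute of every <string> element."""
    root = ET.fromstring(xml_text)
    return {element.get("name") for element in root.iter("string")}


def missing_translations(default_xml, localized_xml):
    """Names present in the default file but absent from the localized one."""
    return sorted(string_names(default_xml) - string_names(localized_xml))


if __name__ == "__main__":
    print(resource_dir("zh", "CN"))
    print(missing_translations(DEFAULT_STRINGS, FRENCH_STRINGS))
```

A check like this is only a convenience for spotting untranslated strings before handing files to translators; Android itself falls back to the default res/values strings when a localized entry is missing.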

Why plugins?

Before we start our localization journey, I’d like to explain why localizing with plugins is cool and how it can outperform the first localization method.

The best part about localizing with a plugin is that localizers do not need to manually add the string files and send them out for translation. With a plugin installed, we can automate translation and resource file generation in just a few clicks. Another cool aspect is that, since these plugins rely on MT, users can sometimes customize the MT engine the plugin uses, making the localization more relevant to a specific locale. Moreover, most plugins support a wide variety of locales, which meets most needs.

Tutorial

To illustrate how it works, I will use an Android App called Minimal Todo. It is a very light and useful app that lets you add todos easily and quickly. The code is clean and well organized, and the App has been partially localized. After downloading its source code, we deleted most of the localized string files to de-localize the App and make it more “raw” for our localization exercise.

More than one plugin can be used for localization. In this case, I chose one called “AndroidLocalize”, and I’ll show you how to work with it in the video below.

Summary

As mentioned at the end of the video, this plugin is something of a black box: you cannot tell which MT engine it is using. This could be problematic if you’re concerned about data security. Although some plugins allow you to configure the MT engine, the basic idea is the same: leveraging MT for translation. This means that even after the translation has been completed automatically and the App has passed functional testing, the App cannot be released right away because of possible translation quality issues. When employing plugins for localization, we still have to include human editors to control the final quality of the translated strings.

Although working with plugins has some downsides, the advantages of a plugin like AndroidLocalize are obvious too. It saves cost and time and removes many tedious steps from the localization process, such as copying strings.xml files, adding locales to each file, and translating manually. It is so easy to pick up that even if you’re not an engineer you can complete the task by yourself.

For the overall information of our Android App Localization project, please click here.

 

Embrace MT for a Zero-Barrier Future: MT Engines, MT’s Application, MTPMs for the future.

If 2009-2016 was the era of cloud and integration for the language industry, 2017 marked the burst of machine learning. Machine translation (MT), the most widely applied machine learning technology in the language field, has penetrated every corner of language services. Nowadays, not only are language service providers branding themselves as AI/MT solution providers, but tech companies are also leading the way in MT development.

There are more MT vendors in the field: Google, Microsoft, Amazon, DeepL, Alibaba, Tencent, etc. MT engines are easier to train, and users can customize their own engines with tools like AutoML Translation from Google or Microsoft Translator Hub. MT aggregators like Inten.to and translate4eu are emerging rapidly. We have also witnessed the birth and growth of LSPs like Unbabel and Lilt, which have MT in their DNA.

Powered by technology growth, we can easily find MT use cases across such fields as eCommerce, travel, international laws, technical support, automated subtitling, new drug releases, scientific article releases, financial news and disclosure, etc.

In this blog post, I’m going to give an overview of some big players in the MT industry, present some examples of how MT can be applied to improve business profit, and consider how we can manage MT projects from a project manager’s perspective.

An Overview of MT Engines with Big Names

(Source: Intento. *There are more MT engines on the market that are not listed here.)

Some of these MT engines are stock engines ready for deployment; some are customizable stock engines, which allow users to adapt NMT to a specific domain using relevant corpora; others are custom NMT engines built from users’ own corpora for their specific projects.

Given so many MT engines to choose from, you may be wondering which one is the best. Actually, there is NO BEST MT engine out there. The standard for the “best” varies with your budget, language pair, turnaround time, and client needs. To make this point more concrete, let’s compare some of these engines in terms of price (using pre-built engines as an example), language pair (using Chinese as an example), and the time needed to train or deploy.

Price

Cost of Different MT Engines (in USD/Million)

(Source: Inten.to)

The chart above shows how widely the monthly price of translating with different MT engines varies, according to statistics from Intento. Please note that the price is per character sent to the API for processing, including whitespace characters. These engines also offer free translation within a limited time frame or total number of characters, but the chart only shows their regular pricing. Some engines, such as Google and Amazon, charge a flat price per million characters; others, like Microsoft and Alibaba, charge differently based on the volume of characters.
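Here is a rough Python sketch of how per-character pricing plays out. The rates and tier boundaries below are made-up placeholders, not the real prices of any vendor, and the volume-based model (one rate chosen by total volume) is only one assumed way a provider might tier its pricing.

```python
def flat_cost(text, usd_per_million_chars):
    """Cost under flat pricing; every character counts, including whitespace."""
    return len(text) * usd_per_million_chars / 1_000_000


def volume_rate(char_count, tiers):
    """Pick the rate for a volume-based plan.

    tiers is a list of (upper_bound_in_chars, usd_per_million_chars) pairs in
    ascending order; an upper bound of None means "no limit".
    """
    for upper_bound, rate in tiers:
        if upper_bound is None or char_count <= upper_bound:
            return rate
    raise ValueError("no tier covers this volume")


def volume_cost(char_count, tiers):
    """Cost when the whole volume is billed at the rate of its tier."""
    return char_count * volume_rate(char_count, tiers) / 1_000_000


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    hypothetical_tiers = [(250_000_000, 10.0), (None, 8.0)]
    print(flat_cost("Add a todo", 20.0))  # 10 characters, spaces included
    print(volume_cost(300_000_000, hypothetical_tiers))
```

Because whitespace is billed too, it can be worth normalizing source files before sending millions of characters to an API.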

Language Pair (Chinese as an example)

Before choosing an MT engine, keep in mind that no engine is best for all language pairs. Different engines perform differently for different languages. Take Chinese as an example: guess which engine translates best? Google? Microsoft? Or Baidu? Tencent?

Wait, before you spit out the answer, ask again: “which language to Chinese?” or, “Chinese to which language?” Now you see the tricky part.

Each year, the Conference on Machine Translation (WMT) releases training data for its shared tasks and announces the best MT systems for each language pair. In 2018, Tencent’s Mr Translator topped the Chinese to English news test with the highest BLEU score, while GTCOM took first place in English to Chinese translation.

Ranking of Systems for ZH > EN Translation

(Source: http://matrix.statmt.org/matrix/systems_list/1892?metric_id=4)

Ranking of Systems for EN>ZH Translation

(Source: http://matrix.statmt.org/matrix/systems_list/1893)

Yes, language pair matters. To give you a better idea, here are some of the best systems on the 2018 news test.

Best Systems for Different Language Pairs

(Source: http://matrix.statmt.org/?metric%5Bid%5D=5&mode=bestn)

Although Tencent’s MT engine seems to be the best at handling ZH>EN translation tasks, it is inadvisable to employ it right away. At the very least, you need to take into account the domain you’re working in. The table above shows that Tencent gets the highest BLEU score in the news domain, but what if you’re translating in another domain? And is the BLEU score really reliable? Additionally, Tencent’s MT engine is NOT customizable. If you want to tune your own MT engine, which one should you use?
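To see why a BLEU score should be read with care, here is a simplified, sentence-level sketch of the idea behind BLEU in Python: clipped n-gram precision combined with a brevity penalty. It is an illustration only. It uses 1-grams and 2-grams with whitespace tokenization, whereas BLEU as normally reported is computed over a whole test set with up to 4-grams, so these numbers are not comparable to the WMT figures.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=2):
    """Simplified BLEU-style score between one candidate and one reference."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngram_counts(cand, n)
        ref_counts = ngram_counts(ref, n)
        # Clipped overlap: an n-gram is credited at most as often as it
        # appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    if len(cand) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat down", reference))
    # An acceptable paraphrase with no shared words scores zero.
    print(simple_bleu("a feline rested upon a rug", reference))
```

The last call is the point: a metric based on word overlap with one reference can score a perfectly acceptable translation very low, which is one reason to evaluate on your own domain, ideally with human review.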

Time for Training and Deployment

For those who want to customize an NMT engine, the time and cost of training and deploying the engine should be considered when rolling out an MT solution.

The time needed to train a custom NMT engine varies across providers. Some charge for training (Microsoft, Google) and some (IBM, Modern MT) don’t.

Time Needed for Training

(Source: Intento)

How MT Can Be Leveraged on the Enterprise Level

When talking about MT, even some translators and interpreters in the language industry first think of pasting source content into a web-based MT engine and copying its output back to the desktop, or of scenarios in which AI robots may (or may not) take over the interpreter’s seat. At the enterprise level, however, MT shows great potential for minimizing cost, expanding into international markets, and reshaping the workflow of translation/localization projects.

For example, cross-border eCommerce companies like eBay rely heavily on MT given the large volume of content produced globally every day. It is simply impossible to hire translators to translate all of it. With MT applied, a user can type in the name of an item in his/her own language and see results in the local language, including goods from other countries whose listings were originally written in a foreign language.

Language service providers (LSPs) are also adopting MT to lower cost and improve efficiency. Moreover, LSPs born with MT at their core are emerging and growing rapidly. Take Unbabel as an example: this LSP uses MT for translation and contracts vendors who work with both the original text and the translations suggested by MT. It also hires freelance linguists for translation assessments and glossary and thesaurus reviews, working mainly with language data. In its last funding round, Unbabel was backed by 7 investors and raised 23M USD.

Funding Rounds Info of Unbabel

(Source: Crunchbase)

Who are the Localization Project Managers in the MT Era?

Driven by advances in artificial intelligence, machine translation and emerging adaptive translation solutions continue to improve rapidly and are able to meet increasingly complex requirements. Anyone involved in the language business must re-examine where and how these newer options can be leveraged. As a student majoring in localization who wishes to become a machine translation solution architect, I want to end this article by discussing some new requirements for project managers who want to leverage the power of machine translation (MT PMs).

We need to keep in mind that the workflow of an MT project differs from that of a traditional project, depending on conditions like MT resources, language resources, and client needs. Before a project is launched, scoping the work comes first. How many words need to be translated? MT projects usually come in much larger volumes than traditional translation projects, often millions of words. Will DTP, MT training, or human editing be included? Which MT engine should we employ? What are the evaluation metrics for QA?

When it comes to quoting, can we simply quote by the number of words? Since we are not sure about the quality of the MT’s initial output, we may not know how much time post-editing will take. Also, as the project progresses, new discoveries or issues may appear, making it even harder to send the client an accurate quote up front.

Even the people we work with change in MT projects. In the past, a PM usually worked with the account manager, engineer, and vendor manager, but on an MT project team we may need a machine translation solution architect who has expertise in the MT project process and is familiar with MT tools, so that he/she can pick out (or train) the best engine for the specific project. Engineers in MT projects may want to invest more time in connecting MT resources with CAT tools, or with the command line in some cases. Also, unlike traditional projects, MT projects need more reliable post-editors, since the bulk of the translation is done by machine. These post-editors should be experienced linguists who are familiar with the common error patterns of MT and know how to stick to the style guide.

Given these possible changes, the PM should have a relatively strong technical background as well as communication, organization, and coordination skills, because an effective project plan needs the cooperation of so many parties. He/she must make sure that each person on the team can access the needed resources at the right time or step. When needed, he/she should also be able to take over some of the technical configuration related to machine translation.

(This blogpost is inspired by Renato Beninatto’s lecture at IMUG on September 20, 2018. Special thanks to Inten.to!)

Achieving Growth in the Multicultural Marketplace: What to expect besides language adaptation?

For companies wishing to achieve sustained growth in the global marketplace, language support alone is not enough for new market adoption. The solution is to make the product geo-fit when globalizing it, which means integrating cultural and regional factors into your product experience strategy.

Only 5% of companies can sustain growth globally. If we look at some big companies, we find that this is never an easy process. It took Windows 25.8 years to reach 1 billion daily active users. WhatsApp took 6.8 years to reach 1 billion daily active users, and Uber spent 9 years before operating in 65 countries. International expansion brings with it the idea of developing a global-first product.

How to develop a global-first product?

We may first want to think about what we are optimizing for and how we define success. That is to say, keep the goal clear. First, regarding market readiness, we want to establish a solid new-market entry strategy by doing total addressable market (TAM) analysis, evaluating the competitive landscape, looking into potential local strategic partnerships, etc. Then, in terms of product readiness, we want to optimize the product experience at key points in the funnel, such as user onboarding and core engagement. Additionally, we should take organizational readiness into account. We need to figure out where the international team sits within the structure, because horizontal cross-functional effort requires alignment on the corporation’s core objectives to be executed with impact.

Since a global product strategy is really a local product strategy on a global scale, after clarifying our goal we also need to know the local customers and the local market. To take local customers into account, switching from US-first to User-first is essential.

With that in mind, let’s imagine we are users. When introduced to a foreign product, what questions might we ask ourselves before moving forward? We may first want to identify whether this is something we already know. Then we want to know whether we can use it and whether we care about it. If we can use it and we do care, we may get interested. The user-first model requires this kind of thinking from the users’ perspective. However, there are some common misconceptions in global product strategy: that a one-size-fits-all global product experience can be created; that we can get the product right for EN_US first and then optimize for international markets; and that the EN market is the core because that’s where the money is. Sadly, these are ALL WRONG. In fact, emerging markets have grown at 2-3 times the pace of developed markets and are expected to account for >50% of global GDP by 2030.

Cases: Integrating Regional & Cultural Factors

Now you’ve made up your mind to take your product global and have started mapping out your globalization strategy. Taking an international brand to a local market is complicated and requires coordination among many different parties. But you always want to integrate regional and cultural factors into this process to make your product more locally relevant. Different companies take different approaches; I’d like to share some cases that I really like and that will hopefully give you some new ideas.

Chitu (赤兔): LinkedIn’s local App for Chinese users.

When LinkedIn decided to develop a brand-new localized product for Chinese users, it grasped that the importance of networking, or “Guanxi”, for career development is deeply rooted in Chinese culture. Named after the horse of an ancient Chinese hero, the App brands itself as a social media app targeting young people without a strong background, most of whom work in second-tier cities. Chitu knows what local customers want: they are eager to learn how to advance their careers and to connect with each other.

Besides targeting different users, Chitu offers popular features like Live Mode and knowledge monetization.

It launched three branding campaigns to build trust among users, and it established partnerships with local companies like Alibaba, allowing users to link their Alibaba credit rating with their LinkedIn profile.

Starbucks: Brewing brand trust in Mexico by hiring employees over 60

Starbucks Mexico opened its first shop staffed by people over the age of 60 this year. This not only contributed to the elderly community in Mexico but helped Starbucks gain local trust and good reputation.

Most of the time, the localization I’m exposed to is about language. However, there is a lot more to consider besides language: put the user first, integrate regional and cultural factors, stay relevant, and add value.

(This blogpost is inspired by Talia Baruch’s speech at Middlebury Institute on Thursday, September 13, 2018.)

Wordfast Localization 101: a presentation at 2018 Wordfast Forward User Conference

Software localization is tricky. Since the start of this year, I have been working as a localization project management intern at Wordfast and have managed several localization projects, including the localization of Wordfast Pro 5 and of Wordfast’s product brief, training video, and website. Among these projects, the localization of Wordfast Pro was the most challenging.

Invited by John Di Rico, the Sales and Marketing Manager of Wordfast, I went to Cascais, Portugal, as a speaker at the Wordfast Forward User Conference and gave a presentation on the Wordfast Pro localization project. In this blog post, I’m going to share some of my experience localizing Wordfast Pro 5 and what I learned from the trip to Portugal.

Workflow

The localization project can be divided into four steps: getting the product localization-ready (sometimes called internationalization), which was mostly done by developers; translation and review; localization quality assurance; and product release. Since I was mainly involved in the translation and QA steps, I’m going to explain more about these two steps, which you will probably be doing if you localize Wordfast Pro.

 

Translation & Review

The following picture shows the tools that we used. We used Wordfast Pro 5 as the CAT tool for translation and review, because it lets you get familiar with the software in a short time and refer to the listed functions immediately. Brackets and Notepad++ are text editors for code. They can help you locate a string and figure out its meaning.

The Wordfast developers provide four JSON files with a total word count of 21,275. These files can be imported into Wordfast, and you can start translating. The two of us split the task and each translated two files. Each segment is very short compared to what we normally translate, which is mostly full sentences. Translating short segments may seem easier, or at least faster, but we did encounter many challenges.

Workbench in Wordfast Pro 5

 

Translation: Challenges

Challenges in the Translation Step

  1. Lack of Context

We can only get a list of segments which are really short. I still remember I was really frustrated when I saw an orphan word like an “of”.

Solution: When you’re not sure of a segment’s meaning, leave it blank and add a note, so that later you can export the notes and send them to the developers for feedback.

But what if you don’t want to wait?

Should I translate the “a” or just copy the source into target?

Solution: Use a code editor such as Notepad++ or Brackets to locate the term. Thanks to the developers, the code is easy to read, so I could figure out that this segment is related to case sensitivity and could be left as it is.

Locate the Segment Using Notepad++

  2. Terminology

(1) Inconsistency of terms: Since the translation was done by different people, many terms ended up with different translations.

(2) No corresponding Chinese term: Some words are hard to translate simply because no Chinese term expresses the same meaning in this UI context. I struggled with terms like “penalty”, because the literal Chinese equivalent means punishment.

Solution: We searched existing terminology databases for answers but couldn’t find satisfying results. What I did was open the Chinese version of SDL Trados for reference.
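The inconsistency problem in (1) can also be caught mechanically. Here is a minimal Python sketch, assuming the translated segments have been exported as (source, target) pairs; the sample pairs and their Chinese renderings are illustrative only and are not taken from the actual Wordfast files.

```python
from collections import defaultdict


def inconsistent_translations(segments):
    """Return the source strings that were given more than one translation.

    segments is an iterable of (source, target) pairs. Sources are compared
    case-insensitively, which suits short UI strings such as button labels.
    """
    targets_by_source = defaultdict(set)
    for source, target in segments:
        targets_by_source[source.strip().lower()].add(target.strip())
    return {
        source: sorted(targets)
        for source, targets in targets_by_source.items()
        if len(targets) > 1
    }


if __name__ == "__main__":
    # Hypothetical segments from two translators.
    sample_segments = [
        ("Save", "保存"),
        ("save", "储存"),
        ("Open", "打开"),
        ("Open", "打开"),
    ]
    for source, targets in inconsistent_translations(sample_segments).items():
        print(source, "->", targets)
```

This only works for segments that repeat in full. For terms buried inside longer sentences, a term base and the CAT tool’s own terminology check remain the better option.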

3. Polysemy

There are many terms with more than one connotation. For example, without any context, “mouse” could refer to an animal or a device we use to move the cursor.

“Check”: To verify, make sure something is correct? Tick a box when you’re selecting something?

  4. Tags & Word Order: There are a lot of tags in the segments. The sentence below has a tag at the end, and you don’t know exactly what will appear in this placeholder. Most importantly, you don’t want to move the tag away from its original position. Moving tags may cause a concatenation issue, and I would not want developers to rewrite the code because of my translation.

    Tag in a Segment

    Solution: When the translation seems to require moving tags around, we translators should really use our brains to come up with a better and safer way to translate.

Translation Using a Word Order Friendly to Tags

Normally, if I translated this sentence into Chinese, it would look like the upper-left version. Even if you don’t read Chinese, the arrows show how the Chinese word order differs from the English one, which would force the tag into the middle of the sentence. You don’t want that to happen, so in this case we need to come up with a translation that keeps the same word order. The upper-right version shows how I finally translated this segment: the word order stays almost the same, and no tags need to move.

5. String Length

We need to pay attention to the length of the translation. Chinese is a very compact language. However, if you want to localize Wordfast into a European language such as German, make sure the translation does not get too long, since we don’t want the developers to rewrite the code for the software UI.
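A rough length check can flag risky strings before they reach the UI. This is a minimal sketch: the 1.3 expansion limit is an arbitrary assumption, the sample strings are invented, and character count is only a proxy for the real rendered width.

```python
def length_warnings(pairs, max_ratio=1.3):
    """Flag translations longer than max_ratio times the source length.

    max_ratio is an assumed threshold; pick one that matches the real UI.
    """
    warnings = []
    for source, target in pairs:
        ratio = len(target) / len(source) if source else 0.0
        if ratio > max_ratio:
            warnings.append((source, target, round(ratio, 2)))
    return warnings


# Hypothetical English-to-German UI strings.
pairs = [
    ("OK", "OK"),
    ("Save", "Speichern"),
]

for source, target, ratio in length_warnings(pairs):
    print(f"{source!r} -> {target!r} is {ratio}x the source length")
```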

Localization Quality Assurance (QA):

Localization QA can make the final product really shine. Personally, I think it’s as important as the translation process. In the past, when much of translation was focused on documentation, final reviews were typically completed by translators just before the files were sent to publishing. With the rise of digital content and localization for mobile applications, software and websites, localization QA has become a necessary part of the translation process, one that ensures the quality of the localized product.

Usually, a QA tester will analyze the product in three ways:

Localization QA

Linguistic Testing:

Accuracy of translation within context – Some words/sentences may need to be translated differently depending on their usage.

Consistency of terminology – “Submit” or “Send”? While these two words are quite similar, they can confuse a user if they are used inconsistently. This is especially likely when the translation is done by more than one person; a term base, a glossary, and a TM help prevent it.

Missing content – Engineers who build a localized product most likely won’t speak the target language. If they miss part of the text, they’ll never know it until someone who can understand the language notices. Testers make sure that this “someone” isn’t the end user!
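Some missing content can be found before anyone opens the product. The sketch below is a hedged illustration for an English-to-Chinese project: it flags targets that are empty, identical to the source, or free of Chinese characters. The sample pairs are invented, and strings that legitimately stay in English (product names, “OK”) will show up as false positives that need a human look.

```python
import re

# Basic CJK Unified Ideographs block; enough for a rough "contains Chinese" test.
CJK_PATTERN = re.compile(r"[\u4e00-\u9fff]")


def find_untranslated(pairs):
    """Return source strings whose target is empty, identical to the source,
    or contains no Chinese characters at all."""
    untranslated = []
    for source, target in pairs:
        if not target.strip() or target == source or not CJK_PATTERN.search(target):
            untranslated.append(source)
    return untranslated


# Hypothetical source/target pairs.
pairs = [
    ("Open file", "打开文件"),
    ("Export report", "Export report"),
    ("Close", ""),
]

print(find_untranslated(pairs))
```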

Format and Layout Testing:

Consistency with the source – Is everything laid out properly? Is there any truncated text? Misplaced line breaks?

Images – Are the images localized properly? Culturally appropriate? Remember the famous Microsoft Photoshop slip? Testing is your chance to avoid similar disasters.

Proper character display – It’s not uncommon to see corrupted characters in localized products. Typically, they’re displayed in the shape of empty boxes or question marks, but in some languages, such as Arabic and Vietnamese, it’s almost impossible to detect corrupted letters if you don’t speak the language.
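One kind of corruption can be spotted without speaking the language: when bytes are decoded with the wrong or a damaged encoding, decoders typically insert the Unicode replacement character U+FFFD. The sketch below scans strings for it. The sample strings are invented, and this only catches encoding-level damage; a missing font glyph (the empty box) leaves the string itself intact and still needs a visual check.

```python
REPLACEMENT_CHAR = "\ufffd"  # inserted by decoders for bytes they cannot decode


def find_corrupted(strings):
    """Return (index, string) pairs for strings containing U+FFFD."""
    return [(i, s) for i, s in enumerate(strings) if REPLACEMENT_CHAR in s]


# Simulate corruption: UTF-8 bytes of a Chinese string, cut off mid-character.
damaged = "保存".encode("utf-8")[:-1].decode("utf-8", errors="replace")
ui_strings = ["打开文件", damaged, "Wordfast Pro"]

for index, text in find_corrupted(ui_strings):
    print(index, repr(text))
```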

Functional Testing:

Links – Do the links within the localized content point to correct pages?

Behavior – Is the application behaving as it is supposed to?

Input/output validation – Do the forms allow target language characters to be input? Are the error messages localized properly? What about the postal code?

Usually, the LQA process consists of four parts.

LQA Process

But for localizing Wordfast, based on my personal experience, you will mostly be looking at the following aspects.

  1. Linguistic testing:
  • Accuracy of translation within context
  • Consistency of terminology
  • Untranslated Strings

Untranslated Strings

  2. Format and Layout Testing:
  • Typos, punctuation, format, spaces, line breaks
  • Truncations, text expansion

Punctuation Should Also Be Localized (“:” to “:”; “.” to “。”)

  3. Functional Testing:
  • Link, button, menu functionality: make sure each function works and does not direct you to a page in English.
  • Character display: No characters should be corrupted or shown as blank “tofu” boxes.
  • Error messages: Error messages should all be localized. In this project, we failed to do this because we didn’t receive the file containing the error message strings.

When doing LQA, we found that clicking the TransCheck function automatically generates a report, but the report is all in English.

 

English String in Error Message

Localization Quality Assurance: Challenges

No Existing Test Script

This was the first and biggest difficulty: we were the first to localize Wordfast Pro, so there was no existing test script that we could follow to conduct the LQA.

Solution: The user guide of Wordfast Pro 5 was very helpful because it outlines many necessary steps for using the software. We also simulated a translation project from beginning to end and went over each function tab by tab.

Terminology  Consistency

Like the translation step, LQA also needs to deal with the inconsistency issue. Since only two people were involved in this step, the two of us did the QA together.

Locating Strings in Source Files

It’s not realistic to go back and forth with developers whenever you run into a term and wait for their response, especially when you have a short turnaround time.

Solution: Use a code editor like Brackets or Notepad++ and find the term in the editor. Thanks to the developers, the code looks clean and easy to read, and looking at it will give you a hint.
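The same search can be scripted when there are many strings to look up. The sketch below is an assumption-laden illustration: the file extensions and the search term in the usage comment are guesses, not the actual layout of the Wordfast source files. It walks a folder and reports each file and line that contains the string, so you can read the surrounding code for context.

```python
from pathlib import Path


def find_string(root, needle, suffixes=(".properties", ".json", ".xml")):
    """Yield (path, line_number, line) for each line that contains needle.

    The default suffixes are assumptions; adjust them to the project's files.
    """
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                yield path, number, line.strip()


# Usage (hypothetical folder and term):
# for path, number, line in find_string("source_files", "penalty"):
#     print(f"{path}:{number}: {line}")
```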

Best Practices

Based on my own experience, I’d like to share the following best practices with those who also want to localize software like Wordfast Pro.

  1. Create a glossary and add terms into it from the very beginning.
  2. Add notes while translating or doing QA.
  3. Use Wordfast as the CAT tool.
  4. Update the TM frequently while translating collaboratively.
  5. Document every bug you find with a screenshot.

Demo

All the localized products have been put on our localized website. Please click here to view or download the localized Wordfast Pro 5.

Last but Not Least

Visiting Europe had never crossed my mind until I received the invitation message from John in February. Traveling alone in a non-English-speaking country and presenting in front of an international audience of mostly very experienced translators sounded challenging. But if you also get a chance like this, I would say go for it, because what you get will be more rewarding than you may expect.

Lisboa

Creating a Localized Poster for Fortnite Using Photoshop

Fortnite is a game developed by Epic Games and was officially introduced to mainland China on April 23rd, 2018 by Tencent. The game therefore needs a lot of localization work to make sure it is well adapted to Chinese culture, with well-translated UI strings, understandable names for heroes and items, and locally relevant videos and graphics.

Interested in this game, I decided to create a localized poster for it using Photoshop. The original poster shown below is quite simple: it shows several heroes against a clean background, and the only text that needs to be translated is the name of the game. Most of the time, when we talk about localizing something through desktop publishing, we aim to translate all the text without losing the original style or format. In this case, however, simply translating the text is far from enough to produce what can be called “a localized poster”. So I decided to recreate the poster based on the existing one and make it more locally relevant, friendly and attractive to Chinese users.

(Original Poster)

Before I started, I wanted to be clear about my goals, which included adding Chinese-style characters, a background, and other elements.

Workflow

Step 1: Resource material collection

  1. Separate all the heroes so that I can move them around.

(png Files for Each Hero)

2. Add in a hero with a Chinese style outfit (Wukong outfit). Everyone in China knows Wukong, the Monkey King in the story “Journey to the West”.

(Wukong Outfit)

3. Add in a Chinese dragon, which is actually a glider that players use in the game to land on the battlefield.

(Royale Dragon Glider)

4. Create a background with the Chinese characters “你好”, which literally means “hello” in Chinese. These two characters are made up of the forts in the game.

(Background Picture)

Step 2: Process pictures in Photoshop

  1. Resize each picture and group them together in a V line, with the Wukong figure in the center so that it can be emphasized.

(The New Squad)

  2. Remove the background color of the Chinese dragon glider using the Magic Eraser tool and flip the canvas horizontally.

(Dragon Glider with Transparent Background)

3. Capture a screenshot from a Fortnite advertising video posted on its Weibo account, clean up the screenshot and resize the picture.

(Screenshot from the Video)

4. Translate the text “Fortnite”.

Although only one word needed translation, it took me a long time to figure out how to render the Chinese text in the right font. The problem was that I couldn’t find the font used by the game in Photoshop. So I decided to get a picture with the official Chinese name of the game and use it to replace the original logo.

(Bilingual Logo)

5. Put every layer together onto one canvas and make adjustments.

(Final Product)

Challenges and Solutions

  1. Resource Material Collection

Resource material collection was the main challenge I encountered in the whole process. I needed a PNG file for each character in the poster so that I could reposition them. Although I could find these resources on Google Images by searching “Fortnite characters transparent”, the pictures differed in resolution and size. Once I gathered them onto one canvas, they didn’t look like a coherent whole because of the resolution differences.

Solution: Resize each picture carefully so that they look the same size when put together. Use the Sharpen tool on the blurry layers so that each hero in the squad has the same level of definition.

Another challenge was the lack of ideal resource materials. Most of the time, I needed to rework what I had found before I could use it in my final product.

For example, it took me a long time to find a satisfactory background picture. As I was exploring the posts on Fortnite’s official Weibo page, I was impressed by its advertising video for the Chinese market and decided to capture a frame from it. Taking the screenshot was not hard, but cleaning it up took some work.

To remove the game logo and irrelevant text, I used the Content-Aware Fill tool under the Edit menu of Photoshop.

  2. Font, font, font ……

As I mentioned above, I had trouble finding the font used by Tencent or Epic. My workaround was to take another picture containing the official Chinese translation and turn it into a layer in my final product. But since it is not a real text layer, this would be problematic if there were a large amount of text or if the file needed to be recreated in InDesign.

In real life, however, if you are a member of the Fortnite localization team, you can probably access their resource library and get the font.

Conclusion

Recreating the poster was not a difficult task and didn’t require advanced Photoshop skills. But compared to the original, the final poster looks more relevant to the Chinese market. As a Chinese player, I would appreciate Fortnite’s dedication to adapting its products for a specific market. As a localizer, I wanted to prove through this project that localization is much more than translation: it requires creativity, familiarity with the product, and attention to detail from the localizer.

Customizing your own machine translation engine: an MT training project using Microsoft Translator Hub

Introduction: This blog post is about an SMT training project using Microsoft Translator Hub. In this portfolio, you will read a description of the project, the lessons learned from it, my ideas on how we can improve translation quality by leveraging the power of MT, and my experiments with other techniques to improve the overall localization workflow.

SMT Training Project: Developing an MT engine for translating the Chinese Government Work Report

Project Overview: This pilot project is designed to establish a Statistical Machine Translation (SMT) engine from English to Chinese in order to support the Chinese government initiative to provide public documentation in both Chinese and English. Our data source will be extracted from Chinese government public report websites.

In order to be considered “fully trained”, the post-edited machine translations (PEMT) from this engine must meet the following target criteria for efficiency, cost savings and quality:

  • Efficiency: PEMT 20% faster than human translation
  • Cost: PEMT 25% savings over human translation
  • Quality: PEMT with an acceptable score of less than 30 based on the Multidimensional Quality Metrics (MQM). The acceptable score was increased from 10 points to 30 points due to the length and complexity of the product review compared to the government reports.

A total of 13 rounds of successful training were completed, with an initial data set of approximately 56,916 segments used as training data, approximately 1,270 segments as tuning data, and approximately 1,559 segments as testing data. Microsoft Translator Hub gives a BLEU score after each round of training. The initial round achieved a BLEU score of 12.6; over the next three weeks, 12 more rounds were run, and the best BLEU score achieved was 18.61 (about 48% higher than the first round). Finally, the MT system with the highest BLEU score was deployed, and its output was compared with the work of the human translation team to reach our final conclusion.
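For readers who want to verify the roughly 48% figure, here is the arithmetic as a small sketch. It assumes the percentage is the relative gain of the best BLEU score over the first-round score; the function name is mine.

```python
def relative_improvement(baseline, best):
    """Percentage change from baseline to best."""
    return (best - baseline) / baseline * 100


first_round_bleu = 12.6   # BLEU after the initial round
best_bleu = 18.61         # best BLEU across all 13 rounds

print(f"{relative_improvement(first_round_bleu, best_bleu):.1f}%")  # prints 47.7%
```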

Data: If you feed the machine garbage, it will output garbage. To maximize the quality of the input data, we used the official bilingual files from the Chinese government, and each file was carefully segmented and aligned using CAT tools.

  1. Timeline

*GWR: Government Work Report; WP: White Paper

Date Task Duration
04/02 – 04/03 Project Planning 3 hours
04/04 Data Collection: 3 TMX (GWR 16-18) + 2 TMX (WP) 9 hours
04/05 Data Collection: 3 TMX (GWR 13-15) + 2 TMX (OPUS and WP) 6 hours
04/06 Realign Tuning Data: GWR 13/15/17 3 hours
04/08 Data Collection: 2 TMX (GWR 11/12) + 3 TMX (WP) 5.5 hours
04/10 Data Collection: 3 TMX 0.5 hour
04/11 Data Collection: 4 TMX 1 hour
04/12 Data Collection: 8 DOC (GWR Monolingual) 1 hour
04/13 Refine Training 1 hour
04/15 Deploy System 0.5 hour

 

2. Conclusion

A 500-word excerpt from a Government Work Report was used to test the deployed engine. The MT output was post-edited by editors, and errors in the output were identified. In parallel, the same file was translated by human translators. After comparing time, quality and cost, we reached the following conclusion.

 

Based on 500 words | Rate | Saving | Post-Edit Time | Time Saved
Goal (Price: 20%; Time: 25%)
HTEP | $0.25 | - | 60 min | -
PEMT+EP | $0.20 | 20% | 30 min | 25%
Final Result
HTEP | $0.25 | - | 60 min | -
PEMT+EP | $0.25 | 0% | 60 min+ | 0%

(Comparison of errors found in HT and SMT using MQM as the criteria)

The outcome of our SMT engine did not meet any of our goals for this project. Nevertheless, the team put tremendous effort into the allotted time and completed the entire process, from initial planning to deploying the trained SMT engine.
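The comparison against the goals comes down to two percentages. The sketch below recomputes them from the final result, using the targets from the criteria list (20% faster, 25% cheaper); the summary table lists the two percentages the other way round, but the outcome is the same either way. The variable names are mine, and “60 min+” is counted as 60 minutes, the best case for PEMT.

```python
def percent_saved(baseline, actual):
    """Saving of `actual` relative to `baseline`, as a percentage."""
    return (baseline - actual) / baseline * 100


TIME_TARGET = 20.0   # PEMT should be 20% faster than human translation
COST_TARGET = 25.0   # PEMT should be 25% cheaper than human translation

# Final result for the 500-word test file.
htep_rate, pemt_rate = 0.25, 0.25
htep_minutes, pemt_minutes = 60, 60   # "60 min+" counted as 60, the best case

cost_saved = percent_saved(htep_rate, pemt_rate)
time_saved = percent_saved(htep_minutes, pemt_minutes)
goals_met = cost_saved >= COST_TARGET and time_saved >= TIME_TARGET

print(f"cost saved: {cost_saved:.0f}%, time saved: {time_saved:.0f}%, goals met: {goals_met}")
```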

Lessons Learned

The experience and knowledge gained from this pilot project allowed us to project more accurately what would be required to fully train this SMT engine to support the government work report initiative.

Training Data Requirement:

Based on our pilot project results, we project that at least 1,000,000 segments are required to fully train the SMT engine. The major quality issue in our SMT engine’s output was grammar, more specifically sentence structure. We conclude that increasing the quantity of training and tuning data would significantly improve the grammatical quality of the output.

Additionally, well-aligned source documents are key to the success of SMT training. We accepted only the official government translations and conducted multiple rounds of alignment and review on our data. The result was less quantity but better quality, which was evident in our 9th round of training: our BLEU score increased from 13.4 to 18.6. We believe that adding a dictionary and monolingual data would have significantly improved our BLEU score and the overall quality of our SMT engine.

Time and Cost:

We project 783 hours and about $35,000 are required to fully train the SMT engine. A team of at least 4 full-time members would ensure the SMT can be fully trained to meet the goals. Our team significantly increased the rate of SMT training rounds after the fourth round, which has helped us quickly identify a deficiency and the direction we should focus on.

Tools:

Initially, our team used only TMXMall as our primary tool because of its simplicity and accessibility. But we quickly found that the quality of our aligned documents suffered from our tool of choice (see the first four rounds). By bringing in TRADOS Studio and Okapi Olifant, we were able to align all of our documents at the sentence level, which drastically improved the quality.

In conclusion, we do not recommend using the SMT engine in place of human translation for the government work report initiative. We believe that, with the plan outlined above, a fully trained SMT engine would meet the established goals and could be used for future government work report initiatives.

Meet the future: Customizing a Neural Machine Translation engine?

When identifying error patterns in the MT output, we captured some causes of these errors. Word order is a typical one. The MT system tends to translate word by word, ignoring the fact that following the same word sequence in the target language may not make any sense. This can be partly improved by creating a dictionary or term database that is linked to the system. But an NMT system may be better at delivering an accurate translation, as NMT captures the context of full sentences before translating them.

When we finished this project, Microsoft had not yet opened its NMT system for customized training. But the good news is, THEY JUST DID THIS a week ago. Click here to learn more.

Personally, I’m quite interested in training an NMT engine and can’t wait to see the results. What I do believe is that no matter how unsatisfied we are with the output of existing MT, being able to use various techniques is a basic skill for any ambitious translator.

Technology is not always about unreadable code or algorithms; sometimes using a simple tool like this will make your translation work much easier.

https://youtu.be/hau8DHpKoVY

There’s much more to be explored. Staying curious, eager to try, and open to the latest technology is perhaps the most valuable thing I got from the CAT course at the Middlebury Institute. Thank you, Professor Adam Wooten.

(You can access the files of our SMT project here.)

Experience with TMS: lessons learned and a look into the future

Introduction: This portfolio is a summary of what we’ve learned from the Translation Management System (TMS) course. It includes the description of a group project I’ve participated in, lessons learned through the project, and my personal ideas on how I would like to design a different translation management system.

The course covers general concepts behind TMS software. Using the SDL WorldServer web-based TMS, students were able to explore the functions and features of a translation management system from the point of view of a translator, project manager and administrator.

A pilot translation project for Foreign Affairs Journal

Basic Information

Client: Foreign Affairs Journal

Language Service Provider: Fish&Chips Localization.Inc

Source Files: Abstract of three articles posted on the website of Foreign Affairs Journal

Source Language: English(US)

Target Language: Chinese(ZH); Spanish(ES)

Word Count: 905(en-zh); 289(en-es)

Translation Management System: SDL WorldServer

The Use of WorldServer in This Project

WorldServer has been applied to each phase of the project, from preparation to finalization. For the preparation, WorldServer was used to create locales, workflow, workgroup, cost model, quality model, translation memories (TM), TM group, term database (TD), TD group, and project.

For the production, WorldServer was used to assign translation files to translators, translate files, and do quality assurance. For finalization, WorldServer was used to deliver products, inform the client and update TM and TD.

Obviously, with a TMS, the project workflow can be automated once the project is launched, all the related parties of this project can check in at any time to view or submit tasks, and relevant data can be updated in real time.

While using a TMS like WorldServer can bring the benefits listed above, not every step went smoothly as we tried to incorporate WorldServer into our project.

Challenges and Solutions

  1. Project Creation

Launching a project takes a series of steps and is error-prone, since the localization engineer needs to set up everything from scratch without forgetting any small step or getting the sequence wrong. When our team had set up everything and finally reached the project creation step, we got the following message:

(Error Message from WorldServer)

To fix this, we first thought there might be an error in the workflow settings. Did we forget to add an assignee when we were setting up the workflow?

Then we added a new workflow role (see the screenshot below), but the creation of our project still failed. We guessed that there must be somewhere in the workflow that lets us change the assignee setting.

(Setting New Workflow Role)

It turned out that when setting up the workflow, in the human step, i.e. translate/review, engineers can set the properties including who should be assigned to this step.

(Setting Human Step Properties)

It took our group a long time to figure this out, since we could not find many clues about it in WorldServer itself. For any project, large or small, spending a lot of time on fixing bugs can be frustrating, which raises a question: do we need to use a TMS for a small and urgent project?

  2. Translation and Review

If a translator logs onto WorldServer, he/she can claim the assigned task and start translating in WorldServer’s workbench. After finishing, translators click the “complete” button and the file automatically goes to the reviewers’ task list. Similarly, reviewers run their QA checks in WorldServer’s workbench and deliver the file after completion. In WorldServer, the translation workbench and the review workbench share the same interface, and once translators or reviewers click the “complete” button, they can no longer see the task through their portal.

We believe this may cause trouble when translators or reviewers want to double-check their work or track their task history. The reviewer’s workbench can also be problematic, since a QA check is different from translation. Features like adding notes, tracking changes, giving scores, and identifying error patterns may be more helpful to a reviewer than the current feature, which is editing.

Reflection

What to consider when choosing a TMS?

Questions like “Should we adopt a TMS for our project/company?” and “Which TMS should we choose?” really don’t have exact answers. Based on the readings I’ve done this semester, there are several aspects you may consider when selecting a TMS.

(Things to consider before choosing a TMS)

How would I like to design a TMS?

So far, I have worked with such TMS tools as WorldServer, LingoTek and XTRF. Some CAT tools, for example Trados and Wordfast, also have features related to translation management as they can perform tasks like automated file passing, and TM sharing.

These TMS tools have many great features, such as access to server-based translation memories, terminology repositories and portals, and customization and automation of translation workflows. But I’ve also discovered some defects while using them.

Besides the issues mentioned above, WorldServer requires a dedicated installation environment and is very processor-intensive. It also supports only a limited set of web browsers and Java versions. Additionally, WorldServer stores its data in an SQL database, which requires plenty of memory and data manipulation skills from the people working with it. So, using such tools requires a huge amount of resources, including the resources spent on training the people who work with them.

TMS tools are mostly used by project managers in the translation/localization industry, as they are powerful in streamlining the workflow of a complex translation project. But vendors, clients, and even developers can be involved in these projects. Many tools provide portals for vendors or clients, but few provide portals for developers. As you can tell from our pilot project, a TMS like WorldServer works well for non-technical source files, since such localization does not involve developers.

The fact is, however, that many products to be localized are websites, apps, and software, where internationalization engineers need to get strings ready for translation separately. Excluding developers may leave them unfamiliar with the workflow of a localization project and lead to more tedious and disjointed cross-team cooperation and communication. Therefore, a developer-friendly TMS offering various APIs for developers would greatly benefit the seamless localization of a product.

TMS tools today mostly run on computers, not mobile devices. As tracking project progress and communicating are important to project managers, it would be more convenient to have a simplified TMS app for tracking data, viewing progress, receiving instant notifications, sending messages or even making phone calls. This would be helpful for project managers managing a team across time zones.

The world is moving from mobile first to AI first, and TMS tools need to embrace AI. Although some TMS tools already integrate various machine translation APIs, leveraging the power of AI means more than that. For example, an AI-driven TMS could make more accurate analyses and predictions, giving users a clear view of clients’ preferences and vendors’ strengths and weaknesses, and thus greatly improve the decision-making process.

(You can access our project files from here.)

Localizing for Chinese Market: Without Access to G Suite, What Can We Do?

Recently, I’ve been working with Jeannette and Brenda on building the volunteer framework for Translation Commons (TC). This is a digital platform for translators, where they can get free access to computer-assisted translation (CAT) tools, MT engines, and localization tools. It also provides learning resources and has an online community. I was excited to see such a great hub for people in the translation industry and wanted to bring it to China. In China, students majoring in translation/interpretation learn little about MT, translation management systems (TMS), localization workflows or best practices. For them, learning translation or interpretation means you are good at a foreign language and can envision yourself as a translator or interpreter. Even some top language institutes in China show little interest in teaching students CAT tools, arguing that these tools are too expensive. But we will all have to embrace the tech-shaped future eventually. So I went up to Jeannette at an IMUG event and said: “I’d like to bring TC to more users in China. Is there anything I can do to help?”

Jeannette then became my mentor, and building a volunteer framework became our first task. We ran into a problem when we wanted to promote TC among top Chinese language schools: TC is integrated with Google’s G Suite, but G Suite cannot be accessed in China. Imagine that a Chinese user clicks something on his/her dashboard and gets a 404 page in the browser, or that even before clicking on these little icons, he/she is wondering: “What (the heck) are these?”

(Dashboard UI)

This will definitely result in a bad user experience. Users, thinking “I don’t know what I can do”, will soon lose interest. But I also want to ask: “What can we do?”

I talked to my friend, a principal engineer at Google, asking if it’s possible to access G Suite through a third party or without using a VPN. He said, “It depends, but usually not,” and commented, “Why don’t you fully localize your product? If you want to enter the Chinese market, you need to bring a whole package of accessible tools!” This brings us to our first possible solution.

  1. Integrate accessible work tools. While products like Microsoft Outlook and Calendar can act as counterparts for some Google products, please wait a second before you put them on board. Although Microsoft Outlook is not fully blocked in China, do Chinese people really (like to) use it?
  2. Cooperate with local companies. Some IT companies like Youdao, a subsidiary of NetEase, also enable users to work collaboratively and share data on the cloud, all using one email account from NetEase (in Youdao’s case). The problem is that no company provides a whole set of free tools the way Google does. Paying is painful, especially for NGOs.
  3. Create a new Chinese web page. Adding a Chinese web page to the website may be a choice worth considering, since we can direct specific users in a flexible way. On this page, we may not be able to include G Suite or YouTube videos, but we can always include other accessible resources.

(This blog post will be modified/updated several times as I continue working on this project. For now, I haven’t come up with any effective solution yet. I’m open to your suggestions!)

 

Doing l10n? Let’s talk about something non-technical

Lessons learned from Marriott’s PR crisis in mainland China

Since I started my translation and localization program at MIIS, I found that we put so much emphasis in technical side: we’ve learned how to do localization project management by using tools like XTRF, we’ve learned HTML, CSS, and JavaScript for localizing a website, we’ve also learned how to use various CAT tools. We pay attention to details like concatenation of strings, setting language pickers while abandoning using flags, distributing tasks through automation… All these techniques can help us deliver a localized product in a more efficient way. And we all know the importance of the quality of translation. But what else? Are the accurate translation, perfect product design, and bug-zero UI enough for a safe landing of a product in another culture? Probably not. Culture awareness or even political awareness could play a big role.

Recently, the Marriott International was reprimanded by China for calling its territories independent in a questionnaire for members of its rewards program on its Chinese website. (For details, click here)

Chinese government has requested the deletion of related contents, suspended Marriott International’s Chinese website for a week, and started reviewing all the contents released on Marriott’s APPs and webpages. According to the government, Marriott was involved in the violation against China’s Cyber Security Law and Advertising Law.

Marriott Rewards has published a statement confirming its stand of not supporting separatists, and made an apology on Weibo, one of the most popular SNS in China. However, at the same time, it liked a tweet published by a political group about thanking Marriott for listing Tibet as a country together with Hongkong and Taiwan. This has made its previous public apology a joke and further negatively impacted its corporation image in mainland China.

I don’t want to go further into explaining Chinese territory or discussing cross-strait relations between Beijing and Taipei, but I do need to say that when it comes to localization, ignorance of local culture, law, and history is taboo for companies that are going global. Clearly, the localization of Marriott’s website and apps in China failed. What led to this?

From a localization perspective, maybe the translators or LSPs hired by Marriott were not qualified enough, or maybe they used machine translation. Localization is not about translating word for word; it’s about adapting content to the local culture and market. At the very least, these people paid little attention to local culture and had no idea what territorial integrity means to Chinese people. But it is unfair to criticize only the localization team, since they were not responsible for liking a tweet. The poor localization may simply have been an extension of Marriott’s political views.

How can similar incidents be avoided? I’d like to share some ideas from a localization practitioner’s point of view.

1. When hiring translators to localize content for a specific market, remember:

Native speakers who have lived in that country > native speakers who have not; non-native speakers who have lived in that country > non-native speakers who have not.

2. When using machine translation, always involve a human translator to proofread the output.

3. Some companies outsource their localization work completely, while others have their own localization teams. For companies with in-house teams, hiring project managers with multicultural backgrounds and objective political views is essential. For LSPs, diversifying the backgrounds of your vendors can sometimes save your clients from big trouble.

4. Experienced translators are usually very sensitive to cultural differences and can handle them with flexible translation techniques. For example, when translating the word “country” into Chinese, instead of “国家” we can use “国家或地区”, meaning “country or region”.

Every country has its own culture, laws, and regulations, so it is always safer to double-check whether any of your content conflicts with them before final shipment.

 

 

 


© 2025 Ziqi Zhou
