Testing Automation with Python

It’s no mystery that automation is now essential to a localization workflow that can keep pace with today’s gargantuan influx of source content. And in an era where working from home has become a requirement, automation can be a crucial tool for mitigating the stress of managing a workflow remotely. Yet among the automated solutions in the industry today, there seems to be a gap where Python is concerned. In that spirit, two of my colleagues and I collaborated in our spare time to develop and test a short but effective Python script designed to pull strings straight from HTML files and place them in a document for translation. With a single command from the command line, a translation-ready source document is created alongside a JavaScript file; together, they form a localization solution for small websites. I’ll break down our script piece by piece and showcase the methods we utilized.

Let’s start with the most important piece of the pie: Beautiful Soup. Beautiful Soup is a Python library built specifically to enable and facilitate access to, and modification of, HTML or XML parse trees. It parses a document into a navigable tree that mirrors the Document Object Model (DOM) underlying HTML, which makes it ideal for searching. Our initial research uncovered this gem early on, and we hope to spread its popularity through a little evangelization. It’s not an exaggeration to say that this library is the crux of our script: it provided the essential tools we needed to extract strings of text from HTML documents.

An instantiation of a “BeautifulSoup” object.
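
For readers who can’t see the screenshot, here’s a minimal sketch of what such an instantiation can look like; the file name and the choice of the built-in “html.parser” are illustrative assumptions rather than the exact settings from our script:

    from bs4 import BeautifulSoup

    # Read a local HTML file (placeholder name) and parse it into a navigable tree.
    with open("index.html", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

    print(soup.title)  # the parse tree can now be searched and modified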

In particular, the standard (yet so much more than “standard”) “get_text” function allowed us to retrieve every string and place it in a text file. With a simple command, a translation-ready document is generated. However, this by itself is not the real beauty of the script; it’s the auto-generated JavaScript that truly makes it shine. In fact, this portion of our script really showcases the power of Python and its ability to auto-generate script files. For our pilot example, we generated a file for a “24 Ways”-style solution that serves as a convenient, centralized localization approach for small and/or static websites. The core premise is that a JavaScript file contains all the strings from the original document paired with their translated counterparts (the key, or source string, and the value, or target string), and each language gets its own key/value “Strings.js” file. The pairs are governed by a function that wraps each individual string in its original file, so that when the function is called, the original string is passed to the key/value file and matched with its correct translation. The translation then dynamically replaces the original source string. This matters primarily for strings that appear in JavaScript elements within HTML files (which is particularly prevalent in browser games).

The creation of our “24 Ways” solution.
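
The screenshot above shows our actual implementation; as a simplified sketch of the generation step (function and file names here are illustrative, not the real ones), it can be imagined along these lines:

    from bs4 import BeautifulSoup

    def generate_strings_js(html_path, js_path):
        """Extract visible strings from an HTML file and write a key/value
        Strings.js skeleton whose values are placeholders for translation."""
        with open(html_path, encoding="utf-8") as f:
            soup = BeautifulSoup(f, "html.parser")

        # get_text() returns all of the document's text; keep non-empty lines.
        strings = [line.strip() for line in soup.get_text().splitlines() if line.strip()]

        with open(js_path, "w", encoding="utf-8") as out:
            out.write("var translations = {\n")
            for s in strings:
                escaped = s.replace('"', '\\"')
                # Key and value start out identical; the value is what gets translated.
                out.write(f'    "{escaped}": "{escaped}",\n')
            out.write("};\n")

    generate_strings_js("index.html", "Strings.js")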

In essence, then, our script not only provides a document containing the text from the HTML document to be translated, but also generates a key/value “Strings.js” file to be translated (NOTE: to prepare a “Strings.js” file for translation, it should ideally be converted to a Word document with all text hidden except for the value strings). Thus, this solution includes a separate JavaScript and HTML file for each language. Modifying or updating strings becomes much less of a chore when all of the strings are stored discretely in these files. Adding new strings, however, will require additional functionality and further tweaking of our script.

The point of this script is twofold: to showcase the power and efficiency of using Python in a localization workflow, and to form the root of a whole range of Python scripts for localization that fill important gaps in the industry. The next projects up for consideration include a script for automatically maintaining an existing localization solution and a script focusing on internationalization issues in JavaScript or TypeScript (essentially a typed superset of JavaScript).

Below is a link to view and/or download the full script:

https://drive.google.com/file/d/13iZKh43jElcHOVnCNoRO0PIvbFdQbwEM/view?usp=sharing

Terminology Management: My Experiences

May 16, 2019

Before I began the Spring semester of 2019 at the Middlebury Institute of International Studies (MIIS), I had little to no experience either understanding or managing terminology. In that regard, I’d like to present my major takeaways from the Terminology Management course to demonstrate my new understanding of the subject field. The introductory ideas that we covered were initially alien to me, because concepts and designations weren’t the kinds of terms I’d devoted any real thought to prior to taking this course. And for good reason: they’re abstract terms, and regarding them with scrutiny seemed almost absurd. And yet, it can’t be overstated how crucial they are to terminology management, since they form the basis of knowledge for any term. Specifically, I’m referring to the so-called “Triangle of Reference”.

Triangle of Reference

To explain, a concept is an abstract idea that is linked to a physical object and labelled with a designation. For example, a physical chair you might be sitting on while reading this would be an object, of which you have an idea in your head. How do you know it’s a chair? Because it matches a list of characteristics that are encapsulated in the concept of a chair as you perceive it. And, of course, the word “chair” itself is the label you apply to it, or the designation.

If any of that explanation sounded confusing or required more than one read-through, you are not alone. One of my discoveries in this course was that the field of terminology is, by its nature, self-referential. In a twisted sense, that means when you think about terminology, you’re thinking about thinking. In other words, from the perspective of a terminology manager, terminology deals a lot with metadata (data about data). Because of this inherent self-reference, trying to classify or organize terms can seem migraine-inducing.

Mastering the “Triangle of Reference” is the basis of terminology and is, therefore, the figurative gate to heaven. It also serves as a good example of visual organization, a concept that modern terminology management relies upon heavily. In fact, if you don’t visually organize data in this field, it can seem almost impossible to keep track of every aspect of a collection of terms, or term base. I have come to appreciate the value of a “concept map” because of its visual nature, and my experiences in this course have shown me that learning how to organize information visually is an indispensable skill. Here is an example:

Credit: World Intellectual Property Organization (WIPO)

Above is a cut-away from the WIPO Pearl concept map in their photography subdomain. This visualization is notably easier on the eyes than a traditional term base, and its interactivity is also a bonus, enhancing its usefulness as a supplement to a classic term base. Imagine the process involved in navigating this information in a term management tool: lots of clicking and navigating through dialog boxes and menus. It would waste time and mental effort. With the visualization of this concept map, you can immediately determine relationships between terms and make your own judgments as to how a term fits in a subject field.

And this brings me to my final takeaway from this course: the link between perception and definition. Earlier, in the chair explanation, I mentioned the concept of a chair “as you perceive it”. Perception is the driving force in what concepts mean to us as human beings. And, due to our nature, everyone perceives a concept in their own way, because we all have our own lists of characteristics that we assign to concepts. Terminology, however, seems to contradict this idea, because terminology work is the effort to classify and organize terms so as to give them precise, unequivocal meanings. This is where the importance of context must be emphasized. When defining terms, context is king; it is the ultimate determiner of a complete concept. Without it, anyone encountering a term would have to work out its meaning on their own. Context allows terms to be defined precisely, while allowing human perceptions to retain their unique nature.

In conclusion, terminology work is a great balancing act between the three points of the Triangle of Reference, between visual and textual organization, and between human perception and rigid definitions. If you’d like to discuss or share your own thoughts or experiences on terminology, please feel free to get in touch with me via my Contact page.

WIPO. WIPO Pearl Concept Map Search, https://www.wipo.int/wipopearl/search/conceptMapSearch.html. Accessed May 16, 2019

A Little Experiment in Gaming L10n with Unity

FPS Microgame

During my time discovering the finer nuances of software and games localization at MIIS, certain aspects of the process have opened my eyes to practical strategies and techniques that can be leveraged for efficiency and process optimization: namely, baked-in localization support and internationalization practices. Implementing or taking advantage of them requires a technical skillset that is becoming increasingly necessary in a PM’s repertoire and is no longer reserved for the engineering team. I have often heard that localization project managers must wear many hats, because the localization process encompasses so many different types of tasks, but when software and video games are involved, computer scientist seems to be an unavoidable one.

Two of my colleagues, Nathaniel Bybee and Rebecca Guttentag, and I undertook the task of localizing a brief demo of “FPS Microgame”, a game developed using Unity. And, if it wasn’t already obvious, the nature of the task necessitated that we wear our own computer science hats. But along with our new roles came unique challenges requiring solutions that certainly took us out of our comfort zones. What follows are some of the steps we took to overcome those challenges in our attempts to track down and retrieve strings for translation.

From the beginning, we were able to utilize a pre-existing, baked-in localization method for Unity known as “I2” [read: “eye” two], which significantly aided our efforts to track down strings. In fact, the very first strings, the title screen text and opening menu buttons for example, were easily retrieved for localization using this method. I2 works by allowing developers to attach a localization “component” to an object in Unity. This component contains all of the options and attributes necessary for smooth localization: it targets the object’s strings and stores their localized versions. It even allows developers to localize images (with DTP, in my case) and associate the localized versions with their respective languages, so that when, say, a language selector is used, the correct image is loaded when the language switches. With source files available for easy access to strings, this entire process could easily be streamlined and become highly efficient. Since I didn’t have access to the source files, however, I was forced to do the usual work of recreating the text over a mask so that it would be editable and, therefore, detectable by a TMS. You can see my work below.

Original Image
Character styles were rampant in this image. I created upwards of twenty.
The Japanese and German localized versions overlaid. Layout was adjusted for text expansion/contraction.

The major challenge we encountered after the initial strings and images, unexpectedly, was string concatenation. Anyone who’s done software or game localization might cringe at that term, and for good reason: it’s essentially a nightmare for localization and represents poor internationalization practice. If you look at the image below, you should see two strings with a conditional statement between them. Because of how this particular block of code is written, the complete string only comes into existence when the code runs, and it’s impossible to translate accurately as is.

Fortunately, with a little bit of recoding, I was able to restructure things so that the two potential strings this statement could produce were separated into whole strings that could be wrapped in our I2 localization method. This allowed them to be stored in our list of strings and sent to translation with everything else.
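
Since the actual Unity code isn’t reproduced here, below is a language-agnostic sketch of the before-and-after pattern, written in Python rather than the game’s own code, with invented strings and a stand-in for the I2 lookup:

    STRINGS = {}  # stand-in for the I2 key/value store

    def _(key):
        # Look up a translation; fall back to the source string if none exists.
        return STRINGS.get(key, key)

    # Problematic: the full sentence only exists at runtime, so it can never be
    # extracted or translated as a whole string.
    def objective_bad(count):
        return "Destroy " + ("the last enemy" if count == 1 else str(count) + " enemies") + " to win"

    # Better: each complete sentence is its own whole string, wrapped in the
    # localization lookup, with a placeholder for the number.
    def objective_good(count):
        if count == 1:
            return _("Destroy the last enemy to win")
        return _("Destroy {n} enemies to win").format(n=count)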

The significance of this issue lies in internationalization best practices. Imagine a situation where the PM doesn’t know how to code and has to ask an engineer to step in and help; that’s wasted time no one needs. Thus, internationalization became a critical issue in this case (and there were certainly far more issues that we didn’t get the opportunity to examine). It’s my hope that awareness of issues like these can be evangelized to developers so that localization processes might be optimized for software and game localization in the future.

Game Global Summit 2019

Prior to the Game Global Summit this year, I was ambivalent about specializing in any particular niche of localization. Although the industry itself is defined by the myriad niches carved into it, I had yet to discover my own. My intent was simply to go where the industry took me. But video games have always held a special place in my heart, and for once, my passions in life intersected in a way that felt inevitable. The video game industry has had a, shall we say, shady history with localization, but the twenty-first century has seen some incredible improvements to l10n workflows for game developers, and, if Game Global is any indication, the future looks bright for both industries.

So here is a little summary of the Game Global Summit to give you a taste of what I experienced and perhaps encourage more professionals to give it the attention it deserves. Although it was dwarfed by the monolithic LocWorld41 conference in the days that followed, it was an undeniable success, even in the wake of some last-minute schedule adjustments. The discussions were insightful, productive, and fascinating, and the professionals who presented shared insights that all deserve mention here.

The first day featured Glory Chan-Yang Choe’s presentation on voice-over (VO) localization. Hearing her recount her tale of virtually single-handedly running a successful VO localization program from start to finish while ensuring top-notch quality was awe-inspiring, to say the least. Then there was George Tong’s insightful presentation on testing for compliance with China’s regulations when localizing games: a helpful reminder of what it means to go global and of the restrictions and laws companies should be mindful of.

The second day witnessed Virginia Boyero and Patrick Görtjes’ stellar presentation on the l10n workflow used for Massive Entertainment’s “The Division 2”. The customizations they presented, alongside the proprietary software that was used, made for a truly impressive presentation overall (not to mention the in-game animations and videos!). Lastly came Miguel Bernal-Merino and Teddy Bengtsson’s inclusive discussion about language variance, how it affects l10n projects, and what it means for the industry.

I’d also like to give special mention to the panel speakers during the discussion about culture in the video game and l10n industries. Michaela Bartelt, Kate Edwards, and Miguel Bernal-Merino led an incredibly insightful and open discussion about how a multicultural world has impacted the industries and what we might look forward to in the future. It was, in my opinion, the highlight of the conference. I’d also like to give a shoutout to the folks from Keywords Studios for sponsoring the event, as well as to María Ramos Merino for running it. Thank you very much for your efforts in making this event a reality. I’m greatly looking forward to returning next year!

My CAT Tool Experiences: Spring 2019

With the rapidly growing diversity of CAT tools in this era of Machine Translation (MT) and AI development, keeping up with the latest trends can be a daunting task. During my time at the Middlebury Institute of International Studies (MIIS), however, I gained exposure to translation engine training software, various CAT tools, and the use of Regular Expressions to streamline the localization process. In this post, I’d like to walk you through a few of the projects I completed during the Spring semester of 2019.

Let’s first focus on translation engine training for a moment. Statistical Machine Translation (SMT) and, even more so, Neural Machine Translation (NMT) seem to be hot topics in the language industries. Although their prevalence is somewhat controversial, they are nevertheless an important consideration for any stakeholders. And so, as part of our up-to-date curriculum at MIIS, we engaged in a small project whereby we trained an SMT engine using Microsoft Custom Translator, ran an evaluation on our results, and drafted a proposal to a “client” with our recommendations for potential further training.

Our goal was to train an engine to effectively machine translate TED Talks covering marine biology and oceanography. To accomplish this, we compiled a series of “training”, “tuning”, and “testing” bitext files in an attempt to improve the engine’s BLEU score. This score is currently the industry standard, though it has several disadvantages that I won’t get into here for the sake of brevity. Suffice it to say that, for the purposes of our project, increases in the BLEU score meant positive results for our engine.
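
For the curious, here’s a toy illustration of what BLEU actually measures. This wasn’t part of our workflow (Microsoft Custom Translator reports the score itself); it assumes the NLTK library, and the sentences are invented:

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    # BLEU rewards n-gram overlap between the MT output and a human reference.
    reference = "the ocean absorbs most of the excess heat in the atmosphere".split()
    mt_output = "the ocean absorbs most excess heat in the atmosphere".split()

    score = sentence_bleu([reference], mt_output,
                          smoothing_function=SmoothingFunction().method1)
    print(f"BLEU: {score:.2f}")  # closer to 1.0 means closer to the reference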

The “training” files were general-subject speeches pulled from a corpus of TED Talks from 2013. These gave our engine a basis from which to build its translation rules. The “tuning” files, then, needed to be closer in domain to our desired results; for those, we used ocean-science-centric speeches. The “testing” files were used to sample the quality of the engine’s output after the training had concluded. To prepare the files for our engine, we aligned them using the online CAT tool Memsource.

After training the engine, we performed a final evaluation of the MT output quality. Compared to our initial model’s output, there wasn’t a significant change. That said, we calculated noticeable time and cost savings over human translation, but the MT quality was far too underwhelming to justify further training, especially considering the history of our BLEU score (which peaked with the second of ten models). You can find our presentation and proposal at the link below.

https://drive.google.com/open?id=1CbNSlgMeW-6OlPNOKdQ3cV9j3T4Ieiou

In addition to learning to train an SMT engine, we were also introduced to an arguably indispensable tool for localization: Regular Expressions (Regex). After coming up with some custom rules for my language of study (German), I got a taste of the potential of these powerful expressions. Here’s an example:

Considering the potential complexity of Regex, this is an incredibly simple, yet effective, rule. In short, “die” is an article in German that cannot appear in the dative case (feel free to look up German cases on Wikipedia, if you dare), so this rule looks for any instance of “die” following the preposition “mit”, which always precedes words in the dative. This is a fast and effective way to find and correct a potential error. Another small example might be to find dates in a particular format using Regex and replace them with the correct format. For localization, Regex can be a powerful tool for detecting and correcting errors, thereby speeding up editing and proofreading. Process optimization is always a major concern for project managers, and Regex seems perfectly tailored to aid in that. I have a few screenshots of some other rules I wrote linked below, for your curiosity.
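
The original rule was written for a CAT tool’s QA checks rather than as code, but a minimal reconstruction of the same idea in Python (with an invented example sentence) might look like this:

    import re

    # "mit" governs the dative, where the article "die" cannot appear,
    # so any "mit die" sequence is a likely case error.
    pattern = re.compile(r"\bmit\s+die\b", re.IGNORECASE)

    text = "Wir fahren mit die Bahn in die Stadt."  # should be "mit der Bahn"
    for match in pattern.finditer(text):
        print(f"Possible case error at position {match.start()}: '{match.group()}'")

Run against a target file, each match points a reviewer straight to a likely agreement error, while correct uses of “die” (such as the accusative “in die Stadt” above) are left alone.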

https://drive.google.com/open?id=1dchG2byzepoBfDdF38v4TuMa1pGGmkfb

And, finally, I also demoed an online CAT tool called Matecat and created a video walking through it. Testing out new CAT tools is a fantastic way to broaden industry knowledge, so I am more than happy to share at least one small corner of that industry here. The video is linked below on Google Drive.

My Matecat demo

If you have any questions or suggestions, please feel free to send me a message via my Contact page.

Localizing Pokémon™ Promotional Graphics in Photoshop

My experience with desktop publishing (DTP) so far has been quite enlightening, and in the wake of a rapidly expanding demand for media localization, I recognize the increasing need for DTP skillsets in the industry. To that end, I’d like to share a little project I completed while attending MIIS.

Image Localization

The premise was simple: localize a couple of promotional images from English into German and Korean using Adobe Photoshop. So, using the skills and tools learned in class, I obtained two images from a Nintendo promotional email for Pokémon™: Let’s Go, Pikachu! and Pokémon™: Let’s Go, Eevee! and proceeded through a few simple steps.

Image localization is straightforward conceptually, but tedious and often complex in practice. The problem is that, since you are dealing with images, you can’t edit the text, and yet the text is exactly what needs to be translated. You might resort to OCR (Optical Character Recognition), but that wouldn’t get rid of the old text or deal with the background. Thus, the only practical approach to this conundrum is to create a mask layer to hide the old text, then create new text on top of the mask that closely resembles or exactly matches the original. Voilà! You have editable (and, more importantly, translatable) text!

Credit: Nintendo
Credit: Nintendo

The Mask

For the first image, most of the challenge lay in the creation of the so-called “mask” layer. Photoshop’s “Content-Aware Fill” function offers an easy way to remove the text, but, because of the unique patterned background, distortion was inevitable. Thus, I was forced to recreate parts of the pattern to clean it up, as seen below.

The remainder of the work involved cleaning up fragments of text that the Fill function missed. This demonstrates the unfortunate limitations of the function, but it should be noted that the vast majority of the text was filled in, requiring minimal post-work.

Remaining Text Fragments

This was a simple matter of using the Brush tool. It could be argued that this could have been avoided with careful use of the Magic Wand tool, but after multiple adjustments and approaches, I ended up cleaning up the text manually. For the sake of process improvement, in the future I would probably resort to manual cleanup immediately after applying the Magic Wand, rather than tediously trying to find a way to get it all in one go.

The Font

Once I had cleaned everything up, I blended all of the adjustment layers into a single mask layer, effectively removing the text from the image. Then, of course, came the replacement text. To be translated, the text needed to be editable, so I created a few text boxes and did a deep dive into the internet to find the font used in the original image. Unfortunately, I came up empty-handed, and I surmised that the font was a proprietary one owned and used by Nintendo. So, instead, I found a font closely reminiscent of the original in Photoshop’s font library and used that in its place.

Once you have found an appropriate font, matching the text involves a series of adjustments to some important settings, including leading, kerning, vertical and horizontal stretch of the characters, and, of course, font size. You’ll also need to tinker with paragraph settings, such as alignment, indentation, and so on. This was probably the most tedious part of the project: I had to align the text just right by tinkering with the settings mentioned above.

The second image, with the green gradient in the background, was more straightforward and demonstrated more clearly the usefulness of Photoshop’s “Content-Aware Fill” function.

If you compare this image to the original above, you’ll notice subtle changes in the font, as was the case with the first image. At this point both images had a mask to hide the original text, and text boxes with properly adjusted character and paragraph settings, which led to the final stage.

Translation

Believe it or not, this was the easy part. I utilized the CAT (Computer-Aided Translation) tool Memsource to localize both files. I selected German and Korean as target languages to test this process, and both came with their own unique challenges.

German

German text typically expands by around 30% compared to English; that is, it takes roughly 30% more space to say the same thing in German as it does in English. Unfortunately, this meant that text in the images would extend past the ends of the text boxes I created or become misaligned once translated. To combat this issue, I either resized some of the text boxes or opted for more concise translations to cut down on the length. You can see the results below:

Korean

The Korean versions of the images were a bit simpler to deal with. Korean text generally shrinks compared to English, as opposed to expanding like German. Nonetheless, I ended up resizing some text boxes to make the text fit better. The real issue was finding a Korean-friendly font whose style at least vaguely matched the English source text. You can see my choice below. But bear in mind that I used unedited machine translation for this text, since I didn’t have a Korean translator available; if you can read Korean, please pardon any nonsense you might find.

Conclusion

Once you’ve gotten the hang of creating masks and recreating text, you have mastered the essentials of image localization in Photoshop. With the modern additions to Photoshop’s array of tools, this process has become easier than ever. The Magic Wand and the “Content-Aware Fill” function are both very powerful tools when used correctly. That, combined with many CAT tools’ integration with file types such as PSD, makes for a smoother image localization experience. However, be prepared for challenges such as patterned backgrounds and unique fonts. They may require a slightly creative approach and a little extra effort, but the payoff will be worth it.

If you have any questions, or would simply like to discuss localization or DTP, please feel free to reach out to me on my Contact page.

You’ll also find the files from this project in the Google Drive link below:

https://drive.google.com/open?id=1Feas1pd22FXP32sfAHbblsDLM7KaFokv

Experience with TMS

Translation Management Systems (TMSs) can be found in every corner of the translation and localization industry. And they are just as varied as they are ubiquitous, since each language services company seems to have its own proprietary solution. A select few, however, are still far more widespread than the rest. Granted, these TMSs don’t (and really can’t) possess every feature one can find across the industry, but they do offer the most robust collections of tools and functions out there. Because of this, however, they are anything but simple. Nevertheless, I have taken a dive into their depths and returned alive. So let me show you my experiences with two of the largest and most complex TMSs in the industry: WorldServer and GlobalLink.

The Comparison: WorldServer vs. GlobalLink

Let me start by admitting just how different reality was from my expectations for these two systems. After an eight-hour session with each system, I developed certain biases toward one over the other. This was largely due to aesthetic differences, but also to the logical structure and naming conventions used: what was a “project” in one system was something entirely different in the other. These fundamental differences in the logic of each system’s infrastructure may have been off-putting, but that shouldn’t cast any doubt on the robustness or utility of either system.

Our evaluations put those aspects to the test to see how they measured up. It turns out that both systems are quite good at accomplishing what they were intended to do. My team and I came up with four categories of requirements, as depicted in the image above, that we believed our client (the LSP Babble-On in our simulation) would need when adopting a new TMS. In each category, we assigned each requirement a level of importance to the TMS: useful, important, or critical. The requirements were then given weights on a scale of 1 to 5 based on our evaluation through a small pilot project.

The pilot project was simply a matter of running a short text file through the localization process in both systems. The goal of this project was to consult for our “client”, Babble-On, in choosing a new TMS to transition to for their business needs and requirements. We did this by demonstrating the functionality of each TMS and showcasing our results in our evaluation tables and a presentation.

The starkest differences we found were in the project preparation phase. Where GlobalLink suffered from a lack of intuitive design, WorldServer didn’t seem as friendly in its workbench design for translators. With proper training, however, we found that both systems served their functions well and reliably.

That being said, WorldServer had a more intuitively laid out system for project preparation. You simply need a set of prerequisite items in place, before you can create a project. Once a project is created, all of the necessary pieces are already in place thanks to the design, which is a reflection of good localization project practice: high front-end investment and preparation for a smooth back-end and delivery.

GlobalLink, on the other hand, had a slightly more complex project setup phase. You could create a project without all of the prerequisite items required in WorldServer, but you would then have to edit them in manually later. The key with GlobalLink is training materials: if you have access to tutorial videos or guides, GlobalLink shines. It has a wealth of tools that allow you to run a project with efficiency and security. It even has a nifty “impersonation” feature that lets you, as an admin, log in as a different kind of user to see their view and check whether they have access to a project that’s been assigned to them.

In the end, both GlobalLink and WorldServer are solid choices for a TMS. WorldServer can be slow and clunky at times, but it has a very intuitive design for its purpose. GlobalLink suffers from a lack of intuitive design in many places, but with proper training it excels at handling the localization process. Both are top considerations if you lack the option to build your own proprietary TMS; just be prepared for some extensive training time. Our recommendation for our “client” stemmed from this assessment: with access to proper training materials, GlobalLink was our choice, though only by a narrow margin. Should that fail for whatever reason, WorldServer would make a very reliable second choice.

Below you’ll find a link to our presentation and evaluation files. If you have any questions about this project or simply want to share your experiences with WorldServer, GlobalLink, or any other TMS, please feel free to send me a message on my Contact page.

https://drive.google.com/open?id=1K6ZBa8lLEdQy4KP9eZ-zOpNJ19XPXQQ7

As a postscript, I’d like to mention one other TMS project that I engaged in during my time at MIIS. This was a consulting project for a client that our professor introduced to us. After being guided through a demo of his proprietary software, my teammates and I assessed some features pertaining to QA and compared them to WorldServer, GlobalLink, and Lingotek.

We utilized a simple scoring system: each feature received a score from 1 to 5, and the scores were then summed into a total. And while one might question the use of such a simplistic approach, the data still yielded some interesting and significant results. The TMS whose features we evaluated scored noticeably lower than its competitors overall, so my team and I issued some important recommendations to the client to help him potentially improve the system. You can view those recommendations in the presentation file linked below. As always, if you have any questions, please use the Contact page to get in touch.

https://drive.google.com/open?id=1xDbQFfPxgnjOuhF_wv9JuSgen17KeVce

Localization Project Management

During the Fall 2018 semester at the Middlebury Institute of International Studies, a few of my colleagues and I were responsible for developing a Project Management Office to simulate a localization project for a client. The goal was to experience a real-life localization project from beginning to end and to take on the responsibilities and tasks of a project manager throughout its course.

As part of these tasks, we created a virtual office for the project including a workflow created with Trello (seen below) as well as an online repository for all pertinent company, client, and project information through DokuWiki.

Trello Workflow:

DokuWiki:

Our company, “Sea L10n Enterprises”, chose the Monterey Bay Aquarium as the client for the project. The client’s home page had already been localized into Spanish, so we made it our mission to localize the Spanish homepage into six additional languages: Japanese, Korean, Mandarin Chinese, Portuguese, German, and French.

In order to accomplish this, we first had to organize the structure of our virtual office, the end result of which can be seen in the DokuWiki sitemap pictured above, so that we could ensure a smooth project.

As project managers, we learned a variety of invaluable lessons along the way to make the project proceed without major setbacks. The first of these lessons was the importance of standards associated with the processes of localization and project management. For example, ISO standards 21500, 10006, 9001, among others, are all crucial to how companies proceed through localization projects while maintaining strict ethical practices.

The next major lesson learned was the need for heavy front-end investment in time and cost to a project. Devoting a significant number of hours to client communication, project preparation, talent recruitment, and project specifications will ideally allow the back-end of the project to be quick, straightforward, and smooth. As you might have noticed from our Trello workflow, the project was divided into three major segments: Pre-Production, Production, and Post-Production. The Pre-Production card contains up to two or three times as much content and checklists as the other two segments!

Of course, it is also necessary to mention how we tracked our time throughout the project. We used an online time tracker called “Toptracker”. The importance of time tracking and staying within budget simply cannot be overstated.

We learned early on in the project that we would need to be exhaustively specific when tracking time on our individual tasks lest we risk going over budget. And, for that matter, we also learned that we needed a good understanding of the entire flow of the project from start to finish in order to accurately estimate a budget and provide a quote to the client. Time, in the context of a localization project, then, is perhaps one of the most precious resources that a project manager must work with. And after working through our semester-long localization project, we got a good sense of how our time was spent.

It was through our analysis of the Toptracker time report that we could determine how to better spend our time managing a project, which leads to the last major lesson: QA and the Post-Mortem evaluation. Both are crucial to understanding what, if anything, went wrong, how to prevent it from happening again, and how to improve processes and practices. QA is an ongoing process and is just as important as project preparation in ensuring a project is completed within its designated timeframe and budget. The Post-Mortem, on the other hand, is a comprehensive look at everything that occurred within the project and an analysis of how the entire project flow could be improved.

Thus, our Post-Mortem analysis revealed a multitude of ways in which we could improve while also showing the extent of our work throughout the semester.

As a final note, here are the final deliverables we produced (via Google Drive: Copy the link and paste into your browser to download):

https://drive.google.com/open?id=1BSd4Th65hhBLi9u-0cvlHDkyhNN0HDct

 

Introduction to Computer Aided Translation (CAT)

As an aspiring professional, I consider real-life experience an invaluable and necessary part of my personal growth. I would like to present the work that my colleagues and I completed at the end of our Fall 2018 semester that reflects that growth. This project essentially consolidated everything we learned in our CAT class and put it into practice, forcing us to make decisions about how to approach and conduct the localization process from beginning to end.

First things first, our client: BlaBlaCar, a ride-sharing company established in Europe, which my group selected. The ultimate goal was to localize the German “history” and “about us” pages into US English so that BlaBlaCar could theoretically begin expanding into the US market.

For the first part of our project, we were responsible for coming up with a statement of work as part of an official project proposal. To that end, we drafted up the following document:

In order to make the proposal readable for the client, we created a simple timeline/flowchart to illustrate how the processes would proceed without causing confusion. We also learned a few lessons from the kickoff meeting with the client; the primary one was to discuss client preferences during the kickoff meeting and to plan to sort through them at the beginning of the project, if approved.

We also developed a list of potentially problematic words for the translation process, which we brought up to make the client aware of any pitfalls that might result (e.g. Nutzerführung, Mitfahrzentrale, Gesprächsfreudigkeit).

One other important takeaway was the need for brevity in our documents. Including as much information in as little space as possible while maintaining clarity was a significant factor in our meeting with the client.

After client approval, then, we could proceed with the actual localization. As part of our strategy to complete the project, we divided the approximately 600 words evenly between us and translated the documents using the translation management tool Memsource. This process naturally involved the creation of a termbase and translation memory.

Also included was a pseudo-translation which, as we have learned, is a crucial step for identifying problems before translation begins.

But there were some unique challenges to this setup. We had to translate each of our sections separately, and, as a result, we had to collaborate afterward to ensure that our final translation was consistent in style and tone. For the small scope of our project this approach was acceptable, and even beneficial, since it gave us a glimpse into our teammates’ translation habits and preferences.

Some of our work:

But in the context of a larger project, this approach would be ill-advised if not downright messy when producing the deliverables for the client. However, at the end of the project, we were able to consolidate our translations effectively and produce a quality translation suitable for the debut of a European website to an American market.

For our deliverables, we included the translation memory, the termbase file, the source text as a Word document, the target text in the same format, and our pseudo-translated file for future reference. Ultimately, this project demonstrated our ability to communicate with clients throughout the proposal process as well as our ability to negotiate with the client to determine preferences and key details, allowing the project to proceed smoothly and without major setbacks.

It also demonstrated our ability to coordinate our efforts and function as a localization team.

Here is a link on Google Drive to our final deliverables:

https://drive.google.com/open?id=1I6VVUoVjBUEvB96IvEwEQqzmvyh-LFvn

 

And, of course, here is the video presentation of our lessons learned:

https://drive.google.com/open?id=11u5X0toYJ4v-8HgejGjPQ7Avuqz5h7gf

Website Localization Project

As a final project for the Fall 2018 semester at the Middlebury Institute of International Studies, two of my colleagues and I localized a JavaScript version of Pacman from English into three different languages (Arabic, French, and German). We leveraged the skills and methods taught to us throughout the semester to complete this project in a timely fashion. Here are the results:

Localizing a game or application that runs on JavaScript is fairly straightforward, if somewhat tedious, but please allow me to walk through it step by step.

My colleagues and I utilized the “24 Ways” method, which involves creating three distinct JavaScript files. The first file is called “24Ways.js”:

Very simple and concise. It essentially checks whether a translated version of a string is available and, if so, replaces the source text (ST) string with the target text (TT) string at runtime. It is the crux of this method but, ironically, requires the least maintenance of the three files.
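
Our actual file is shown in the screenshot above; as a rough paraphrase of its logic (in Python rather than JavaScript, with an invented dictionary), the lookup boils down to this:

    TRANSLATIONS = {"New Game": "Neues Spiel"}  # loaded from the active Strings file

    def _(source_string):
        # Return the translated string if one exists; otherwise fall back
        # to the original source string.
        return TRANSLATIONS.get(source_string, source_string)

    print(_("New Game"))    # -> "Neues Spiel"
    print(_("Game Over"))   # -> "Game Over" (no translation available)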

The next part required the majority of the work needed to bring the project to completion. In this step, we gathered all of the strings in the JavaScript files that players of the game will encounter during normal gameplay. We copied the strings into the simply named “Strings.js”.

This array of strings is passed to the “_” function in the “24Ways.js” file so they can be translated. They are divided into two sets (purple and orange), and the rightmost set (orange) is the one that needed to be translated. Normally, the text in this file would be copied into a Word document or .txt file, but since the scope of the project was small, we manually translated the orange strings into our three languages. To translate the strings using the normal method, all characters other than the strings to be translated would be toggled to “hidden” after copying them into the document. Then, when the document is imported into a translation management tool (e.g. Trados Studio, Memsource, etc.), the text is translated directly, and everything (including the hidden characters) can be copied and pasted back into a JavaScript file.

With the “Strings.js” file in English, we added the target language copies to the game, “Strings_DE.js”, “Strings_FR.js”, and “Strings_AR.js”. (The source and target text versions of these files were the second and third of the three distinct JavaScript files mentioned earlier.)

With our 24Ways and Strings files in place, it was simply a matter of calling the “_” function for every string in the JavaScript files that players of the game would encounter. (e.g. _(“example text”) )

In many cases, this process required splitting up strings that were embedded in HTML code generated at runtime, as was the case in the screenshot above.

The only remaining text requiring translation, then, was anything in the main HTML file, “index.html”. Thus we translated the file and made target-language copies (“index_DE.html”, “index_FR.html”, “index_AR.html”). When a player launches any of the translated HTML files, the game is entirely localized into the language of the “index” HTML file they chose.

At the end of this process, for the sake of usability and efficiency, we added a language selector to the game so that players can freely switch between languages at runtime. It should be noted that we also made some minor changes to the primary CSS file to accommodate some issues with the target languages. This is a common issue when localizing into other languages, but a simple one to fix.

Ultimately, this entire process was simple and straightforward, but it consolidated the skills we learned in our localization courses and put them to the test. Here is the link to the original game (this game does not belong to me):

http://pacman.platzh1rsch.ch

And here are some screenshots of our work:

 

And, lastly, here is a link that you can copy and paste into a browser for the downloadable version of the localized game on Google Drive:

https://drive.google.com/open?id=1XK5a_GmkzOmOBhNi7yUM8fo-82dtDl6_