It’s no mystery that automation is now essential to any localization workflow that hopes to keep pace with today’s gargantuan influx of source content. In an era where working from home is a requirement, automation can also be a crucial tool for mitigating the stress of managing a workflow remotely. Yet among the automated solutions in the industry today, there seems to be a gap where Python is concerned. In the spirit of these ideas, two of my colleagues and I collaborated in our spare time to develop and test a short but effective Python script designed to pull strings straight from HTML files and put them in a document for translation. With a single command from the command line, the script creates a translation-ready source document alongside a JavaScript file; together, the two form a localization solution for small websites. I’ll break down our script piece by piece and showcase the methods we used.
Let’s start with the most important piece of the pie: Beautiful Soup. Beautiful Soup is a Python library built specifically for navigating, searching, and modifying HTML or XML parse trees. It parses a document into a tree of nested tag and text objects that mirrors the document’s own structure, much like the Document Object Model (DOM) a browser builds, which makes the document easy to search. Our initial research uncovered this gem early on, and we hope this article helps spread the word. It’s no exaggeration to say that this library is the heart of our script. It gave us the essential tools for extracting strings of text from HTML documents.
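To make this concrete, here is a minimal sketch of the kind of extraction Beautiful Soup enables. It is not our full script: the sample HTML and the `extract_text` helper are made up for illustration, and the choice to discard `script` and `style` elements before extracting is an assumption about what counts as translatable text.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; a real run would read an HTML file from disk.
SAMPLE_HTML = """
<html>
  <head><title>Demo page</title><style>p { color: red; }</style></head>
  <body>
    <h1>Welcome</h1>
    <p>Read the <a href="/docs">documentation</a> first.</p>
    <script>console.log("not for translation");</script>
  </body>
</html>
"""


def extract_text(html):
    """Return the text of an HTML document, one string per line."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: code and styling are not translatable, so drop them first.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # get_text joins every remaining text node; strip=True trims each node
    # and skips nodes that contain only whitespace.
    return soup.get_text(separator="\n", strip=True)


text = extract_text(SAMPLE_HTML)
print(text)
```

Note that the inline link splits one sentence into three separate strings ("Read the", "documentation", "first."). A production script would probably need to rejoin such fragments so that translators see whole sentences.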
In particular, the standard (yet so much more than “standard”) “get_text” method allowed us to retrieve every string in a document and write them all to a text file. With just a simple command, a translation-ready document is generated. However, this by itself is not the real beauty of the script. It’s the auto-generated JavaScript that truly makes it shine. In fact, this portion of our script really showcases Python’s ability to generate other script files automatically. For our pilot example, we generated a file for a solution titled “24 Ways” that serves as a convenient, centralized localization solution for small and/or static websites. The core premise is that a JavaScript file holds every string from the original document paired with its translated counterpart (the key, or source string, and the value, or target string), and each language has its own key/value “Strings.js” file. In the original file, each individual string is wrapped in a function call; when the function is called, it looks up the original string in the key/value file to find its correct translation, and the translation then dynamically replaces the original source string. This matters primarily for strings that appear in JavaScript elements within HTML files, a situation that is particularly common in browser games.
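The generation step can be sketched roughly as follows. This is an illustration rather than the code from our script: the `build_strings_js` helper, the `strings` variable name, and the exact layout of the output are all assumptions, and the real “Strings.js” format may differ. The sketch relies on the fact that a JSON object is also a valid JavaScript object literal, so Python’s standard `json` module can handle quoting and escaping.

```python
import json


def build_strings_js(source_strings, translations=None, var_name="strings"):
    """Render the body of a key/value JavaScript file as a string.

    Keys are source strings and values are target strings. Entries with no
    translation yet fall back to the source string as a placeholder.
    """
    translations = translations or {}
    pairs = {}
    for source in source_strings:
        # setdefault keeps the first occurrence, so duplicates appear once.
        pairs.setdefault(source, translations.get(source, source))
    # ensure_ascii=False keeps accented characters readable for translators.
    body = json.dumps(pairs, ensure_ascii=False, indent=2)
    return f"var {var_name} = {body};\n"


js = build_strings_js(
    ["Welcome", "Read the docs", "Welcome"],
    translations={"Welcome": "Bienvenue"},
)
print(js)
```

Writing the returned text to one file per language (for example with `pathlib.Path.write_text`) would give each language its own key/value file, in line with the setup described above.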
In essence, then, our script not only provides a document containing the text to be translated from the HTML document, but also generates a key/value “Strings.js” file to be translated (NOTE: to prepare a “Strings.js” file for translation, it should ideally be converted to a Word document with all text hidden except for the value strings). Thus, this solution includes a separate JavaScript and HTML file for each language. Modifying or updating strings becomes much less of a chore when all of the strings are stored discretely in these files. Adding new strings, however, will require additional functionality and further tweaking of our script.
The point of this script is twofold: to showcase the power and efficiency of using Python in a localization workflow, and to form the root of a whole range of Python scripts for localization that fill important gaps in the industry. The next projects up for consideration include a script for automatically maintaining an existing localization solution and a script focusing on internationalization issues in JavaScript or TypeScript (essentially a typed superset of JavaScript).
Below is a link to view and/or download the full script:
https://drive.google.com/file/d/13iZKh43jElcHOVnCNoRO0PIvbFdQbwEM/view?usp=sharing