Read about the most challenging project I have ever done and learn from the mistakes of others. What follows is an interesting case study for aspiring developers and localizers. During this project, the hardest I have ever done, every single localization rule was broken, resulting in a less than optimal product. This story shows two things: first, how you should definitely not organize your project; second, that even in a worst-case scenario, I can work miracles (considering the tools I was given).
John and Bill go Fishing
As I obviously can't give you the names of the actual products and people involved, I have chosen the above working title for this project. It was a huge, eagerly awaited title, to be published by a famous publisher on an even more famous console. But first, some background info.
Basically, there are two different kinds of tools we can use for translation: machine translation tools and computer-assisted translation tools. The former analyze the source language (in this case English) at the word level and then apply rules of grammar to come up with something that resembles a translation. Obviously, computers are not very good at this, as language involves a crapload of fuzzy logic. This is why prudent translators dismiss these tools out of hand (translate a long sentence into French and back into English with Google Translate to see why). [EDIT: Note that this was written before the advent of neural networks]
Computer-assisted translation (CAT) tools, however, have become almost a must these days. Basically, these tools store every string you have ever translated and leverage those translations to come up with proposals whenever you encounter similar strings. A CAT tool might, for example, say:
Listen, I see you are now translating the string 'John is fishing'. However, on May 5th 2009 21:38 you already translated the sentence 'Bill is fishing' as 'Bill is aan het vissen'. Shall I replace the word 'Bill' with 'John' and use 'John is aan het vissen'?
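A minimal Python sketch of the proposal above, assuming a word-level similarity score from the standard library's difflib; real CAT tools use their own, far more elaborate fuzzy-matching algorithms, and the `suggest` function and the one-entry memory are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional, Tuple

# Hypothetical translation memory: English source string -> Dutch translation.
MEMORY = {"Bill is fishing": "Bill is aan het vissen"}


def suggest(source: str, memory: dict) -> Optional[Tuple[float, str]]:
    """Return (match score, proposed translation) for the closest stored string."""
    best = None
    for tm_source, tm_target in memory.items():
        # Compare word by word; 1.0 means an exact (100%) match.
        score = SequenceMatcher(None, source.split(), tm_source.split()).ratio()
        if best is None or score > best[0]:
            best = (score, tm_source, tm_target)
    if best is None:
        return None

    score, tm_source, proposal = best
    old_words, new_words = tm_source.split(), source.split()
    if len(old_words) == len(new_words):
        changed = [(o, n) for o, n in zip(old_words, new_words) if o != n]
        # Exactly one word differs and it occurs verbatim in the stored
        # translation (typically a name): swap it, as in 'Bill' -> 'John'.
        if len(changed) == 1 and changed[0][0] in proposal.split():
            old, new = changed[0]
            proposal = " ".join(new if w == old else w for w in proposal.split())
    return score, proposal


score, proposal = suggest("John is fishing", MEMORY)
print(f"{score:.0%} match: {proposal}")  # 67% match: John is aan het vissen
```

The score stays below 100% on purpose: the tool proposes a translation, but the translator still has to confirm it.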
For this kind of logic, it is important that the string in question is complete: the less complete it is, the less reliable the tool's suggestions are. This is because the parts of a sentence depend on each other: if I change one word in a sentence, chances are that other parts of the same sentence will have to change too. For example, if I change 'Bill goes fishing' to 'We goes fishing', obviously 'goes' needs to be changed to 'go' too.
For the CAT tool to come up with intelligent suggestions, it is important that both 'Bill' and 'goes' are in the same string. Here is why:
As you can see, scenario 2 results in fewer false positives (in reality it will result in far fewer false positives).
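The same effect can be sketched in code. The example below is a hypothetical variation on the 'Bill goes fishing' case, again using a word-level difflib score as a stand-in for a CAT tool's matching: Dutch conjugates 'go' differently for 'we' (gaan) and 'I' (ga), while the English fragment 'go fishing' stays identical.

```python
from difflib import SequenceMatcher


def match_score(a: str, b: str) -> float:
    """Word-level similarity of two source strings (1.0 = exact match)."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def best_match(segment: str, memory: dict) -> tuple:
    """Return (score, stored translation) of the closest memory entry."""
    return max((match_score(segment, src), tgt) for src, tgt in memory.items())


# Scenario 1: 'We go fishing' was stored as two separate strings.
fragment_memory = {"We": "We", "go fishing": "gaan vissen"}
# Scenario 2: the same sentence was stored as one complete string.
complete_memory = {"We go fishing": "We gaan vissen"}

# New text: 'I go fishing' (correct Dutch: 'Ik ga vissen').
print(best_match("go fishing", fragment_memory))    # (1.0, 'gaan vissen')
print(best_match("I go fishing", complete_memory))  # (0.6666666666666666, 'We gaan vissen')
```

In scenario 1 the fragment is an exact match, so the tool would happily insert 'gaan vissen' and produce the ungrammatical 'Ik gaan vissen'. In scenario 2 the score drops below 100%, which flags the sentence for review.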
In these examples, CAT tools may seem to save you very little time, but if you consider that games can easily contain 50,000 to 250,000 words, you will probably understand that CAT tools have a lot to offer in terms of efficiency, time savings and cost savings. I say cost savings because most translators give you a discount every time a CAT tool spots a repetition in the text (a so-called match).
For this to work, all strings must be complete. A second important rule is that the strings should be clean: code should be separated from text. Developers will immediately think of the Model-View-Controller (MVC) pattern, which says that data, interface and logic should be kept separate from each other. It works the same way for translations: the text is the model and the tags which don't need translation are the controller. They should be separated. Here is why:
Now, if you use a well-known format, like HTML, CAT tools can automatically filter the tags, remember their positions, and only process the text between them. This way the dirty HTML text will still be treated as clean plain text, so that you can leverage repetitions in the text. However, the more obscure your format, the smaller the chance that standard filters exist. Now there are all kinds of ways to program filters for CAT tools, but this takes time, and therefore costs money. So, to recap: strings should be complete and strings should be as clean as possible. If the latter is not possible, you should at least use a common format.
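To illustrate what such a filter does, here is a minimal Python sketch, assuming simple HTML-style tags and a hypothetical `{n}` placeholder syntax (real CAT tools have their own inline-tag representation). The tags are set aside before translation and restored afterwards, so the translator, and the translation memory, only see the text.

```python
import re

# Matches anything that looks like an HTML/XML tag, e.g. <b> or </font>.
TAG = re.compile(r"<[^>]+>")
PLACEHOLDER = re.compile(r"\{(\d+)\}")


def protect_tags(text: str) -> tuple:
    """Replace each tag with a numbered placeholder; return (clean text, tags)."""
    tags = []

    def stash(match):
        tags.append(match.group(0))
        return "{%d}" % (len(tags) - 1)

    return TAG.sub(stash, text), tags


def restore_tags(text: str, tags: list) -> str:
    """Put the original tags back in place of their placeholders."""
    return PLACEHOLDER.sub(lambda m: tags[int(m.group(1))], text)


clean, tags = protect_tags("<b>Bill</b> is fishing")
print(clean)  # {0}Bill{1} is fishing
print(restore_tags("{0}Bill{1} is aan het vissen", tags))
# <b>Bill</b> is aan het vissen
```

This naive version assumes the text itself never contains `{n}` sequences or stray `<` characters; a real filter would have to handle both, which is exactly the kind of work that takes time and money for obscure formats.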
The format from hell
Back to the story. The game I was working on was made in Japan, and therefore the logic used during development was Japanese too. Unfortunately this logic (it was bad logic) is very common in Japan; even worse, it's the only logic I have ever seen in Japanese products, which doesn't bode well.
(Deep down, I hope that one day a Japanese developer will actually take the time to read this page and learn from it. And if I ever have too much time on my hands, I may even consider making a Japanese version of this page. Until then, though, I'm afraid we'll have to accept the status quo and work with what we have.)
thejapaneselanguageusesnospaces,nordoesithavecapitalsorsmallletters.thismakesitreallyhardforcomputerstoseewherewordsstartandend. Most text in Japanese games is therefore not wrapped automatically, as this may result in words getting wrapped in the midwhoops enter
dle, which looks ugly. Instead, Japanese developers have found a brilliant solution for this: they wrap every single line in their games manually.
That is correct. I am not kidding. If a game contains 250,000 words and the average string (sentence) consists of 7 words, that is roughly 35,714 (250,000 / 7) hard enters inserted manually by the developers, to make sure that words do not get wrapped in the middle. As the text has to be wrapped manually anyway, the sizes of all dialogues are fixed, resulting in a fixed number of permitted characters per line and a fixed number of permitted lines per dialogue.
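Now imagine what[enter]
happens if this[enter]
small dialogue[enter]
needs editing[enter]
because someone[enter]
forgot to add the[enter]
word 'what' after[enter]
'imagine'.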
That is right: after the word 'what' is inserted, all lines will need to be rewrapped, manually of course. Otherwise the first line will become too long and exceed the maximum line length. This then causes a chain reaction, because moving the last word(s) of one line to the beginning of the next line may push that line over the maximum length as well. Rinse and repeat for every single change in the text. And believe me, developers change a lot of things in the text during development.
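The chain reaction is easy to reproduce. This sketch uses Python's greedy `textwrap.wrap` as a stand-in for a translator rewrapping by hand, on a hypothetical dialogue box 20 characters wide: inserting two words into the first line changes every line of the dialogue.

```python
import textwrap
from itertools import zip_longest


def wrap(text: str, width: int) -> list:
    """Greedy word wrap: fill each line with as many words as fit in `width`."""
    return textwrap.wrap(text, width=width)


def changed_lines(before: list, after: list) -> int:
    """Count line positions whose content differs; added/removed lines count too."""
    return sum(a != b for a, b in zip_longest(before, after))


before = wrap("John and Bill go fishing at the lake every Sunday morning.", 20)
after = wrap("John and his friend Bill go fishing at the lake every Sunday morning.", 20)
print(before)  # ['John and Bill go', 'fishing at the lake', 'every Sunday', 'morning.']
print(after)   # ['John and his friend', 'Bill go fishing at', 'the lake every', 'Sunday morning.']
print(changed_lines(before, after))  # 4
```

A computer does this in microseconds; a human doing it by hand, for every edit in a 250,000-word script, is where the time (and the errors) go.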
Now, this is bad enough as it is for the Japanese developers themselves (God save their souls), but because of the logic and structure behind it, the same approach is carried over to the languages into which the games are translated. So even though languages like Dutch contain spaces, which means text could be wrapped automatically on the fly, the Japanese developers do not take this into account. Their games, therefore, cannot process translations that are not hardcoded. That is, the translations must be wrapped manually too, as that is what the software expects. Even worse: because the dialogues have a fixed size, the translations also need to adhere to length restrictions, both for the maximum number of characters per line and the maximum number of lines per dialogue.
I have told Japanese developers time and time again how stupid this way of working is for Western languages, but they simply won't listen to me. So here's hoping that someone with more influence will read this.
Anyway, please look closely at the 'Now imagine' text above. Do these look like complete strings to you? They certainly are not. This means exit CAT tools, exit money-saving and exit time-saving.
Basically, the format from hell was an Excel file with countless tabs (for easy navigation - not), each tab containing countless cells, each cell containing a manually wrapped dialogue with hard enters. Next to each dialogue were two numbers: one indicating the allowed number of characters per line and one indicating the allowed number of lines per cell.
Fortunately, I do have some programming experience. I figured that if the Japanese developers were not willing to solve the problem, I should solve it myself. So I wrote a PHP program that remembers three things for each string: the number of allowed characters per line, the number of allowed lines per string and, of course, the string itself. It then strips all hard enters from each string, resulting in something like this:
Now imagine what happens if this small dialogue needs editing because someone forgot to add the word 'what' after 'imagine'.
This information was exported to an XML file, which could then be imported into my favourite CAT tool. I did this in such a way that the character/line information stayed hidden (it was moved to separate XML tags), so that only the clean text was displayed. The XML format looked a bit like this:
[donottranslate]34*8[/donottranslate]
[translate]Now imagine what happens if this small dialogue needs editing because someone forgot to add the word 'what' after 'imagine'.[/translate]
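Here is a rough Python equivalent of the export step, written with the standard library's xml.etree.ElementTree (it is not the original PHP program). The `strings`/`string` wrapper elements are hypothetical; `donottranslate` and `translate` mirror the example above, and a newline stands in for each hard enter.

```python
import xml.etree.ElementTree as ET

# One Excel cell: (hard-wrapped text, max characters per line, max lines).
CELLS = [
    ("Now imagine what happens if this\nsmall dialogue needs editing\n"
     "because someone forgot to add the\nword 'what' after 'imagine'.", 34, 8),
]


def export_cells(cells) -> str:
    """Strip hard enters and keep the wrapping limits in a separate tag."""
    root = ET.Element("strings")
    for text, max_chars, max_lines in cells:
        entry = ET.SubElement(root, "string")
        # Hidden from the translator: the CAT tool only imports <translate>.
        ET.SubElement(entry, "donottranslate").text = f"{max_chars}*{max_lines}"
        # Splitting on whitespace and rejoining turns the wrapped lines into one clean sentence.
        ET.SubElement(entry, "translate").text = " ".join(text.split())
    return ET.tostring(root, encoding="unicode")


print(export_cells(CELLS))
```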
I simply told the CAT tool to import only the text inside the translate tags. When I was done, the English text between these tags had been replaced with my Dutch translation. The text then went back to my software, which analyzed the XML file to see how each string should be rewrapped. For the above string, it would try to wrap my (now Dutch) string over 8 lines with a maximum of 34 characters per line. If that didn't work because (a word in) my translation was too long, the software would warn me.
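The rewrap-and-warn step can be sketched as follows. This is a simplified Python illustration, not the original wrapper code: it uses greedy wrapping, never splits a word, and reports both too many lines and single words that are wider than the line. The Dutch sentence is only an illustrative rendering.

```python
import textwrap


def rewrap(translation: str, max_chars: int, max_lines: int) -> tuple:
    """Wrap a translation within the limits; return (lines, warnings)."""
    lines = textwrap.wrap(
        translation,
        width=max_chars,
        break_long_words=False,  # never split a word; report it instead
        break_on_hyphens=False,
    )
    warnings = []
    if len(lines) > max_lines:
        warnings.append(f"{len(lines)} lines, only {max_lines} allowed")
    for number, line in enumerate(lines, start=1):
        if len(line) > max_chars:
            warnings.append(f"line {number} has {len(line)} characters, only {max_chars} allowed")
    return lines, warnings


dutch = ("Stel je eens voor wat er gebeurt als deze kleine dialoog moet worden "
         "aangepast omdat iemand vergeten is het woord 'what' toe te voegen na 'imagine'.")
lines, warnings = rewrap(dutch, 34, 8)
print("\n".join(lines))  # the hard enters go back in as newlines
print(warnings)          # []
print(rewrap("Bill is aan het vissen", 10, 1)[1])  # ['3 lines, only 1 allowed']
```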
To make a long story short: I wrote software that enabled my CAT tool to process hardcoded strings as if they were softcoded strings, rewrapping my translations on the fly based on the wrapping information for the English source text. This enabled me to leverage previous translations even though the hard enters in those previous translations might have been in entirely different positions.
Things get worse
To make things worse, the strings were also colored. Not just colored, no, the developers had defined dozens of keywords (movements, item names, character names, et cetera) and decided that each of these should get a different color in the text (which was delivered in Excel). See the screenshot below: I have resized the picture so that you can't tell which game it is from, but you should be able to see the different colors that are assigned to every word.
I decided to make a list of keywords instead and color them after the translation had been exported back to Excel, for three reasons. First, you need to know Visual Basic to convert font color information from Excel to XML, and I don't know Visual Basic. Second, introducing color tags in the source text would greatly complicate the wrapping process (you need the tags in the text itself, but they do not count towards the length restrictions). Third, color tags would make the source text "dirtier", resulting in less leverage. I could then use a Visual Basic search and replace program I had found on the net to automate at least 90% of the process. The remaining 10% (keywords that got agglutinated or inflected for whatever reason) I would do manually.
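The after-the-fact coloring boils down to a whole-word search and replace over the exported Dutch text. The sketch below is a Python illustration, not the Visual Basic tool I used; the `[blue]...[/blue]` markup and the keyword list are hypothetical. It also lists words that merely start with a keyword, which is roughly where the manual 10% comes from.

```python
import re

# Hypothetical keyword list: Dutch term -> color used in the game.
KEYWORDS = {"Bill": "red", "hengel": "blue"}


def color_keywords(text: str, keywords: dict) -> str:
    """Wrap every whole-word occurrence of a keyword in color markup."""
    # Longest keywords first, so a longer term wins over a shorter one it contains.
    terms = sorted(keywords, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, terms)) + r")\b")
    return pattern.sub(
        lambda m: f"[{keywords[m.group(1)]}]{m.group(1)}[/{keywords[m.group(1)]}]", text
    )


def near_misses(text: str, keywords: dict) -> list:
    """Words that start with a keyword but are not one (inflected or compound forms)."""
    return [
        word for word in re.findall(r"\w+", text)
        if word not in keywords and any(word.startswith(k) for k in keywords)
    ]


sentence = "Bill koopt twee hengels en een hengel."
print(color_keywords(sentence, KEYWORDS))
# [red]Bill[/red] koopt twee hengels en een [blue]hengel[/blue].
print(near_misses(sentence, KEYWORDS))  # ['hengels']
```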
So far so good. I had finally found a way to process these files (without getting paid anything extra whatsoever) and could start with the translation.
One of the reasons why I invested so much time in this format was the fact that I really wanted to put this game on my resume. It was, after all, a really big title and listing myself as the translator of this title would make a very good impression. I had also freed up my entire schedule, refusing other projects and even sacrificing weekends, to make sure I would be able to adhere to the end client's very strict schedule.
Too much leniency
Until my client, which in this case was a translation agency sitting between me and the end client, told me that part of the project would go to someone else. The reason? No, they didn't hate me. No, they didn't have any problems with my style. And no, I hadn't missed any deadlines: I had always delivered on time.
So... what was it?
They had actually promised another translator that he would get part of the project too, and they felt sorry for him.
So much for keeping business and personal matters separate. Even though I was perfectly able to cope with the deadlines, the client insisted that part of the project go to someone else. This not only ensured that the game would be translated by two different translators with two different writing styles, but also made it impossible to keep the terminology consistent, as the other translator obviously had not written his own format-from-hell filter and therefore could not use CAT tools (which keep track of terminology) at all.
Instead, this translator had to revert to the old manual way, looking up every potential term (anything that resembled an item name, a weapon name, a character name, et cetera) in the translations I had done so far, manually of course. If you realize that a 25-page manual can contain 2,500 unique potential terms, you probably understand that this working method is very slow and very prone to errors. I mean, when you use a CAT tool, you add terminology like item names to a separate terminology database on the fly, so that next time this word (or something that resembles it) occurs in a sentence, the translation you used is displayed on your screen automatically.
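A terminology lookup of the kind CAT tools perform automatically can be sketched in a few lines of Python. This exact-match version is a simplification (real tools also recognize inflected or similar forms), and the glossary entries are hypothetical.

```python
import re

# Hypothetical terminology database: English term -> approved Dutch term.
GLOSSARY = {"fishing rod": "hengel", "lake": "meer", "Bill": "Bill"}


def find_terms(sentence: str, glossary: dict) -> list:
    """Return (term, translation) pairs found in the sentence, in reading order."""
    hits = []
    for term, translation in glossary.items():
        match = re.search(r"\b" + re.escape(term) + r"\b", sentence, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), term, translation))
    return [(term, translation) for _, term, translation in sorted(hits)]


print(find_terms("Bill left his fishing rod at the lake.", GLOSSARY))
# [('Bill', 'Bill'), ('fishing rod', 'hengel'), ('lake', 'meer')]
```

Without a tool, every one of those lookups is a manual search through everything translated so far.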
And obviously I was not planning to give my colleague, who had now become a competitor, my proprietary software either. Not only would he not be able to operate or run it (it still needs to be debugged on the fly and it requires a PHP server), but by giving my own tools away I would actually start competing with myself. Though translating games is my passion, I am trying to run a business too.
[Screenshot: the wrapping software in action]
It takes two to tango
Once I understood that I wouldn't be the only one working on this project and that I therefore no longer had full control over the quality, I realized that I would never be able to list this project on my resume. So I decided to stop investing any extra, unpaid time in it.
Until then, I had put everything aside for this project. I had held off other clients and refused other projects, just to make sure that I would be able to fit in any unexpected batches of text that might or might not arrive. But that time was over. The circumstances had forced me to treat this project like any other project. Therefore, I could no longer guarantee delivery dates for texts with unknown hand-off dates: if the client promised to send me 10,000 words on Monday so that he could get them back on Friday, he would actually have to book me and reserve my time, so that I wouldn't lose any money in case the texts did not arrive on Monday, leaving me empty-handed (which happens all the time).
However, the client wanted me to guarantee delivery dates anyway, while not guaranteeing the hand-off dates himself. But as it takes two to tango, I politely refused. This meant that the client could only outsource text to me if I had no other projects running at that moment, which was impossible to predict.
Meanwhile, I was no longer the only translator who had decided to refocus on other projects. The other translator seemed to have lost interest too, probably for similar reasons (or maybe because he realized that translating and rewrapping text in a format like this at the normal rate was not the best way to get rich). Before we knew it, the project was being handled by dozens of translators at the same time, each using his own terminology and writing style. As several batches were being worked on by multiple translators simultaneously, none of whom used a CAT tool due to the format from hell, it was impossible to match each other's terminology, so some items ended up with six different translations in the same game. The project had gone totally out of control.
[Screenshot: part of the wrapper code, which has more than 1,000 lines and is pretty complicated]
More chaos
Complaints started coming in from the console manufacturer (most in-game translations are reviewed by the console manufacturer before they can actually be published): terms did not adhere to their glossaries, the translation was a mix of Dutch and Flemish (apparently the agency had used translators from Belgium too), terms were inconsistent, the language was inconsistent and the style was inconsistent.
Also, the translations contained dozens of errors: because every translator on the project except me was wrapping the lines manually, dozens of spelling and grammar errors had been introduced, not to mention that many lines did not even adhere to the line and length restrictions, which are very hard to keep track of if you don't automate the process with software, as I had done.
To make matters worse, no one knew anymore who had translated what, how much had been translated and how much still had to be done. Last but not least, the developer (the end client) had added new strings and deleted old strings in the same version of the Excel file that was being translated by the translators at that moment, so that by the time the translations were ready, no one knew in which parts of the (meanwhile updated) Excel file the translations had to be pasted back.
The developer had tried (in vain) to color-code all cells in Excel: using different background colors for cells for which the Japanese source text had been updated (but for which the English, from which the translation was done, had not been updated yet), cells for which both the Japanese source text and the English source text had been updated, the former and the latter once more, but then for cells that were currently being translated into Dutch and therefore would need another update right after delivery of the Dutch translation, and once more all previously mentioned cells, but then with a pending status because for some reason the Japanese development team still had questions about one word or another, plus other cells with even brighter colors that were not updated but, according to the Japanese development team, contained errors, though there were other cells with yet more colors for situations in which the developer's English-language branch office did not agree with the comments from Japan. Are you still following this?
So now the cells did not only have colored keywords; the cell backgrounds themselves also came in all the colors of the rainbow. And of course, orange text on an orange background offered a whole new and very inspiring perspective on the translation business.
There's a reason why version control is so incredibly important. CAT tools can do it for you, as they remember exactly what has been translated by whom and when, what has not been translated yet, what is new and what is old, et cetera, et cetera, but to leverage that functionality, you actually need to be able to use these CAT tools, something which had become impossible due to the format and the way the project was organized.
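The bookkeeping described above boils down to a few fields per segment. Below is a minimal Python sketch, assuming every string has a stable ID (such as the hypothetical `DLG_0001`) instead of a position in an Excel sheet; it records who translated what and flags translations whose source text has changed since.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(text: str) -> str:
    """Stable hash of a source string, used to detect later edits."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


@dataclass
class Segment:
    source: str
    target: str = ""
    translator: str = ""
    translated_from: str = ""  # fingerprint of the source the target was based on


def record_translation(segment: Segment, target: str, translator: str) -> None:
    segment.target = target
    segment.translator = translator
    segment.translated_from = fingerprint(segment.source)


def status(segment: Segment) -> str:
    if not segment.target:
        return "new"
    if segment.translated_from != fingerprint(segment.source):
        return "outdated"  # the source changed after it was translated
    return "done"


# Segments keyed by a stable string ID, not by Excel row or cell position.
segments = {"DLG_0001": Segment("Bill is fishing"), "DLG_0002": Segment("John is fishing")}
record_translation(segments["DLG_0001"], "Bill is aan het vissen", "translator A")
segments["DLG_0001"].source = "Bill is still fishing"  # the developer edits the English
print({key: status(seg) for key, seg in segments.items()})
# {'DLG_0001': 'outdated', 'DLG_0002': 'new'}
```

Because strings are identified by ID, adding or deleting other strings never changes where a translation has to be pasted back.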
Just when we thought we had seen it all, there was another complication: besides the console manufacturer (say Sony), which defined the platform terminology, and the developer (say Konami), which defined its brand's terminology, there was also a licensor (say Warner Brothers), which defined the license terminology. Suddenly, halfway through the project, they all started to come up with their own instructions about which translations to use for which terms, but it was not clear whose instructions should get priority when conflicts arose (and there was quite a lot of overlap). This probably explained at least half of the colors of the cells in Excel, and by now the chaos was complete.
The voice of reason
My client (the translation agency) realized that things could not go on like this, so they had a little chat with the end client (the developer). The translation agency admitted that splitting the project amongst multiple translators just for fun had not been their brightest idea in history, and the end client realized that it wasn't very fair to expect a translator to reserve time for batches that never showed up. I, on the other hand, realized that if the end client and my client were willing to compromise, I should be a bit more flexible too.
The developer also gave clear instructions about which hierarchy to follow: first the licensor, then the console manufacturer, then the developer. At least now we knew how certain terms actually had to be translated. One problem though was that the licensor, who had the last say on things - how shall I put this - well, his Dutch wasn't entirely up to par, so in the end we did get stuck with a few terms that shed an... interesting light on Dutch grammar.
Another difference was that now, the client constantly kept me in the loop about how many words were due to arrive when. He also made sure that all work related to this project would go to me and me alone. The project was actually becoming fun again. However, a lot of damage had been done and I strongly realized that no matter how hard we tried, we'd never be able to deliver a perfect product anymore, as basically we were not allowed to touch strings that contained no wrong terminology. That is, the end product would still be schizophrenic, due to the different writing styles of the different translators. But sometimes a man's gotta do what a man's gotta do, so I just got on with it.
The console manufacturer in question is known to attach a lot of value to quality, but their HR department is also a bit greedy. Basically, they have all in-game translations checked by temps, who are not employed longer than 3 months (to avoid having to pay pension contributions and the like). This means that the linguistic staff at this manufacturer changes every 3 months, is therefore relatively inexperienced and really wants to prove itself in the hope of staying on for another 3 months.
You already know where this is heading: these proofreaders will do anything it takes to find mistakes in in-game translations. And if there are no mistakes, they will simply create them by labeling things as such. That is, these proofreaders do not just check grammar and spelling, they simply rewrite the entire translation to justify their jobs. No one knows this but themselves, and the console manufacturer, which speaks no Dutch, thinks that it just got itself the best proofreader ever. When you try to tell them, you are of course the jealous translator who simply doesn't dare admit he made so many mistakes.
Ah well, believe it or not, 25% of my translations were rewritten. Of course there's a possibility they were really bad. But if you really think so, you probably wouldn't be reading this. Fortunately my client saw right through this, as the console manufacturer's proofreader was contradicting himself constantly: in cell 123 he would tell you to use 'okay' instead of 'alright', and in cell 2863 he would tell you to use 'alright' instead of 'okay'. There were dozens of examples like that.
Normally, if the console manufacturer wants you to translate term A as B, this can be implemented fairly easily using search & replace. But these instructions concerned style (which is personal and, by definition, impossible to define) rather than terminology, and were very whimsical too. So basically we decided that whatever changes the console manufacturer wanted to make to our translation were its responsibility. These changes were not fed back into my CAT tool either, as they too were delivered in the format from hell. Even with my software, feeding them back into my CAT tool would have taken a lot of time, for which the developer refused to pay. I love my job, but I'm not a charity.
And even if the console manufacturer's suggestions had been merely a matter of replacing term X with Y, this could not have been done with a simple search & replace. First, because the standard Excel search & replace feature is unreliable when it comes to color-coded text (and as you know, the developer loved color-coding as many words as possible). Second, because many words were hyphen-ated or hy-phenated to make optimal use of the very limited space available. Whenever we wanted to automate something, the impossible format would start rearing its ugly head. The implications of it were far bigger than anyone could have foreseen.
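Why a plain search & replace fails here can be shown in a few lines of Python. The pattern below is an illustrative assumption (a hyphen plus an optional hard enter, written as a newline, between any two letters), and even when it finds the word, the replacement leaves a line that must be rewrapped, because the new word no longer has the same length or hyphenation.

```python
import re


def hyphen_tolerant(term: str):
    """Regex for `term` that also matches it when it has been split by a hyphen,
    optionally followed by a hard enter (a newline in this sketch)."""
    return re.compile(r"(?:-\n?)?".join(map(re.escape, term)))


cell = "Bill pakt zijn hen-\ngel en gaat vissen."
print("hengel" in cell)                                        # False: a plain search misses it
print(repr(hyphen_tolerant("hengel").search(cell).group(0)))  # 'hen-\ngel'
print(hyphen_tolerant("hengel").sub("werphengel", cell))
# Bill pakt zijn werphengel en gaat vissen.
```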
Besides the usual problems you face when localizing games (for example, the fact that the order of the strings often seems random, because it follows the order in which they appear in the software instead of the order in which they appear in the game), there were other issues too. For example, during the project I found out that many of the length restrictions imposed on the Dutch translation were not based on the size of the actual dialogue in which the text would be displayed, but on the length of the original Japanese source text. Now Japanese can be a very compact language, as one Japanese character often stands for a whole English word. So basically this means that if there's a dialogue with a width of 20 characters, of which only 2 are used for the Japanese source text, I'm only allowed to use 2 letters for the Dutch translation, even though 20 characters are available. Obviously this does not improve the quality of the translation. I mean, try rendering the Dutch word 'u' (you) in English using just 1 letter, please. You get 10 seconds.
Which brings me to my next point... you probably noticed that the original language of the game was Japanese, not English. Then why did I have to translate it from English, which was already a translation from the Japanese?
Unfortunately this happens all the time. Translating from Japanese is slightly more expensive, and since money is more important than quality, most developers decide to go for a translation of a translation.
As I'm writing this, the project is still not finished. The vast majority of the work has been done, but new batches keep coming in. For now, however, I don't think much will change, as the project seems to have stabilized.
The game will probably be published near the end of this year. Whether the Dutch translation will be as good as it could have been remains to be seen. I've given it my very best shot, but I could have done a lot more had the project been organized differently (or if the client's budget had been less tight, so that I wouldn't have been forced to divert my attention to other projects too). What strikes me most is that this is a very big title for some really big names in the industry - you'd expect these people to know how to streamline their localization. Not.
So what can we learn? Quite a few things, I guess:
1. Make sure your strings are complete and clean. Separate text from codes as much as possible.
2. Convince Japanese developers to autowrap texts in languages with spaces.
3. Have your game color/tag keywords on the fly instead of coloring/tagging them manually. And if you do the latter, use actual tags instead of invisible color codes in Excel that can only be extracted with Visual Basic.
4. Use as few translators as possible (1 translator is best).
5. Do not use Flemish translators for the Dutch market. Do not, I repeat, do not believe them if they claim that they can write Dutch that is acceptable for the Dutch market. That would be a first. The difference between Dutch and Flemish is much, much bigger than for example the difference between American and British English. Let the Flemish do Flemish and the Dutch do Dutch.
6. Define responsibilities, make sure instructions do not conflict and indicate who has the last say on things.
7. Define terminology before the translation starts, not while it is underway.
8. Have terminology defined by a linguist, not by someone who thinks he's a linguist, no matter how high up in the chain of command he is.
9. Don't expect translators to reserve time for batches that may never arrive. Be ready to compensate if you can't guarantee the hand-off date but still want the translator to guarantee the delivery date, or be less strict with your deadlines.
10. Realize that proofreaders at console manufacturers are relatively inexperienced and sometimes merely try to justify their jobs.
11. If you absolutely have to work with length restrictions, at least define them honestly and don't base them on unfair and automated algorithms. Remember: the less space you reserve for the translation, the worse the translation will become. Know that most languages are not as compact as English and reserve at least 50% extra space for sentences and 400% extra space or even more for single words.
12. It is not cool to use dozens of tabs in your Excel file for "easy navigation", unless you have a sadistic nature.
13. Consider hiring a freelancer directly instead of using an agency as a middle-man. It will give you far more control over the decision process.
14. Try to have the translation done straight from the source language. Avoid translations of translations. Have you ever tried passing a message along a line of 30 colleagues at work by whispering? Try it now. You will realize that every translation deviates a bit more from the original source text.
15. Be realistic about deadlines and prices. Don't expect translators to work for free; don't expect them to be wizards who can translate 20,000 words a day either.
16. If your project is interesting and you grant me exclusivity, I am definitely willing to go the extra mile and sacrifice a few weekends.
17. You are better off saving money and time by implementing a more efficient workflow from the start than compromising quality by trying to save money on translators and translations later. Well begun is half done! I am convinced that half of the client's budget evaporated due to the impossible format of the source text.
18. Microsoft Project is great software, but it really makes no sense to stick to very strict deadlines for certain batches if you know beforehand that changes in the text will occur anyway. Project milestones are a means, not a goal in themselves!
19. Listen to your translator. Sometimes he may actually offer good suggestions to streamline your localization process.
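Rule 11's rule of thumb translates into a simple budget calculation. A minimal sketch, assuming the budget is computed from the English string's character count and that "400% extra" means five times the original length; the function name is hypothetical, and these factors are minimums, not a guarantee that a translation will fit.

```python
import math


def reserved_length(english: str) -> int:
    """Characters to reserve for a translation, following rule 11:
    at least 50% extra for sentences, 400% extra (5x total) for single words."""
    factor = 5.0 if len(english.split()) == 1 else 1.5
    return math.ceil(len(english) * factor)


print(reserved_length("fishing"))          # 35
print(reserved_length("Bill is fishing"))  # 23
```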
Good luck!
Loek van Kooten
Your English/Japanese-Dutch game translator