multilanguage software how to..
category: general [glöplog]
ditto.
do you mean mixing multiple programming languages or localization?
adding support for multiple languages to software.
like german, spanish, etc.
add all strings to a table, only refer to them by ID, change the table to do a translation. there's tools that automate the annoying parts of this (assigning IDs to strings and making sure the tables stay in sync), but that's basically it.
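A bare-bones C++ version of that idea could look something like this (names invented for illustration; in practice a tool would generate the enum and the tables from the translation files):

Code:
// One entry per translatable string; every language table uses the same order.
enum StringId { STR_HELLO, STR_QUIT_CONFIRM, STR_COUNT };

static const char* const english[STR_COUNT] = { "Hello!", "Do you really want to quit?" };
static const char* const german[STR_COUNT]  = { "Hallo!", "Wollen Sie wirklich beenden?" };

// Only the active table is ever consulted; switching language is one assignment.
static const char* const* g_strings = english;

const char* tr(StringId id)  { return g_strings[id]; }
void useGerman()             { g_strings = german; }

All UI code then calls tr(STR_QUIT_CONFIRM) instead of embedding the literal text.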
if you're looking for something open+free, try gnu gettext.
oh, and be careful with printf and the likes. different languages call for words in different places, so the order in which parameters are referenced can change with the target language (depending on the grammar). one common solution is "positional printf" where the ordering is made explicit instead of implicit. Jare coded up a nice implementation of this years ago: the code is here.
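To illustrate (this isn't Jare's implementation, just the POSIX "%n$" flavour of the same idea; positional specifiers are a POSIX extension, so a plain C89 or MSVC printf won't accept them):

Code:
#include <stdio.h>

int main(void)
{
    int count = 3;
    const char *folder = "backup";
    /* The translated format string decides the argument order, so the
       call site stays identical for every language. */
    printf("Delete %1$d files from %2$s?\n", count, folder);    /* English */
    printf("Aus %2$s %1$d Dateien löschen?\n", count, folder);  /* German: folder comes first */
    return 0;
}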
And while you're at it, don't forget that some languages have special punctuation marks or rules (french has a space before colon, Spanish has the funny question marks ¿Que?, the hyphenation rules in Japanese are pretty strict and so on), that things like date formats change between countries, and that some languages (like Greek) take significantly more room than others. Your GUI needs to be pretty flexible to accommodate this, and you pretty much can't hardcode anything.
Because of all this, when I was doing multilanguage mobile software (games for java phones, so small screens), the requirement was that all text on the screen needs to be able to scroll. It caused a lot of headaches sometimes, and then there were all those things like buttons that couldn't include text (since "Select" could be like 20 characters in some language) and symbols and colors that might be culturally dependent..
Incidentally, I am now working on something that needs to be localized to many European languages as well as Asian languages that need custom fonts, all with specs that are pretty tight and an essential third-party library added to the mix. I've never had so much fun in my life.
omg, greek is even longer than finnish? good tips here, preacher! thanks :-)
slightly off topic, anyone got experience with doing this on a web application? i'm really not sure whether to go to some gettext-ish solution or to just make a translation db table with a simple frontend. i'm also wondering whether the 'look up all translated texts' additional db query can somehow be prevented for every pageview with smart caching tricks or so. well it can, of course, i'm just wondering how people do it :-)
Why bother with trying to keep things in sync? Do a straightforward container class that reads string definitions out of a file and stores them in a hashmap. File could have a format like this:
Code:
# Comment line explaining contexts etc.
welcomeMessage = "Hello!"
exitConfirm = "Do you really want to quit?"
ok = "OK"
cancel = "Cancel"
...
It could be ASCII or some Unicode format or whatever. Then send the file off to different people to have it translated. All the container class needs is a few lines of code to build the map (in C++ with STL or Java or some such, a two-minute job), and a method for retrieving a string with a given ID. If you want, you can go all fancy and add parameters to strings like:
Code:
completionMessage( completion ) = "Your stuff is <completion>% done."
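For what it's worth, a rough C++/STL sketch of such a container (parsing the file format above; class and method names are made up, and the <param> handling is the bare minimum):

Code:
#include <cstddef>
#include <fstream>
#include <string>
#include <unordered_map>

class StringTable
{
    std::unordered_map<std::string, std::string> strings;

public:
    // Reads lines of the form:  id = "text"   (lines starting with # are comments).
    void load(const std::string& path)
    {
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line))
        {
            if (line.empty() || line[0] == '#') continue;
            std::size_t eq = line.find('=');
            if (eq == std::string::npos) continue;
            std::size_t q1 = line.find('"', eq);
            std::size_t q2 = line.rfind('"');
            if (q1 == std::string::npos || q2 <= q1) continue;
            std::string id = line.substr(0, eq);
            id.erase(id.find_last_not_of(" \t") + 1);      // trim trailing whitespace
            strings[id] = line.substr(q1 + 1, q2 - q1 - 1);
        }
    }

    // Returns the translation, or the ID itself so missing strings stand out.
    std::string get(const std::string& id) const
    {
        auto it = strings.find(id);
        return it != strings.end() ? it->second : id;
    }

    // "All fancy" version: replaces one <param> placeholder with a value.
    std::string get(const std::string& id, const std::string& param,
                    const std::string& value) const
    {
        std::string s = get(id);
        const std::string token = "<" + param + ">";
        std::size_t pos = s.find(token);
        if (pos != std::string::npos) s.replace(pos, token.size(), value);
        return s;
    }
};

Usage would then be along the lines of table.load("strings_de.txt") at startup, and table.get("exitConfirm") or table.get("completionMessage", "completion", "42") wherever text is needed.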
You do need to worry about variable string lengths, but most good UI frameworks were designed with exactly this in mind. Just remember when you do interface layouts that they should be very flexible.
Date and number formats change from country to country, too, so a localised string builder class will be useful.
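On that note, C++ can lean on std::locale for a fair amount of it, so the "localised string builder" can often just imbue a stream (a sketch; locale names are platform-dependent and the constructor throws if the locale isn't installed):

Code:
#include <ctime>
#include <iomanip>
#include <locale>
#include <sstream>
#include <string>

std::string formatAmountAndDate(double amount, const std::tm& when, const char* localeName)
{
    std::ostringstream out;
    out.imbue(std::locale(localeName));                  // e.g. "de_DE.UTF-8" or "fr_FR.UTF-8"
    out << std::fixed << std::setprecision(2) << amount  // decimal and grouping separators follow the locale
        << " / " << std::put_time(&when, "%x");          // %x = the locale's own date representation
    return out.str();
}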
And of course on top of that there's languages like Arabic which are written right-to-left, so if you want to support those the localisation is a bit more extensive.
Quote:
Why bother with trying to keep things in sync? Do a straightforward container class that reads string definitions out of a file and stores them in a hashmap.
Depending on the specs of your target platform, you really don't want to have all strings in memory twice, once in your original language and once for the active translation. It's a non-issue on PCs, but on memory-limited platforms like mobile phones and not-so-nextgen consoles :), there's usually other places where you want to spend those kilobytes.
sorry to reply with so much leetness around me, but i may point out that dotnet has a pretty neat localization mechanism. it is all done, all ready for use, all efficient...
Quote:
Why bother with trying to keep things in sync? Do a straightforward container class that reads string definitions out of a file and stores them in a hashmap.
Why bother with trying to code things that already exist? that's what dotnet does...
Quote:
slightly off topic, anyone got experience with doing this on a web application?
Yep. Not sure how relevant this is for other platforms, but I've done it on a Rails app using the Globalize plugin. Globalize is The Shit, and once you've tried it you realise exactly how much work it would take to do it (properly) yourself - it handles the locale-specific date/currency formatting stuff, weird pluralisation rules, and can even do some automatic SQL JOIN munging to fetch translations in the same query as your regular db lookups... although our SQL was a bit too complex for it to cope with, so we had to use the less-magic approach of adding new columns to the original tables to hold the translations. Oh, and it does indeed cache the view-level translations in an in-memory hash table.
(The site in question: English version / Welsh version. Doing the Welsh translation first was handy, as it meant that we could sort out the language issues separately from cultural/geographical ones. More recently we've been building a version for another country, and we've learned that the non-obvious country-specific things can be just as big a deal as the language - ranging from "we can't look up addresses by postcode any more" to "can we still get away with using a London bus as an icon?" :-) )
hooray for language-independent textures :)
b) make everything in chinese. 33% of the world speaks it.
Quote:
Depending on the specs of your target platform, you really don't want to have all strings in memory twice
You wouldn't need to, though. You'd only store the IDs to index the strings by, and the currently loaded translation. And the memory used up by the hash table is optional, you could do binary search lookups or whatever. If you're really optimising for very tight memory constraints you'd want to enumerate the strings and get rid of the IDs, sure, but if you're not then you might as well go for the flexible, easy-to-use and open-ended method.
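A tiny sketch of the "binary search instead of a hash" variant, with one sorted table per language (data and names purely illustrative):

Code:
#include <algorithm>
#include <cstddef>
#include <cstring>

struct Entry { const char* id; const char* text; };

// Must stay sorted by id; would typically be generated from the translation file.
static const Entry german[] = {
    { "cancel",         "Abbrechen" },
    { "exitConfirm",    "Wollen Sie wirklich beenden?" },
    { "ok",             "OK" },
    { "welcomeMessage", "Hallo!" },
};

const char* lookup(const Entry* table, std::size_t count, const char* id)
{
    const Entry* end = table + count;
    const Entry* it  = std::lower_bound(table, end, id,
        [](const Entry& e, const char* key) { return std::strcmp(e.id, key) < 0; });
    return (it != end && std::strcmp(it->id, id) == 0) ? it->text : id;
}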
Quote:
that's what dotnet does...
There's normally no point in reinventing the wheel, but .net is a particularly unattractive wheel. It's bloated and requires users to (download and) install an enormous framework to run even simple apps, which I personally hate (as a user). If all you want is a mechanism for storing different translations of a set of strings, .net is way overkill. Besides, it's only relevant on certain platforms.
Visual Studio and the Windows API do provide a bunch of resources for localisation, though; you probably should use them in a Windows-only app.
Quote:
And while you're at it, don't forget that some languages have special punctuation marks or rules (french has a space before colon,
Not entirely correct. In French you would have a space before a question or exclamation mark, but you would not have a space before a colon, a semicolon, or a final dot.
That WWF site is fucked up, BTW, even the non-Welsh version.
Quote:
In French you would have a space before a question or exclamation mark, but you would not have a space before a colon, a semicolon, or a final dot.
You do have a space before a colon or semicolon.
Quote:
the hyphenation rules in Japanese are pretty strict
Uh... Do you know something I don't know? Although I've only recently progressed to reading Junior High School level novels, I've yet to see a single instance of hyphenation. Japanese frequently breaks up words willy-nilly -- no need for hyphenation at all.
Quote:
Rouquemoute- You do have a space before a colon or semicolon.
http://grammaire.reverso.net/5_1_10_les_espaces_et_la_ponctuation.shtml
Apparently, there is a space before the dual symbols (;:!?) but no space before the single ones (.,).
In the thick client AJAX app I work on, all HTML markup is generated on the client side with JS+DOM.
Localization data is stored in JSON tables.
Text.SomeContext.en={
"id1":"English Text 1",
"id2":"English Text 2"
};
Code looks like this: Html.P(GetText("id1")+" "+GetText("id2"));
There's obviously a choice between JSON objects (a bit like what Doom says) and JSON arrays (a bit like what Ryg says). The latter is harder to maintain, but has some size/speed advantages. We use JSON objects now.
Gasman's point is true: this takes some time to implement (e.g. language-specific date formats, plurals, definite articles etc.). But it's not certain that you can avoid all that by using a third-party string localization library if you also need to localize web controls.
You could also read from localization tables on the server side (that would be the typical choice, btw). That puts some CPU load on the server (aspx/php execution). I wouldn't store localization data in the db: it's pretty static, but it could generate an enormous db load. Or: store it in the db, but generate a json/xml/php cache file from that (either regularly or on request), and use/include the cache file for the page markup.
Oh, one more tip (if this wasn't trivial): store EVERYTHING in UTF-8. Otherwise, you'll lose your hair soon.
Ger, hmm, doing the translation clientside is a pretty cool idea. it does mean that you'd have to completely redo the thing server side if you want to support browsers with javascript turned off, no? was that an issue for you?
Skrebbel: Yes, this site requires js turned on, but that was not a problem here.
You could also run the same code on the server (this site does not do that, but it's possible). This is a bit off topic, but if you have IIS, you can run .js code (linked into an .aspx or compiled into a .dll) on the server. Maybe you can do the same with apache mod_js, i just have no experience with that. But it's tested to work on IIS/ASP.NET. I wrote lots of js code that runs both on the client and the server side. I know that this is not a typical setup but it works well ;)