I have worked on the internationalization and localization of a few Java systems. I never had to deal with languages very different from Western ones, such as those written from right to left, but I will describe the points that are likely to matter throughout the localization process.
Support for locales
Support for different locales depends on the operating system when the language in question relies on native system APIs, which is probably the case for Python and other languages used primarily in Linux/Unix environments.
On the Java platform, on the other hand, the virtual machine provides a certain independence from this. The Java 5 locale documentation states that a Java Runtime Environment (JRE) may contain only a few locales, depending on the installed version, whereas the Java Development Kit (JDK) comes with all international versions installed.
Java supports the ISO 639 standard, which identifies languages with two-letter codes (three-letter codes are also accepted), and the ISO 3166 standard, which identifies countries with two-letter codes. Larger regions can be given as three-digit UN M.49 codes.
Defining the locale in Java
There are two ways to define the locale in Java:
- Globally, for the whole JVM, with Locale.setDefault.
- Locally, by passing the locale as a parameter to the classes that handle dates, numbers, translation files, and any other method that depends on the locale.
Except for desktop applications, the second approach is always the recommended one.
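A minimal sketch of the second approach, assuming a hypothetical helper named formatAmount: the locale is passed explicitly on each call, so nothing depends on global JVM state.

```java
import java.util.Locale;

final class ExplicitLocaleSketch {
    // Hypothetical helper: the locale is a parameter, not global state.
    static String formatAmount(double amount, Locale locale) {
        // "%,.2f" asks for grouping separators and two decimal places.
        return String.format(locale, "%,.2f", amount);
    }

    public static void main(String[] args) {
        Locale brazil = Locale.forLanguageTag("pt-BR");
        System.out.println(formatAmount(1234.5, brazil));    // 1.234,50
        System.out.println(formatAmount(1234.5, Locale.US)); // 1,234.50
        // The global alternative would be Locale.setDefault(brazil), which
        // changes the behavior of every default-locale call in the JVM.
    }
}
```

In a server that handles users from several countries at once, this keeps one request from affecting the formatting of another.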
In web applications, frameworks that support localization and internationalization store the locale in the session scope or the request scope. The locale can be selected by the user through a setting, or by the developer according to some specific criterion, for example by analyzing HTTP headers.
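As a rough illustration of the header-based criterion, the sketch below matches an Accept-Language value against a list of supported locales using the standard Locale.LanguageRange and Locale.lookup APIs (Java 8+). The class name, the SUPPORTED list, and the resolve method are assumptions made for the example. A real framework would normally do this for you.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class AcceptLanguageSketch {
    // Hypothetical list of locales the application has translations for.
    static final List<Locale> SUPPORTED =
            Arrays.asList(Locale.ENGLISH, Locale.forLanguageTag("pt"));

    // Returns the best supported locale for the header, or the fallback.
    static Locale resolve(String acceptLanguageHeader, Locale fallback) {
        if (acceptLanguageHeader == null || acceptLanguageHeader.trim().isEmpty()) {
            return fallback;
        }
        try {
            List<Locale.LanguageRange> ranges =
                    Locale.LanguageRange.parse(acceptLanguageHeader);
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException malformedHeader) {
            // Headers come from the client, so never trust their syntax.
            return fallback;
        }
    }

    public static void main(String[] args) {
        // "pt-BR" is not supported as such, so the lookup truncates it to "pt".
        System.out.println(resolve("pt-BR,pt;q=0.9,en;q=0.8", Locale.ENGLISH));
    }
}
```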
Internationalization support (I18N) in Java applications
Localization in Java is commonly done with .properties files, because the platform already has APIs that handle this format well, through implementations such as ResourceBundle, and virtually all frameworks support this pattern.
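It is advisable to adopt the UTF-8 encoding in all files, source code, web pages, and text pages. Note that before Java 9, ResourceBundle read .properties files as ISO-8859-1, so non-ASCII characters in them had to be written as Unicode escapes.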
There is a naming convention for .properties files that allows new languages to be added without changing the code.
Example:
resources.properties
resources_pt.properties
resources_es_ES.properties
In the example above, given a locale such as new Locale("pt", "BR"), Java will look for the file whose name has the suffix _pt_BR. Not finding it, it will load the _pt file.
Given a locale such as new Locale("en", "US"), Java will look for the file whose name has the suffix _en_US or _en. Not finding them, it tries the JVM's default locale in the same way and, if that does not match either, loads resources.properties, which is the default.
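To make the lookup order concrete, here is a small sketch that asks ResourceBundle.Control for the candidate names of a requested locale. The class and method names are made up for the example. It only lists the names derived from the requested locale and loads nothing, so the extra step through the JVM's default locale does not appear in the output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

final class BundleLookupSketch {
    // Lists the .properties file names tried for one requested locale,
    // from most specific to least specific.
    static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> names = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            names.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return names;
    }

    public static void main(String[] args) {
        // [resources_pt_BR.properties, resources_pt.properties, resources.properties]
        System.out.println(candidateFiles("resources", Locale.forLanguageTag("pt-BR")));
    }
}
```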
It is also possible to use parameter interpolation with the MessageFormat class to make messages more flexible. For example, if you have a pt_BR file with the message:
mensagem=Página {0} de {1}
And an en_US file with:
mensagem=Page {0} of {1}
Then just load the file according to the user's locale and process it, for example:
// Load the bundle that best matches the user's locale
ResourceBundle resources = ResourceBundle.getBundle("resources", localeUsuario);
String mensagem = resources.getString("mensagem");
// Fill in the {0} and {1} placeholders using the same locale
mensagem = new MessageFormat(mensagem, localeUsuario).format(
        new Object[] { paginaInicial, paginaFinal });
The MessageFormat class supports several formatting types, for example:
mensagem=At {1,time} on {1,date}, there was {2} on planet {0,number,integer}.
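A self-contained sketch of a typed placeholder, assuming a hypothetical render helper and an inline pattern instead of a .properties file. The same pattern produces different number formatting depending on the locale given to MessageFormat.

```java
import java.text.MessageFormat;
import java.util.Locale;

final class MessageFormatSketch {
    // Hypothetical helper: formats a pattern with the rules of one locale.
    static String render(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        String pattern = "Page {0,number,integer} of {1,number,integer}";
        // The grouping separator follows the locale, not the pattern.
        System.out.println(render(pattern, Locale.US, 1, 1234));                        // Page 1 of 1,234
        System.out.println(render(pattern, Locale.forLanguageTag("pt-BR"), 1, 1234)); // Page 1 of 1.234
    }
}
```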
Localization support (L10N) in Java applications
The locale should then be used in every snippet that processes localized information, such as formatting and parsing dates and numbers.
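A minimal sketch for numbers, using NumberFormat with an explicit locale in both directions (display and input). The class and method names are assumptions for the example.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class LocalizedNumbersSketch {
    // Back end to front end: render a number with the user's conventions.
    static String format(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Front end to back end: read user input with the same conventions.
    static double parse(String text, Locale locale) {
        try {
            return NumberFormat.getNumberInstance(locale).parse(text).doubleValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number for " + locale + ": " + text, e);
        }
    }

    public static void main(String[] args) {
        Locale brazil = Locale.forLanguageTag("pt-BR");
        System.out.println(format(1234.5, Locale.US)); // 1,234.5
        System.out.println(format(1234.5, brazil));    // 1.234,5
        System.out.println(parse("1.234,5", brazil));  // 1234.5
    }
}
```

Parsing with a different locale than the one used for display is a common source of silently wrong values, which is why both directions take the same locale parameter here.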
It may seem simple, but it is very easy to let details slip as information travels back and forth between the front end and the back end.
A very important detail is that the front end must always be in sync with the back end, which means that visual components such as date pickers or masked number inputs must receive the matching format.
At that point, component-based frameworks such as JSF or Struts help with components that support localization automatically. On the other hand, action-based frameworks that are "decoupled" from the front end can benefit from JavaScript components such as jQuery UI that support masks.
Localized text processing
The Java platform provides specific APIs for different tasks.
To compare text taking the locale into account, you can use the Collator class. It has some very specific settings that are worth reading about in the documentation.
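A short sketch of locale-aware sorting with Collator, compared with the plain String ordering. The class name, method name, and word list are made up for the example, and the accented letter is written as a Unicode escape so the source does not depend on the file encoding.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class CollatorSketch {
    // Returns a copy of the words sorted by the rules of the given locale.
    static List<String> sortFor(Locale locale, List<String> words) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator is a Comparator
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00e1baco" is "abaco" with an acute accent on the first letter.
        List<String> words = Arrays.asList("zebra", "\u00e1baco", "abelha");

        List<String> byCodePoint = new ArrayList<>(words);
        Collections.sort(byCodePoint); // plain String order puts the accented word last
        System.out.println(byCodePoint);

        // The collator treats the accent as a minor difference, so the
        // accented word sorts next to the other words starting with "a".
        System.out.println(sortFor(Locale.forLanguageTag("pt-BR"), words));
    }
}
```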
The Character class provides an API for handling Unicode characters generically.
To detect the boundaries of characters and words in different languages, you can use the BreakIterator API.
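The sketch below combines the two: BreakIterator finds word boundaries for a locale, and Character decides which segments count as words rather than punctuation or spaces. The class and method names are assumptions for the example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WordBoundarySketch {
    // Splits text into words using the boundary rules of the given locale.
    static List<String> words(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            // Keep only segments that begin with a letter or digit,
            // which drops punctuation and whitespace segments.
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "Ol\u00e1" is the Portuguese greeting, with an accented final letter.
        System.out.println(words("Ol\u00e1, mundo!", Locale.forLanguageTag("pt-BR")));
    }
}
```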
Additional points
In addition to the programming language, the database must be prepared to store and retrieve data in the supported languages. This means that any routines (procedures and functions) must also take localization into account. In such cases, it is easier to leave text processing to the application so that the concern is centralized in one place.
The display of data from different locales should also be considered, for example when a user in Brazil views a page with information generated by a user in China. Unfortunately, there is not much to do when the operating system or browser does not support the characters of a language. However, the website should adopt a font that covers the whole Unicode character range supported by the system.
Another point is search and indexing. I don’t work with search systems, but given the particularities of different languages, it is clear that implementing a generic search that works in any locale is very complicated. It is not impossible, but the system will probably need specific implementations, for example to handle common terms that are ignored, word construction and composition, synonyms, etc.
Final considerations
I18N and L10N are complex topics and, unfortunately, it is very difficult to establish general rules that hold for any locale. And that is just within the scope of the Java platform.
There is a section of the Java SE Tutorial dedicated to the subject, which covers in more depth the topics summarised in this answer.
Overall, by processing data properly with a locale-aware API, it is possible to receive and display information to and from any locale, according to the rules of each.
A practical article (in English) that talks about this is: Java Internationalization and Localization.
From my experience with Ubuntu, the locale to be used needs to be installed. With the command locale -a, the locales already installed and available on my system are listed.
– J. Bruni
@J.Bruni Thanks! I installed the Turkish locale following these instructions, now the result on Ubuntu is consistent with the others.
– mgibsonbr
Without a programming language and/or OS specified, isn’t it a little too broad? I think it depends a lot on the main framework and/or library in each case. Qt has QLocale, the Windows API has its own methods, Harbour has internal support for some locales, etc. (both to implement the locale in the language and to detect the current OS user locale, and there may be more than one installed). Or do you want your system to install the platform’s native locale on the client’s OS, for example?
– Bacco
@Bacco I repeat: I know nothing about locales. What you’re saying (i.e. that it is language-dependent) is new to me...
– mgibsonbr
@mgibsonbr several possibilities came to mind when I read your question. One is managing your application’s strings; another is the native messages of the framework or API (OK/Cancel/Retry); another is date formats, currency formats, etc. And separate from these, there is the matter of your app detecting the user’s locale for each of these things, as configured in the OS. (In other words, locales are a pain to implement, and it is a widespread problem. There are a lot more people lost in this than it looks, because the subject is full of details.)
– Bacco
@Bacco All right, I’ll try to "narrow" the question a little: I’m mainly interested in collation rules. As exemplified in the "Additional Details", if I do an uppercase operation in the pt_BR locale (mail -> MAIL) and send everything, same program and same input, to someone using a Turkish locale, he will get a different result (MAİL) when he tries to reproduce my processing. I would like him, as far as possible, to be able to run my code in the pt_BR locale, preferably without having to install anything in the OS.
– mgibsonbr
The platform doesn’t matter much: the ideal option, JS in the browser, seems to be unfeasible, so some local installation will be required. As this is practically the only requirement for local installation (everything else is feasible without it), I’m open to virtually any possibility. Again, I prefer solutions that do not require tampering with the OS.
– mgibsonbr
Doing it without touching the OS seems complicated, unless the OS already comes with the locales by default. I have also struggled with this a lot and it is a headache. Maybe it would be worth asking the question on SOEN and then bringing the answer here, that is, if someone gives you a proper answer ;) but in English the coverage is certainly broader. Sorry for the long comment to say practically nothing lol.
– Jorge B.
Have you tried this? $ dpkg-reconfigure locales
– rafaels88
@rafaels88 Thank you for the suggestion, but in my case using locale-gen was enough. Anyway, the question is more general and not specific to Ubuntu (maybe that’s why it is too broad... I will try to edit it a little later to make it more "answerable").
– mgibsonbr