What does it take for a system to support a locale?

Asked 11 years, 4 months ago

Viewed 1,272 times

12

I’m developing a system that requires a specific locale to work properly. This is not just a matter of user interface: the system as a whole can become unusable if the correct locale is not supported (it relies on text processing functions, which require, for example, that uppercase/lowercase rules and collation follow a fixed standard). I would like to know what is needed to ensure that the environment where this system will run supports this locale.

I’m sorry if it’s a basic question, but I don’t know whether all [modern] systems come with support for all locales or whether this is something that needs to be configured at the operating system or application level. For example, if on my PC (Windows XP, locale/encoding Portuguese_Brazil.1252) I try to set the locale to "Turkish", it is accepted normally (Turkish_Turkey.1254), although I have never explicitly installed support for the Turkish language... However, when I tried to reproduce an example that uses that language, it did not work:

>>> locale.setlocale(locale.LC_ALL, 'Turkish')
'Turkish_Turkey.1254'
>>> u'mail'.upper() == u'MAIL'
True

(according to the post, it should give False; is that correct?)

Edit: I did the same test with Java, and the result was as expected; I think my Python code is incorrect...

System.out.println("mail".toUpperCase().equals("MAIL")); // true
Locale.setDefault(new Locale("tr", "TR"));
System.out.println("mail".toUpperCase().equals("MAIL")); // false

I ran the same test on a Windows 7 computer and on the server where I host my site (FreeBSD, using tr_TR.UTF-8), and I got the same result. Strangely, on a computer with Ubuntu 12.04 I could not set this locale... which reinforces the hypothesis that additional configuration may be required. (Edit: yes, as per @J. Bruni’s comment, you need to manually install the locale on Ubuntu)

Additional details:

Ideally, I would like users in different locales to be able to work with each other’s data, i.e. if I process a text in Portuguese and send it to a Turkish user, they can reproduce my result, and vice versa. The important thing, then, is not that one fixed locale be used everywhere, but rather that the same locale used in a given processing run can be used by anyone who needs to reproduce it (i.e. same input, same output).

Anyway, the question is the same: what do I, as a developer, need to do [during deployment] to ensure that the environment where the system will be installed provides appropriate support for the required locale(s)?
At first, I wanted to do this in the browser, via JavaScript. The new Intl.Collator class seems promising, provided, of course, that the desired locales are supported. Failing that, I am happy with solutions involving plugins and/or local installation (platform independent).

2

From my experience with Ubuntu, the locale to be used needs to be installed. In fact, the command locale -a lists the locales already installed and available on the system.

– J. Bruni

2014/02/12 at 01:14
1

@J.Bruni Thanks! I installed the Turkish locale following these instructions, now the result on Ubuntu is consistent with the others.

– mgibsonbr

2014/02/12 at 01:22
Without a programming language and/or OS(s), isn’t it a little too broad? I think it depends a lot on the main framework and/or library in each case. Qt has QLocale, the Windows API has its own methods, Harbour has internal support for some locales, etc. (both to implement the locale in the language and to detect the current OS user locale, and the OS may have more than one installed). Or do you want your system to install the platform’s native locale on the client’s OS, for example?

– Bacco

2014/07/02 at 00:01
@Bacco I repeat: I know nothing about locales. What you’re saying (i.e. that it is language-dependent) is new to me...

– mgibsonbr

2014/07/02 at 00:04
1

@mgibsonbr Several possibilities came to mind when I read your question. One is how you manage your application’s strings; another is the native messages of the framework or API (OK/Cancel/Retry); another is date formats, currency, etc. And, separate from these, there is the issue of your app detecting the user’s locale for each of these things, as configured in the OS. (In other words, locales are a real pain to implement, and it is a widespread problem. Many more people are lost on this than it seems, because the subject is full of details.)

– Bacco

2014/07/02 at 00:07
2

@Bacco All right, I’ll see if I can "narrow" the question a little: I’m mainly interested in collation rules. As exemplified in the "Additional details", if I do an uppercase operation in the pt_BR locale (mail -> MAIL) and send everything (same program, same input) to a Turkish user, when he tries to reproduce my processing he will get a different result (MAİL). I would like, as far as possible, for him to be able to run my code in the pt_BR locale, preferably without having to install anything in the OS.

– mgibsonbr

2014/07/02 at 00:24
The platform doesn’t matter much: the ideal (JS in the browser) seems to be unfeasible, so some local installation will be required. Since this is practically the only requirement for local installation (everything else is feasible without it), I’m open to virtually any possibility. Again, I prefer solutions that do not require tampering with the OS.

– mgibsonbr

2014/07/02 at 00:27
Doing it without touching the OS seems complicated, unless the OS already comes with the locales by default. I have also struggled with this a lot and it is a headache. Maybe it would be worth asking the question on the English Stack Overflow (SOEN) and then posting an answer here, if someone gives you a proper answer ;) In English the coverage is certainly broader. Sorry for the long comment that says practically nothing lol.

– Jorge B.

2014/07/04 at 11:30
1

Have you tried this? $ dpkg-reconfigure locales

– rafaels88

2014/07/04 at 14:38
1

@rafaels88 Thank you for the suggestion, but in my case locale-gen was enough. Anyway, the question is more general in nature; it is not specific to Ubuntu (maybe that’s why it is too broad... I will try to edit it a little later to make it more "answerable").

– mgibsonbr

2014/07/04 at 17:24

2 answers

7

I have worked on the internationalization and localization of some systems in Java. I never got to use languages very different from Western ones, such as those written from right to left, but I will report the points of attention that are probably important throughout the localization process.

Support for locales

Supporting different locales will depend on operating system support if the language in question makes use of the system’s native APIs, which is probably the case with Python and other languages used primarily in the Linux/Unix environment.

On the Java platform, on the other hand, the virtual machine brings a certain independence from this. The Java 5 locale documentation states that a Java Runtime Environment (JRE) may contain only a few locales, depending on the installed version. However, the Java Development Kit (JDK) comes with all international versions installed.

Java supports the ISO 639 standard, which standardizes languages with two-letter abbreviations, and the ISO 3166 standard, which standardizes countries and large regions with two or three letters.

Defining the locale in Java

There are two ways to define the locale in Java:

Globally, for the whole JVM, with Locale.setDefault.
Locally, by passing the locale as a parameter to the classes that handle dates, numbers, translation files and any method that depends on the locale.

Except for desktop applications, the recommended method is always the second.
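
As a minimal sketch of the second approach applied to the case from the question, the locale can be passed explicitly to String.toUpperCase instead of relying on the JVM default. The class and helper names below are hypothetical, chosen only for illustration:

```java
import java.util.Locale;

public class ExplicitLocaleCase {
    // Hypothetical helper: upper-cases text under an explicitly chosen locale,
    // so the result does not depend on Locale.getDefault().
    static String upper(String text, Locale locale) {
        return text.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // forLanguageTag("tr-TR") yields the same locale as new Locale("tr", "TR").
        Locale turkish = Locale.forLanguageTag("tr-TR");
        Locale brazilian = Locale.forLanguageTag("pt-BR");

        // Under Turkish rules 'i' maps to the dotted capital I (U+0130).
        System.out.println(upper("mail", turkish).equals("MA\u0130L")); // true
        // Under pt-BR rules the usual ASCII mapping applies.
        System.out.println(upper("mail", brazilian).equals("MAIL"));    // true
    }
}
```

Because the locale travels with the call, a user whose machine defaults to Turkish can reproduce a pt_BR result simply by passing the same locale.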

In web applications, frameworks that support localization and internationalization store the locale in the session scope or in the request scope. The locale can be selected by the user through configuration or by the developer according to some specific criterion, for example by analyzing HTTP headers.
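
One possible way to analyze the Accept-Language header without any framework is the language-range API available since Java 8 (Locale.LanguageRange and Locale.lookup). This is only a sketch: the class name, the SUPPORTED list and the fallback parameter are assumptions made for the example, not part of any framework.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class RequestLocaleResolver {
    // Assumption: the locales this hypothetical application has translations for.
    static final List<Locale> SUPPORTED = Arrays.asList(
            Locale.forLanguageTag("pt"),
            Locale.forLanguageTag("tr"),
            Locale.forLanguageTag("en"));

    // Returns the best supported locale for an Accept-Language header value,
    // or the fallback when the header is missing, malformed or unmatched.
    static Locale resolve(String acceptLanguage, Locale fallback) {
        if (acceptLanguage == null || acceptLanguage.trim().isEmpty()) {
            return fallback;
        }
        try {
            List<Locale.LanguageRange> ranges = Locale.LanguageRange.parse(acceptLanguage);
            Locale match = Locale.lookup(ranges, SUPPORTED);
            return match != null ? match : fallback;
        } catch (IllegalArgumentException malformedHeader) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("tr-TR,tr;q=0.9,en;q=0.5", Locale.ENGLISH).toLanguageTag()); // tr
        System.out.println(resolve("fr-CA", Locale.ENGLISH).toLanguageTag());                   // en
    }
}
```

The resolved locale would then typically be stored in the session or request scope and passed to the formatting classes, rather than set as the JVM default.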

Internationalization support (I18N) in Java applications

Java localization is commonly done through .properties files, because the platform already has APIs that handle this format well, such as ResourceBundle, and virtually all frameworks support this pattern.

It is advisable to adopt UTF-8 encoding in all files, source code, web pages and text pages.

There is a naming standard for .properties files that allows new languages to be added without changing the code.

Example:

resources.properties
resources_pt.properties
resources_es_ES.properties

In the example above, if the locale is new Locale("pt", "BR"), Java will try to find the file whose name suffix is _pt_BR. Not finding it, it will load the file with suffix _pt.

If the locale is new Locale("en", "US"), Java will try to find the file whose name suffix is _en_US or _en. Not finding them (and after also trying the files for the JVM’s default locale), it will load resources.properties, which is the default.
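
The lookup order described above can be inspected with ResourceBundle.Control, without any files on disk. The sketch below only lists the candidate file names for the requested locale; it does not show the extra step where the JVM’s default locale is tried. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.ResourceBundle;

public class BundleLookupOrder {
    // Lists, in search order, the .properties file names that the default
    // lookup strategy considers for the given base name and locale.
    static List<String> candidateFiles(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        List<String> files = new ArrayList<>();
        for (Locale candidate : control.getCandidateLocales(baseName, locale)) {
            files.add(control.toBundleName(baseName, candidate) + ".properties");
        }
        return files;
    }

    public static void main(String[] args) {
        // [resources_pt_BR.properties, resources_pt.properties, resources.properties]
        System.out.println(candidateFiles("resources", Locale.forLanguageTag("pt-BR")));
    }
}
```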

It is also possible to use parameter interpolation with the MessageFormat class to make messages more flexible. For example, if you have a pt_BR file with the message:

mensagem=Página {0} de {1}

And another en_US:

mensagem=Page {0} of {1}

Then just load the file according to the user’s locale and process it, for example:

ResourceBundle resources = ResourceBundle.getBundle("resources", localeUsuario);
String mensagem = resources.getString("mensagem");
mensagem = new MessageFormat(mensagem, localeUsuario).format(
    new Object[] { paginaInicial, paginaFinal });

The class MessageFormat supports some types of formatting, for example:

mensagem=At {1,time} on {1,date}, there was {2} on planet {0,number,integer}.
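
To try MessageFormat without creating .properties files, the patterns can be passed in directly. This is a self-contained sketch; the class and helper names are hypothetical, and the patterns are inlined only for the sake of the example (in a real application they would come from the ResourceBundle).

```java
import java.text.MessageFormat;
import java.util.Locale;

public class PageMessage {
    // Formats a pattern with the given arguments under an explicit locale.
    static String format(String pattern, Locale locale, Object... args) {
        return new MessageFormat(pattern, locale).format(args);
    }

    public static void main(String[] args) {
        Locale ptBR = Locale.forLanguageTag("pt-BR");
        // \u00e1 is the letter a with acute accent, escaped to keep the source ASCII-only.
        System.out.println(format("P\u00e1gina {0} de {1}", ptBR, 3, 10)); // Página 3 de 10
        System.out.println(format("Page {0} of {1}", Locale.US, 3, 10));   // Page 3 of 10
        // Typed placeholders follow the locale's conventions, e.g. digit grouping.
        System.out.println(format("{0,number,integer} results", Locale.US, 1234567)); // 1,234,567 results
    }
}
```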

Localization support (L10N) in Java applications

The locale should then be used in all snippets that process localized information. For example:

Numbers: NumberFormat.getNumberInstance(locale)
Dates: new SimpleDateFormat(pattern, locale)
Strings: Collator.getInstance(locale), new MessageFormat(pattern, locale)
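
A short sketch of two of the calls listed above, with the locale always passed explicitly. The class and method names are hypothetical; the sample words are illustrative only.

```java
import java.text.Collator;
import java.text.NumberFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class LocaleAwareFormatting {
    // Formats a number using the separators of the given locale.
    static String formatNumber(double value, Locale locale) {
        return NumberFormat.getNumberInstance(locale).format(value);
    }

    // Returns a copy of the list sorted by the collation rules of the locale.
    static List<String> sort(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        Locale ptBR = Locale.forLanguageTag("pt-BR");
        System.out.println(formatNumber(1234.5, Locale.US)); // 1,234.5
        System.out.println(formatNumber(1234.5, ptBR));      // 1.234,5

        // \u00e1rvore is "arvore" with an acute accent on the first letter.
        List<String> words = Arrays.asList("zebra", "\u00e1rvore", "abacaxi");
        // Collation places the accented word between "abacaxi" and "zebra",
        // whereas plain String.compareTo would place it after "zebra".
        System.out.println(sort(words, ptBR));
    }
}
```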

It may seem simple, but it is easy, very easy, to let details slip through as information travels back and forth between the front end and the back end.

A very important detail is that the front end should always be in sync with the back end, which implies that visual components such as date pickers or masked number inputs should receive the corresponding formatting.

Here, component-based frameworks such as JSF or Struts help with components that support localization automatically. On the other hand, action-based frameworks that are "decoupled" from the front end can benefit from JavaScript components such as jQuery UI that support masks.

Localized text processing

The Java platform provides specific APIs for different tasks.

To compare text taking the locale into account, you can use the Collator class. The class has some very specific settings, which are worth reading about in the documentation.

The Character class provides an API to handle Unicode characters in a generic way.

To detect letters and words in different languages you can use the BreakIterator API.
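
As an illustration of BreakIterator, the sketch below extracts the words of a sentence, skipping the punctuation and whitespace segments that the iterator also reports. The class and method names are hypothetical, and the letter-or-digit filter is a simplification chosen for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class WordSplitter {
    // Splits text at the word boundaries defined for the given locale and
    // keeps only the segments that start with a letter or a digit.
    static List<String> words(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        List<String> words = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                words.add(segment);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // "Ol\u00e1" is "Ola" with an acute accent on the last letter.
        System.out.println(words("Ol\u00e1, mundo!", Locale.forLanguageTag("pt-BR")));
    }
}
```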

Additional points

In addition to the language, the database must be prepared to store and retrieve data in the supported languages. This means that if there are routines (procedures and functions), they should also take localization into account. In such cases, it is easier to leave text processing to the application, so that the concern is centralized in one place.

The display of data from different locales should also be considered, for example when a user in Brazil views a page with information generated by a user from China. Unfortunately, there is not much to do when the operating system or browser does not support the characters of a language. However, the website should adopt a font that covers the entire Unicode character range supported by the system.

Another point is search and indexing. I don’t work with search systems, but given the particularities of different languages, it is clear that implementing a generic search that works in any locale is very complicated. It is not impossible, but the system will probably need language-specific implementations, for example to handle common terms that are ignored (stop words), word constructions and compounds, synonyms, etc.

Final considerations

I18N and L10N are complex topics and, unfortunately, it is very difficult to establish general rules for every locale. And that is only within the scope of the Java platform.

There is a trail in the Java SE Tutorial dedicated to the subject, which covers in more depth the topics summarized in this answer.

Overall, when the data is processed properly with a locale-aware API, it is possible to receive and display information to and from any locale, according to the rules of each.

A practical article (in English) that talks about this is: Java Internationalization and Localization.

1

"Supporting different locales will depend on Operating System support if the language in question makes use of the system’s native APIs..." "On the Java platform, on the other hand, the virtual machine brings a certain independence from this." That is just what I needed to know, thank you! Despite all the context I gave, and despite repeatedly stressing that I am 100% n00b on the subject, people seem to be reading too much between the lines of the question... This information, in particular the references presented, will be very useful to help me define my strategy.

– mgibsonbr

2014/07/06 at 21:17
2

@mgibsonbr I confess that I also read a lot between the lines. The problem is that we know your reputation. When you say that you "don’t know much" about something, we imagine that instead of being in the 99th percentile of knowledge about it, you are only in the 98th. :D

– utluiz

2014/07/06 at 22:12

by Peter Krauss • **1,830** points · Answer 1 · 2014-07-06T19:15:59+00:00

The question is vague, in the sense that it ranges from installation problems on Ubuntu to the question of whether or not to deploy a system on the Web... and it was tagged only with localization and cross-platform. The answer doesn’t have to be vague, as @utluiz has already shown, but I take the liberty of focusing on what I think are the "context priorities". Others can edit; it is in wiki mode.

.. Reminders ...

UTF-8

It is a standard both "de facto" and "de jure", as already set out in this answer. All languages (including JavaScript) and all decent data exchange formats (e.g. JSON and XML) accept UTF-8, and can therefore handle multiple locales.

Everything in the system needs to run in UTF-8: by making sure of this, you will have taken the first and most important step.

If a database or anything involving text sorting functions is involved, then in general, when you configure the charset (UTF-8), you also configure the collation, which defines the ordering of the charset. This is a system, framework or library configuration, and you need to find out whether it can be configured dynamically (some databases do not allow it to be changed, which complicates things when mixing languages in the same database). Picking up the server’s (or even the client’s) configuration is not always the best solution.

HTML

It is the lingua franca not only of Internet content, but also of the exchange of content or content fragments. JavaScript tools such as CKEditor can be fully localized, including for Turkish, allowing full editing (creation or modification of HTML fragments) in the desired language.

JavaScript

Despite all the traumas we have with cross-browser issues and even with locales, JavaScript is still the language with the greatest potential for not getting confused by languages (on the contrary, we even have language detectors like this one)... Many people bet on JavaScript precisely because they can run the same code (e.g. a CPF validator or a language detector) on the client (browser), on the server (e.g. Node.js) and in the database (e.g. PostgreSQL can run procedural code in JavaScript).

You may have problems with the native date-handling functions, but there is a variety of JavaScript libraries, and everything else is multilingual. In addition, jQuery is one of the most commonly used tools for reformatting a localized dynamic interface. If the goal is to interpret text in different languages, that is one more reason: easy access to the DOM is fundamental.

Multilingual template

There are very simple and general principles for organizing the languages of the interface and even of the functionality (e.g. language refinements and filters applied to the edited HTML)... In this tutorial we have just one example of how the simplest multilingual templates can be implemented.

In this other link we have a classic example of multilingual template, a kind of Vatican "locale", used since 1600.

I18N and L10N

This is the "old guard" jargon for the topics discussed in the question, "multilingual and internationalization services": robust systems (e.g. Java) and older systems (e.g. COBOL, C++, etc.) use this jargon. Newer systems may use the more general notion of "locale"... In the end it is just a matter of finding your way around the manuals.

In the UNIX/Linux universe, it was the POSIX standard that established the conventions we still use today.

L10N: dates, currencies, etc. For jQuery there are initiatives such as jquery-localize and globalize... Each language and each framework will have its own.

Detection

Deducing a user’s "locale" can be an art... Knowing it exactly is actually simple, but it requires authentication schemes (e.g. login and then session) and reliable registration data. The most common deductions are based on the user hierarchy and the geographic position of the client:

hierarchy: registered as a "member of the department", for example, the user inherits the locale attributes of the department.
user agent geolocation: mobile phones and servers (e.g. the LAN where the client is located) nowadays follow geolocation standards, and for locale purposes the data does not need to be accurate (in general the country code is sufficient).

In a form, the most important field for establishing the locale is the country, but in countries like Canada, the Netherlands, India, etc. users may have their own preferences, so the language field ends up being equally important.