Localisation in Java
We recently needed to produce a website in multiple languages, including Russian, Czech, Romanian, and other eastern European languages. No problem, we thought: we can just use Java properties files and the JSTL fmt:message tag.
Unfortunately, it’s not quite that simple. Java loads properties files as ISO-8859-1 rather than UTF-8, so getting the source languages into properties files proved to be nigh on impossible. There is a program, native2ascii, provided in the JDK, but by default it only copes with files in the current native encoding, not UTF-8.
So, in the end we wrote a little utility to do it for us. It’s actually very straightforward; a single function does the trick:
String encodedString(String line) {
    StringBuilder out = new StringBuilder();
    // toUTF8 is our own helper (not shown here)
    line = toUTF8(line);
    for (char aChar : line.toCharArray()) {
        if (aChar > 127) {
            // Non-ASCII: write a backslash, 'u' and four hex digits.
            // The backslash is doubled so that Java does not treat the
            // format string itself as a Unicode escape.
            out.append(String.format("\\u%04x", (int) aChar));
        } else {
            // Plain ASCII passes through unchanged
            out.append(aChar);
        }
    }
    return out.toString();
}
All we have to do is go through each character and, if its code is larger than 127, write it out in the \uxxxx format (a backslash, the letter u, and four hex digits).
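To check that the escaped text really comes back as the original, here is a minimal sketch that escapes a Russian word and loads it again through java.util.Properties. The class name EscapeRoundTrip and the method names are made up for illustration, and escape() is a standalone copy of the function above without the toUTF8 step.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical class name, used only for this illustration.
class EscapeRoundTrip {

    // Replace every character above 127 with a backslash-u escape
    // (four lowercase hex digits); ASCII passes through unchanged.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Build a one-line "key=value" properties document from the escaped
    // value and read it back the way a properties file would be loaded.
    static String loadValue(String key, String escapedValue) {
        byte[] bytes = (key + "=" + escapedValue).getBytes(StandardCharsets.ISO_8859_1);
        Properties props = new Properties();
        try {
            props.load(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // "Hello" in Russian, written with escapes so that this source
        // file compiles whatever encoding the compiler assumes.
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        String escaped = escape(russian);
        System.out.println(escaped);                                        // pure ASCII
        System.out.println(loadValue("greeting", escaped).equals(russian)); // true
    }
}
```

Because the escaped output is pure ASCII, it no longer matters which encoding the properties file is saved in.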
You can take this code and call it from a loop that reads in a text file and then outputs it again (though you might have to tell Java what the encoding of the file is).
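A loop along those lines might look like the sketch below. It assumes the input file is UTF-8 and says so explicitly when opening it. The class name PropertiesFileEscaper and its method names are invented for this example, and escape() is again a standalone copy of the function above without the toUTF8 step.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name, used only for this illustration.
class PropertiesFileEscaper {

    // Same idea as encodedString: escape everything above 127.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Copy the text line by line, escaping each line on the way through.
    static void convert(Reader in, Writer out) {
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                out.write(escape(line));
                out.write('\n');
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Read the source as UTF-8 (an assumption about the input file) and
    // write the result as ASCII, which is all the escaped text contains.
    static void convertFile(Path source, Path target) {
        try (Reader in = Files.newBufferedReader(source, StandardCharsets.UTF_8);
             Writer out = Files.newBufferedWriter(target, StandardCharsets.US_ASCII)) {
            convert(in, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: PropertiesFileEscaper <utf8-input> <properties-output>");
            return;
        }
        convertFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Keeping convert() on a plain Reader and Writer means the same loop works for files, strings, or a web request body.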
We decided to put it in a web page. Here we found another wrinkle: the string that you get when you call request.getParameter(...) has been decoded as ISO-8859-1, even though the browser sent UTF-8. So you have to take the request parameter, turn it back into bytes, and decode those as UTF-8, as follows:
String source = new String(request.getParameter("source").getBytes("ISO-8859-1"), "UTF-8");
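To see why that one-liner works, the sketch below reproduces the problem outside a servlet container: it takes UTF-8 bytes, decodes them wrongly as ISO-8859-1, and then repairs the result. The class name ParameterRedecode and both method names are made up for illustration, and simulateContainer() is only a stand-in for what we saw the container do.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
class ParameterRedecode {

    // Stand-in for the container: the browser sends UTF-8 bytes, but
    // they are decoded as ISO-8859-1, one character per byte.
    static String simulateContainer(String typedByUser) {
        byte[] sent = typedByUser.getBytes(StandardCharsets.UTF_8);
        return new String(sent, StandardCharsets.ISO_8859_1);
    }

    // The repair: ISO-8859-1 maps every byte to exactly one character,
    // so getBytes recovers the original bytes, which are then decoded
    // as the UTF-8 they always were.
    static String redecode(String misdecoded) {
        byte[] original = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String typed = "\u010cesky";               // Czech word, first letter is C with caron
        String garbled = simulateContainer(typed);
        System.out.println(garbled.length());      // 6: the two-byte letter became two characters
        System.out.println(redecode(garbled).equals(typed)); // true
    }
}
```

Using the StandardCharsets constants instead of the charset names as strings avoids the checked UnsupportedEncodingException. Depending on your container, calling request.setCharacterEncoding("UTF-8") before reading any parameters may make the repair unnecessary, but we have not covered that here.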
To save you the hassle, though, we’ve put our web-based converter up on our website as a free service. So if you need to convert some strings, go to our converter!