5 common internationalisation pitfalls (and how to avoid them) - Heart Internet Blog

Internationalisation, sometimes shortened to “i18n”, is an important but often overlooked step of developing an application. Internationalisation is the process of designing your application so that it isn’t hard-coded to one language, locale, or region. A poorly internationalised application makes for a confusing and frustrating experience for users who don’t speak English or aren’t from the same country as the application’s developers and designers.

Internationalisation is often confused with localisation, but they are different concepts. The process of localising includes translating and adding the appropriate resources to support another locale. Ideally, your application goes through the i18n process once, and then it’s ready to be localised as many times as you have locales you wish to support. Therefore, it’s entirely possible to have an internationalised application that is only in English. This would mean that an American and an Australian user would both be able to navigate the application without confusion even though they are in different parts of the world, and that if the application were to be localised in the future, the codebase would support this with minimal friction.

Internationalisation encompasses so much more than adding lang="en" to your HTML. Many of the issues that arise during this process are things that you might not know you had to look out for.

Let’s explore five common pitfalls when internationalising your app and how to solve them.

Problem 1: Translations

Using Google Translate

Let’s start with the most obvious aspect of internationalisation and localisation: translating text from one language to another. A machine won’t understand context and often provides confusing, incorrect, or inappropriate translations.

Hard-coding text

When we build to-do apps from scratch, we usually hard-code any text that goes in labels, paragraphs, buttons, etc. While this is fine for personal projects, this will cause a lot of trouble if we ever want to localise to another language.

Assuming English grammar rules apply to all languages

Even if you already have your text translated, pluralisation rules can be really tricky in other languages. In English, nouns have two forms: the singular, used for exactly one of something, and the plural, used for zero or for two or more, which usually adds an “s” at the end of the word. The rule is similar in other languages, such as Spanish, but it’s not universal. Many languages, such as Japanese and Korean, use the same word for zero, one, and two or more of an item. Some languages, such as Czech, have more than two forms of plural nouns.

So hard-coding something like this won’t cut it:

 

    // Naive: assumes every language has exactly two plural forms.
    function pluralForm(count) {
        if (count === 1) {
            return 'singular';
        } else {
            return 'plural';
        }
    }

 

Furthermore, several languages have gendered nouns. For example, in Spanish, a feminine noun usually ends with “-a”, whereas a masculine noun ends with “-o”.

Solution:

  • Don’t rely on Google Translate or other free translation services. While there are certain common words and phrases that can be easily found (look for the verified badge on Google Translate), anything business specific or longer than a few words should be properly translated by a professional.
  • Once you have your translations, use an i18n framework such as i18next to help you easily deal with interpolation and varying pluralisation rules.
  • Figure out a system of storing text strings. For small or English-only applications, this may be as simple as a JSON file bundled with the application. For more robust applications with multiple supported locales, you may want to store these translations in a database, retrieve them via an API, and cache them.
  • Don’t concatenate translation strings. Even if it works in English to concatenate strings together, grammatical structure varies greatly among languages.
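As a rough illustration of the pluralisation point, the native Intl.PluralRules API maps a count to a plural category for a given locale. The sketch below is only that: the `messages` catalogue and the `formatCount` helper are hypothetical names, the strings are placeholder translations rather than professionally reviewed copy, and a framework such as i18next would normally do this lookup for you.

```javascript
// Hypothetical message catalogue, keyed by locale and plural category.
// The strings are illustrative placeholders, not reviewed translations.
const messages = {
  en: { one: '{count} file', other: '{count} files' },
  cs: {
    one: '{count} soubor',
    few: '{count} soubory',
    many: '{count} souboru',
    other: '{count} souborů',
  },
  ja: { other: '{count} 個のファイル' },
};

// Pick the template for the count's plural category and fill in the number.
function formatCount(locale, count) {
  const category = new Intl.PluralRules(locale).select(count);
  const table = messages[locale];
  const template = table[category] ?? table.other;
  return template.replace('{count}', String(count));
}

console.log(formatCount('en', 1)); // "1 file"
console.log(formatCount('en', 0)); // "0 files"
console.log(formatCount('cs', 2)); // "2 soubory"
console.log(formatCount('cs', 5)); // "5 souborů"
console.log(formatCount('ja', 1)); // "1 個のファイル"
```

Note that each message is a whole template with a placeholder, not pieces glued together, which is what makes it possible for a translator to reorder the sentence.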

Problem 2: Dates, times, and numbers

Dates

Dates are a shining example of why you need to internationalise your application even if you never plan on adding support for languages other than English. Though several countries speak English as their official language, the way those countries format dates varies widely.

Take the following date, for example: 01/02/2020.

If you were to ask an American what that date was, they would most likely say it was January second. If you were to ask a British person the same question, they would most likely say it was the first of February. Date formatting varies greatly by country.
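To make the ambiguity concrete, here is a small sketch that formats the same date for a US and a British reader using the native Intl.DateTimeFormat API. The helper names `shortDate` and `longDate` are made up for this example, and the date is built from explicit parts so that its meaning is not in question.

```javascript
// 2 January 2020, built from explicit year/month/day parts (months are 0-based).
const date = new Date(Date.UTC(2020, 0, 2));

// timeZone: 'UTC' keeps the output independent of the machine's own time zone.
function shortDate(locale) {
  return new Intl.DateTimeFormat(locale, { timeZone: 'UTC' }).format(date);
}

// A spelled-out month removes the day/month ambiguity for human readers.
function longDate(locale) {
  return new Intl.DateTimeFormat(locale, { dateStyle: 'long', timeZone: 'UTC' }).format(date);
}

console.log(shortDate('en-US')); // "1/2/2020"
console.log(shortDate('en-GB')); // "02/01/2020"
console.log(longDate('en-US'));  // "January 2, 2020"
console.log(longDate('en-GB'));  // "2 January 2020"
```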

Times

Deciding how to display a time can also depend on the region. For example, in the US, a user most likely prefers a 12-hour clock (3pm), whereas someone in Germany most likely prefers a 24-hour clock (15:00).
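The same API can pick the clock style for you. In this sketch, `clockTime` and `uses12HourClock` are hypothetical helper names; the locale data behind them comes from the JavaScript runtime.

```javascript
// 15:00 UTC on 2 January 2020.
const threePm = new Date(Date.UTC(2020, 0, 2, 15, 0));

// Format the time the way the locale conventionally writes it.
function clockTime(locale) {
  return new Intl.DateTimeFormat(locale, {
    hour: 'numeric',
    minute: '2-digit',
    timeZone: 'UTC',
  }).format(threePm);
}

// Ask the runtime whether the locale defaults to a 12-hour clock.
function uses12HourClock(locale) {
  return new Intl.DateTimeFormat(locale, { hour: 'numeric' }).resolvedOptions().hour12;
}

console.log(clockTime('de-DE'));       // "15:00"
console.log(clockTime('en-US'));       // 3:00 PM (the space before "PM" may be a narrow no-break space)
console.log(uses12HourClock('en-US')); // true
console.log(uses12HourClock('de-DE')); // false
```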

Numbers

Number formatting is yet another consideration when internationalising. Let’s take the number 123456.78 as an example. In the US, you would format this number as 123,456.78; in India, 1,23,456.78; and finally in Spain, 123.456,78. And these are only three variations; here’s a list of ways to format numbers depending on the country.
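A quick sketch of those three formats using the native Intl.NumberFormat API. The `formatNumber` helper is a made-up name, and the locale tags en-US, en-IN, and es-ES are my choice to stand in for the US, India, and Spain.

```javascript
const value = 123456.78;

// Let the runtime's locale data choose the grouping and decimal separators.
function formatNumber(locale) {
  return new Intl.NumberFormat(locale).format(value);
}

console.log(formatNumber('en-US')); // "123,456.78"
console.log(formatNumber('en-IN')); // "1,23,456.78"
console.log(formatNumber('es-ES')); // "123.456,78"
```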

Time zones

Anyone who has ever worked with time zone conversion can attest to the difficulties that arise when converting one time to another using time zones. Not only can the math be tricky converting a timestamp from one time zone to another, but some countries observe daylight saving time while others do not. In the countries that do, not all parts of the country observe it (e.g. Arizona versus the rest of the US). The countries that do observe DST do not all change their clocks on the same day. And sometimes, a country will decide not to observe DST a week before it happens.
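The Arizona example can be sketched by formatting one UTC instant in two IANA time zones and letting the runtime's time zone data decide the offset. `localTime` is a hypothetical helper, and America/Phoenix and America/Denver are my picks to represent Arizona and a neighbouring region that observes DST.

```javascript
// Format a UTC instant as HH:MM wall-clock time in the given IANA time zone.
function localTime(isoUtc, timeZone) {
  return new Intl.DateTimeFormat('en-GB', {
    hour: '2-digit',
    minute: '2-digit',
    hourCycle: 'h23',
    timeZone,
  }).format(new Date(isoUtc));
}

// In winter both zones are seven hours behind UTC.
console.log(localTime('2020-01-01T18:00:00Z', 'America/Phoenix')); // "11:00"
console.log(localTime('2020-01-01T18:00:00Z', 'America/Denver'));  // "11:00"

// In summer Denver moves its clocks forward; Phoenix does not.
console.log(localTime('2020-07-01T18:00:00Z', 'America/Phoenix')); // "11:00"
console.log(localTime('2020-07-01T18:00:00Z', 'America/Denver'));  // "12:00"
```

Naming the zone and leaving the offset to the time zone database is what keeps code like this working when a country changes its rules at short notice.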

Calendar

Calendars also vary depending on the region. Take an American calendar versus a British calendar as an example:

British calendars start on Mondays
American calendars start on Sundays

While the American calendar starts on a Sunday, the British calendar starts on a Monday. Additionally, if the calendar you use shortens the days to their first letter (M, T, W, etc), this may be confusing with an Arabic calendar, where every day starts with the same letter.
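The abbreviation problem is easy to check for. The sketch below asks the runtime for weekday labels instead of hard-coding them, then tests whether the one-letter forms collide. `weekdayLabels` is a hypothetical helper; which day the week starts on is separate locale data that this example does not cover.

```javascript
// Return the seven weekday labels for a locale, starting from Monday.
// width is 'narrow', 'short', or 'long'.
function weekdayLabels(locale, width) {
  const format = new Intl.DateTimeFormat(locale, { weekday: width, timeZone: 'UTC' });
  // 6 January 2020 was a Monday; step through that week one day at a time.
  return Array.from({ length: 7 }, (_, i) =>
    format.format(new Date(Date.UTC(2020, 0, 6 + i)))
  );
}

const narrow = weekdayLabels('en-US', 'narrow');
console.log(narrow.join(' ')); // "M T W T F S S"

// Even in English, one-letter labels are ambiguous (two Ts, two Ss).
const isAmbiguous = new Set(narrow).size < narrow.length;
console.log(isAmbiguous); // true

// Full names for another locale come from the same call.
console.log(weekdayLabels('ar', 'long'));
```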

Solution:

  • Dealing with all the localised data for preferred date and time formatting is simply too much for one team to tackle. There is no need to reinvent the wheel when several open source libraries exist to fix these issues. The best known of the JavaScript date/time libraries is probably Moment.js. For those who want an alternative to Moment, here’s a list that compares alternative libraries.
  • For less complex i18n needs, use Intl, a native JavaScript API:
    new Intl.RelativeTimeFormat('es-MX').format(-1, 'month');
        // => "hace 1 mes"
    new Intl.NumberFormat('ja', { style: 'currency', currency: 'JPY' }).format(172630);
        // => "￥172,630" (the ja locale uses the full-width yen sign)
  • Store your dates in UTC (Coordinated Universal Time). UTC isn’t a timezone but rather a standard that is used commonly around the world.
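  • Store your numbers as raw numeric values rather than pre-formatted strings (for money, an integer count of the smallest currency unit works well); the client can handle number formatting.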
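Here is a minimal sketch of the last two points: keep locale-neutral values in storage and format them only at display time. The `record` shape and the `present` helper are hypothetical, and dividing by 100 assumes a currency with two decimal places, which is not true of every currency (the yen has none).

```javascript
// What gets stored: an ISO 8601 timestamp in UTC and an integer amount.
const record = {
  createdAt: new Date(Date.UTC(2020, 0, 2, 15, 0)).toISOString(),
  priceInCents: 1999,
};
console.log(record.createdAt); // "2020-01-02T15:00:00.000Z"

// What gets shown: formatted per user, from the same stored values.
function present(stored, locale, timeZone, currency) {
  const when = new Intl.DateTimeFormat(locale, {
    dateStyle: 'medium',
    timeStyle: 'short',
    timeZone,
  }).format(new Date(stored.createdAt));
  // Assumption: the currency has 100 minor units per major unit.
  const price = new Intl.NumberFormat(locale, { style: 'currency', currency })
    .format(stored.priceInCents / 100);
  return { when, price };
}

console.log(present(record, 'en-US', 'America/New_York', 'USD'));
console.log(present(record, 'de-DE', 'Europe/Berlin', 'EUR'));
```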

Problem 3: Non-Latin characters

Character encoding

If you’ve ever seen garbled text like “Ã���ƒÆ‚�☐☐�Æ’ƒâ��ö☐��€šÃ��‚ ©”, this often signals that the text contains non-Latin characters but hasn’t been properly encoded.
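You can reproduce a small version of this garbling yourself. The sketch below encodes a string as UTF-8 and then decodes the bytes with the wrong encoding (windows-1252 here, chosen purely as an example of a mismatch).

```javascript
const original = 'café';

// Encode to UTF-8: the "é" becomes two bytes, 0xC3 0xA9.
const bytes = new TextEncoder().encode(original);

// Decoding with the wrong encoding turns each byte into its own character.
const garbled = new TextDecoder('windows-1252').decode(bytes);
console.log(garbled); // "cafÃ©"

// Decoding with the encoding that was actually used restores the text.
const restored = new TextDecoder('utf-8').decode(bytes);
console.log(restored); // "café"
```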

Basing your regex/validations on English patterns

This is yet another reminder that even if you aren’t localising into languages other than English, you still need to internationalise your application properly. It’s not a good user experience to tell someone that their name is invalid.

Most Chinese last names are only one character long. Keep in mind that one character can have meaning in some languages.

Solution:

  • Use Unicode.
  • UTF-8 is the default encoding system for most websites.
  • If you’re using a custom font, ensure that it looks good in all languages you support. Just because a font looks great in languages that use a Latin alphabet, it doesn’t mean it will work with a Cyrillic alphabet or Japanese characters.
  • Be conscious of your sorting algorithm if you include non-Latin characters in your results.
  • Be careful with routing. If you implement dynamic routing and support Korean, you might end up with Korean characters in your URL. While this is possible, you may want to stick with just ASCII characters in your URL to keep things simple.
  • Finally, be aware of what you are checking for when validating fields (it’s possible to use Unicode in regex).
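The validation and sorting points above can be sketched as follows. Both helpers are hypothetical, and the name pattern is a deliberately permissive assumption rather than a recommended rule; real names are varied enough that the safest validation is often very little validation.

```javascript
// Unicode-aware pattern: one or more letters from any script, combining
// marks, spaces, apostrophes, or hyphens. The "u" flag enables \p{...}.
const NAME_PATTERN = /^[\p{L}\p{M}' -]+$/u;

function isPlausibleName(name) {
  return NAME_PATTERN.test(name.normalize('NFC').trim());
}

console.log(/^[A-Za-z]+$/.test('李')); // false: an English-only pattern rejects it
console.log(isPlausibleName('李'));    // true
console.log(isPlausibleName('Zoë'));   // true
console.log(isPlausibleName('1234'));  // false

// Sort with the locale's collation rules instead of raw code point order.
function sortForLocale(words, locale) {
  return [...words].sort(new Intl.Collator(locale).compare);
}

console.log(['z', 'ä', 'a'].sort());               // [ 'a', 'z', 'ä' ]
console.log(sortForLocale(['z', 'ä', 'a'], 'de')); // [ 'a', 'ä', 'z' ]
console.log(sortForLocale(['z', 'ä', 'a'], 'sv')); // [ 'a', 'z', 'ä' ]
```

The German and Swedish results differ on purpose: the same letters sort differently depending on the language, so the locale has to be an input to the sort.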

Problem 4: Designing with only English speakers in mind

The text fits in English, but not in other languages

If the design team uses English at work, it makes sense that mockups will contain English text. What often happens, however, is that the design only accounts for the space of English words and forgets that other languages often take up more space.

Certain languages, such as Chinese, Korean, or Japanese, often take up less space than their English counterparts. Other languages such as Portuguese, French, and German can sometimes take up significantly more space than English words.

Check out the difference in lengths in these English and German words:

Santa = Weihnachtsmann

Car insurance = Kraftfahrzeugversicherung

Speed limit = Geschwindigkeitsbegrenzung

Matchbox = Streichholzschachtel

Solution:

  • From a UX perspective, allow for enough flexibility and space and ensure that translated text fits into your design.
  • From a programming perspective, make sure to keep your CSS flexible and avoid fixed widths.

Problem 5: Supporting right-to-left languages

Adding dir="rtl" messes up the layout

English is read left-to-right, as are many European languages. However, several languages are read from right-to-left, such as Arabic, Aramaic, Azeri, Dhivehi/Maldivian, Hebrew, Kurdish (Sorani), Persian/Farsi, and Urdu. When supporting an RTL (right-to-left) language, not only does the text change directionality, but the layout is also mirrored horizontally (flipped across the vertical axis). Look at how Wikipedia changes directionality:

Wikipedia page in English (a left-to-right language)
Wikipedia page in Arabic (a right-to-left language)

By adding dir="rtl" to your HTML, the browser will change the directionality of the markup for you. Underneath this is a complex bidirectional algorithm that determines which direction text and punctuation should go.

Let’s look at some basic HTML with the directionality set to right-to-left.

 

See the Pen “basic RTL” by Robin Dykema (@rockinrobin714) on CodePen.

Notice how our header and div now align to the right, even though we haven’t added any CSS? This works great as expected.

What happens when we add CSS properties that use the words “left” or “right”? Let’s look at an example where in English (a left-to-right language), we want the header on the left, but we want to float a div to the right.

Left-to-right floats:

See the Pen “LTR float” by Robin Dykema (@rockinrobin714) on CodePen.

Right-to-left floats:

See the Pen “RTL float” by Robin Dykema (@rockinrobin714) on CodePen.

Notice that while the header changed directionality, the div stayed on the right side. This is not the desired result.

Let’s change the CSS to use Flexbox instead.

See the Pen “RTL flexbox” by Robin Dykema (@rockinrobin714) on CodePen.

Solution:

  • Use Flexbox over floats. Flexbox observes the directionality by using words like “flex-end” and “flex-start” which makes sense in both LTR and RTL languages.
  • When naming classes and components, name them based on what they do, not what they look like. For example, instead of calling a set of pagination arrows “left” and “right”, name them “previous” and “next”. This tells us what the arrows do and makes sense no matter the directionality of the language. Other words that work include “before”, “start”, “beginning”, and “backwards”.
  • Note: some languages, such as Chinese and Japanese, are sometimes read from top to bottom in traditional texts. However, on the web, mostly you will find these languages printed left to right.
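One way to picture why logical names such as “flex-start” and “flex-end” survive a direction change is to model the mapping from a logical edge to a physical side. The function below is a simplified mental model for a horizontal row of items, with a made-up name; it is not how the browser's layout engine is implemented.

```javascript
// Simplified model (an assumption for illustration only):
// map a logical edge ('start' or 'end') to a physical side for a text direction.
function physicalSide(logicalEdge, dir) {
  const startSide = dir === 'rtl' ? 'right' : 'left';
  const endSide = dir === 'rtl' ? 'left' : 'right';
  return logicalEdge === 'start' ? startSide : endSide;
}

console.log(physicalSide('end', 'ltr'));   // "right"
console.log(physicalSide('end', 'rtl'));   // "left"
console.log(physicalSide('start', 'rtl')); // "right"
```

A rule written as “float: right” skips this mapping and pins the element to one physical side, which is why the floated div in the example above stayed put when the direction changed.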

Conclusion

You might be feeling overwhelmed by everything that comes with internationalising your application. If you are looking for first steps, try using the W3C i18n checker. Ultimately, internationalising properly will mean bigger markets for your application, more inclusiveness, and better code that is extendable and easier to manage. Try to design your app to be as language-, region- and culture-independent as possible to alleviate problems before they arise. And when there is a problem, try using open source projects to help solve those problems (and the problems you didn’t even know existed).
