Internationalisation guidelines

This document is targeted at developers (core code, plugin code and user interface developers). If you are looking for instructions on configuring your Foswiki to work with your local language, see InternationalizationSupplement.

In the following, I18N is shorthand for "internationalisation".

The Good News

It's easy to do
  • It really is easy to internationalize your plugin or core code, so it works with almost any language, not just English!
    • Typically, only a few lines need to change, in very simple ways
  • All the hard work has been done for you already.

It will help your plugin, and Foswiki
  • Using I18N lets your plugin (and Foswiki) be much more widely used, meaning more feedback, patches and general goodness
  • The I18N regexes make your code more flexible, e.g. there's now a single place to change the definition of a WikiWord across all plugins.
  • They also work across systems from Windows to Linux and even mainframes, and across Perl versions and browsers

What you need to do

You do need to read this page and be a little careful when using regular expressions - any time you see [A-Z], \w or \b, a little alarm bell should go off in your head saying "I should use one of the I18N-aware regular expressions instead".

Goal of this page

Foswiki has reasonable internationalization (I18N) support today (see UserInterfaceInternationalisation). However, plugin developers and core coders need guidelines to avoid I18N issues in the future, and to ensure their plugins are widely used, not just by people who speak English as their first language.

I18N code has tended to regress over the last few years - partly due to lack of unit tests, but also due to new code being written that doesn't follow these guidelines.

Overview

Internationalization in Foswiki supports use of locales to ensure WikiWords and other page contents work with international characters (e.g. GrødWeb.BlåBærGrød), which means avoiding all use of [A-Z] or \w in regular expressions (except when you really mean A to Z of course, e.g. variable names that only use ASCII alphanumeric).

Fortunately, the changes required to your code are quite simple, and "all the hard work has been done" (to quote SteffenPoulsen!)

UserInterfaceInternationalisation guidelines are now included below - these show you how to make your user interface text ("message strings") internationalised, so that it can be translated as part of TranslationUserInterface efforts.

What if you don't bother?

Unless you are careful about using regular expressions ('regexes') that match alphabetic characters, your plugin or core module probably won't work for users in languages other than English.

Guidelines

Preparing a plugin or core module for I18N

There's one simple thing you absolutely must do in your plugin or core module to allow for I18N. You have to make sure the plugin can "see" the regular expressions set up in Foswiki.pm, and that it uses locale information.

You can do this by adding the following lines to the plugin, somewhere after the package declaration. This is from InterwikiPlugin, which is part of the core and a good example:

BEGIN {
    # Do a dynamic 'use locale' for this module
    if( $Foswiki::cfg{UseLocale} ) {
        require locale;
        import locale();
    }
}

Now everything is set up for using the regular expressions from Foswiki.pm. This means that you can just write $Foswiki::regex{wikiWordRegex} instead of figuring out your own matching rules for matching a WikiWord, across the many sorts of broken I18N locales and different Perl versions. A huge amount of testing and debugging is encapsulated for you, and you are also guaranteed that your regex code will work in future when Foswiki has UnicodeSupport!
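
As a rough illustration of the difference this makes, the sketch below compares a hand-written ASCII-only WikiWord test with one that uses the shared regex. It runs outside Foswiki, so the %Foswiki::regex table here is only a stand-in with assumed ISO-8859-1 values, and the two helper functions are hypothetical names; in real code Foswiki.pm fills in the table for you.

```perl
use strict;
use warnings;

# Stand-in for the table Foswiki.pm fills in at startup. These values are
# assumptions (ASCII plus ISO-8859-1 accented letters, as hex escapes);
# a real site derives them from its configured locale.
package Foswiki;
our %regex = (
    upperAlpha => "A-Z\x{C0}-\x{D6}\x{D8}-\x{DE}",
    lowerAlpha => "a-z\x{DF}-\x{F6}\x{F8}-\x{FF}",
    numeric    => '0-9',
);
$regex{lowerAlphaNum} = $regex{lowerAlpha} . $regex{numeric};
$regex{mixedAlphaNum} = $regex{upperAlpha} . $regex{lowerAlphaNum};
$regex{wikiWordRegex} =
  qr/[$regex{upperAlpha}]+[$regex{lowerAlphaNum}]+[$regex{upperAlpha}]+[$regex{mixedAlphaNum}]*/;

package main;

# Hand-rolled ASCII-only test: rejects topic names with accented letters.
sub is_wikiword_ascii {
    my ($name) = @_;
    return $name =~ /^[A-Z]+[a-z0-9]+[A-Z]+[a-zA-Z0-9]*$/ ? 1 : 0;
}

# The same test written against the shared regex.
sub is_wikiword_i18n {
    my ($name) = @_;
    return $name =~ /^$Foswiki::regex{wikiWordRegex}$/ ? 1 : 0;
}

# The Danish topic name from the Overview, written with hex escapes.
my $topic = "Bl\x{E5}B\x{E6}rGr\x{F8}d";
printf "ascii=%d i18n=%d\n", is_wikiword_ascii($topic), is_wikiword_i18n($topic);
```

The ASCII-only version rejects the Danish topic name, while the version built on the shared regex accepts it without the calling code knowing anything about the character set.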

Fixing regular expressions that match letters

The main thing to do when internationalizing plugins is to never use [A-Z] except when you really mean A to Z, ASCII only (e.g. in Macro names perhaps). Also, you should never, ever use \w or \b: these match 'words' based on locales and I18N characters, but they don't work in many environments that have broken locales (including all Windows systems!). Instead, read on for how to write simple regexes that do the same thing portably across a wide variety of systems.

Whenever you see these patterns, ask yourself 'should this match accented characters as well as A-Z?' and 'is this really trying to match a page (topic) name or a web name'? (Page names are normally WikiWords, but not always.)

Once you have identified these problem areas, you need to use the regexes carefully crafted in the startup code in Foswiki.pm. These regexes work across Perl 5.005_03, Perl 5.6, and 5.8 or higher, including environments with very broken Perl locales (e.g. Windows), so they are your best option for cross-version and cross-platform I18N support. Code using these regexes will also work with future UnicodeSupport when that's implemented, despite the actual regexes changing dramatically.

Quickly find problematic regexes

To easily check core code or plugins for possible use of regexes that don't take account of I18N, just run the following one-liner under Linux or Cygwin - it will search all *.pm files including any subdirectories for use of [a-z], \w and \b, which are all potential issues unless you really mean A to Z without any international characters:

    find . -name '*.pm' | xargs egrep -in '\[a-z\]|\\w|\\b' >regex-warnings.txt

Some editors such as VimEditor and EmacsCPerlMode can take this output file and help you easily navigate to the right place to see the code in context.
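
If neither Linux nor Cygwin is to hand, much the same check can be sketched in Perl itself. This is only an illustrative equivalent, not part of Foswiki: the function names suspect_lines and scan_tree are made up for this example, and it looks for the same three warning signs ([a-z] in any case, \w and \b) using the core File::Find module.

```perl
use strict;
use warnings;
use File::Find;

# The same three warning signs as the one-liner: [a-z] (any case), \w and \b.
my $suspect = qr/\[a-z\]|\\w|\\b/i;

# Return "line-number: text" for every suspicious line in a list of lines.
sub suspect_lines {
    my @lines = @_;
    my @hits;
    for my $i ( 0 .. $#lines ) {
        push @hits, ( $i + 1 ) . ": $lines[$i]" if $lines[$i] =~ $suspect;
    }
    return @hits;
}

# Walk a directory tree and collect the hits from every *.pm file.
sub scan_tree {
    my ($dir) = @_;
    my @report;
    find(
        sub {
            return unless -f $_ && /\.pm$/;
            my $path = $File::Find::name;
            open( my $fh, '<', $_ ) or return;
            chomp( my @lines = <$fh> );
            close $fh;
            push @report, map { join ':', $path, $_ } suspect_lines(@lines);
        },
        $dir
    );
    return @report;
}

# Usage: perl scan.pl lib >regex-warnings.txt
if (@ARGV) {
    print "$_\n" for scan_tree( $ARGV[0] );
}
```

As with the one-liner, every hit is only a candidate: a match on [A-Z] is fine when the code really does mean ASCII A to Z.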

Types of pre-defined regexes

The startup code in Foswiki.pm pre-defines a number of complete regexes, as well as strings for use in building character classes as parts of regexes. Naming is used to distinguish these - examples are from the point of view of the calling code:

  • Complete regexes are compiled using qr/.../ and can be used as part of larger regexes, or as is. They are named fooRegex, e.g. $Foswiki::regex{wikiWordRegex} and are usually 'concept regexes' that match email addresses, WikiWords, etc.
  • Strings for use in character classes (i.e. within [....] in a regex) are just strings and must be used only in character classes. They are named foo, i.e. no Regex suffix - for example $Foswiki::regex{mixedAlphaNum}. On a Perl platform with broken I18N locales, this would be the string "a-zA-Z0-9" - note no square brackets!
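
The distinction between the two kinds of entry is easy to get wrong, so here is a small stand-alone sketch of it. The %regex table below is an assumed stand-in holding the ASCII-only values described above for a platform with broken locales, and the three helper functions are hypothetical; only the naming convention comes from Foswiki.

```perl
use strict;
use warnings;

# Assumed values for a platform with broken locales (ASCII only).
my %regex = (
    upperAlpha    => 'A-Z',
    mixedAlphaNum => 'a-zA-Z0-9',
);

# A complete "concept" regex, compiled once with qr//.
$regex{wikiWordRegex} = qr/[A-Z]+[a-z0-9]+[A-Z]+[a-zA-Z0-9]*/;

# Class string: only meaningful inside [...].
sub is_alnum_word {
    my ($s) = @_;
    return $s =~ /^[$regex{mixedAlphaNum}]+$/ ? 1 : 0;
}

# Common mistake: without the brackets the string only matches itself.
sub misuse_matches {
    my ($s) = @_;
    return $s =~ /^$regex{mixedAlphaNum}$/ ? 1 : 0;
}

# Complete regex: usable as is, or embedded in a larger pattern.
sub split_web_topic {
    my ($s) = @_;
    return $s =~ /^([$regex{upperAlpha}][$regex{mixedAlphaNum}]*)\.($regex{wikiWordRegex})$/
      ? ( $1, $2 )
      : ();
}
```

Here is_alnum_word('Topic1') succeeds, whereas misuse_matches('Topic1') fails because the pattern without brackets is just the literal text a-zA-Z0-9.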

Fixing core code regexes

This is similar to the plugin code below, but a bit less verbose as you have direct access to regexes without going through the plugin API. For example, you would change the following:

        if( $topic =~ /^\^\([\_\-a-zA-Z0-9\|]+\)\$$/ ) {

Into:

        if( $topic =~ /^\^\([\_\-$Foswiki::regex{mixedAlphaNum}\|]+\)\$$/ ) {

That's still not hugely readable, but more complex regexes can be greatly simplified. The following code is much more readable than using a-z etc, as well as working for I18N:

        $anchorName =~ s/($Foswiki::regex{wikiWordRegex})/_$1/go;

        # Prevent automatic WikiWord or CAPWORD linking in explicit links
        $link =~ s/(?<=[\s\(])($Foswiki::regex{wikiWordRegex}|[$Foswiki::regex{upperAlpha}])/<nop>$1/go;

Fixing plugin regexes

Plugin code is a bit more verbose than core code as it must first get the regexes via the Plugin API - here's an example adapted from Plugin:InterwikiPlugin:

    # Regexes for the Site:page format InterWiki reference
    my $mixedAlphaNum = Foswiki::Func::getRegularExpression('mixedAlphaNum');
    my $upperAlpha = Foswiki::Func::getRegularExpression('upperAlpha');
    $sitePattern    = "([$upperAlpha][$mixedAlphaNum]+)";
    $pagePattern    = "([${mixedAlphaNum}_\/][$mixedAlphaNum" . '\.\/\+\_\,\;\:\!\?\%\#-]+?)';
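
To show how patterns built this way might then be used, here is a stand-alone sketch. Because it runs outside Foswiki, it defines its own stub of Foswiki::Func::getRegularExpression that returns assumed ASCII-only class strings; the find_interwiki helper and the surrounding anchors are illustrative only and are not taken from InterwikiPlugin.

```perl
use strict;
use warnings;

# Stub of the plugin API so the sketch runs outside Foswiki. The values
# are assumed ASCII-only fallbacks; a real site may return more letters.
package Foswiki::Func;
my %stub = ( upperAlpha => 'A-Z', mixedAlphaNum => 'a-zA-Z0-9' );
sub getRegularExpression { return $stub{ $_[0] } }

package main;

# Fetch the character-class strings once, then build the patterns.
my $mixedAlphaNum = Foswiki::Func::getRegularExpression('mixedAlphaNum');
my $upperAlpha    = Foswiki::Func::getRegularExpression('upperAlpha');
my $sitePattern   = "([$upperAlpha][$mixedAlphaNum]+)";
my $pagePattern   = "([${mixedAlphaNum}_/][$mixedAlphaNum" . '\.\/\+\_\,\;\:\!\?\%\#-]+?)';

# Compile the combined Site:page pattern a single time.
my $interwikiRegex = qr/(?:^|\s)${sitePattern}:${pagePattern}(?=\s|\z)/;

# Hypothetical helper: return (site, page) for the first reference found,
# or an empty list when the text contains none.
sub find_interwiki {
    my ($text) = @_;
    return $text =~ $interwikiRegex ? ( $1, $2 ) : ();
}

my ( $site, $page ) = find_interwiki('See Wikipedia:Perl for details');
print "$site -> $page\n";
```

Note that the pattern uses whitespace and string boundaries rather than \b, in keeping with the advice above.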

Regex efficiency

You may also want to use /o on your regex, or compile it using $fooPatternRegex = qr/$someRegexVar/, which should give better performance if used more than once, e.g. in loops or when running under mod_perl. See perldoc perlop and perldoc perlre for details, and don't use this if the regex (not the substitution right-hand-side) includes 'real' variables that vary between invocations of your code, e.g. user name.
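
A minimal sketch of the qr// approach follows. The $wikiWord pattern is an ASCII-only stand-in for $Foswiki::regex{wikiWordRegex} so that the example runs on its own, and linked_topics is a hypothetical helper; the point is only that the pattern is compiled once, outside the loop, and contains nothing that varies between calls.

```perl
use strict;
use warnings;

# ASCII-only stand-in for $Foswiki::regex{wikiWordRegex} (an assumption,
# so that the sketch runs outside Foswiki).
my $wikiWord = qr/[A-Z]+[a-z0-9]+[A-Z]+[a-zA-Z0-9]*/;

# Compiled once here, then reused on every line and every call.
my $bracketLink = qr/\[\[($wikiWord)\]\]/;

# Hypothetical helper: collect the topic names from [[...]] links.
sub linked_topics {
    my @topics;
    for my $line (@_) {
        push @topics, $line =~ /$bracketLink/g;
    }
    return @topics;
}

my @found = linked_topics( 'See [[WebHome]] and [[MainMenu]]', 'no links', '[[TopicList]]' );
print join( ', ', @found ), "\n";
```

If the pattern had to include something like a user name, it would need to be built per call instead, with the variable part escaped using quotemeta.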

International message strings

As Foswiki supports user interface internationalization, you should now avoid putting English language strings directly into Perl code. In addition, you should follow the main InternationalisationGuidelines to ensure that regular expressions and other code work well across multiple languages and locales (i.e. countries or regions).

The Foswiki::I18N class encapsulates message text internationalization, and the i18n field of the Foswiki session object is an instance of this class. Thus, wherever you might need to write an English string inside Perl code, you must write it wrapped in a call to the Foswiki::I18N::maketext method, like this:

# $session is an instance of Foswiki class
my $msg = $session->i18n->maketext("Access denied: you don't have access for editing this topic.");

You can also interpolate parameters into the text, and let the translator correctly translate messages, keeping a place for your parameters. Just write placeholders for the parameters numbered with the parameter order in the maketext call: [_1] for the first parameter, [_2] for the second, and so on. Then you can do things like this:

# $session is an instance of Foswiki class
my $msg = $session->i18n->maketext("This is topic [_1] on the web [_2].", $topic, $web);

Note that translators can change the order in which parameters appear in translated text (e.g. [_2] appearing before [_1]), but they must keep the text's semantics, so that substituting the first parameter into [_1] and the second parameter into [_2] says the same thing as the original, whatever order they are in.
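
The effect of reordering placeholders can be tried out without a Foswiki installation, using the Locale::Maketext module that ships with Perl. Everything in the sketch below is illustrative: the My::L10N classes, the describe helper and the German wording are all made up for this example, and it assumes the [_1] placeholder syntax behaves the same way here as in the maketext calls above.

```perl
use strict;
use warnings;

# Hypothetical project base class; Locale::Maketext ships with Perl.
package My::L10N;
use parent 'Locale::Maketext';

# English: _AUTO makes every message its own translation.
package My::L10N::en;
use parent -norequire, 'My::L10N';
our %Lexicon = ( _AUTO => 1 );

# Made-up German lexicon that swaps the order of the two parameters.
package My::L10N::de;
use parent -norequire, 'My::L10N';
our %Lexicon = (
    'This is topic [_1] on the web [_2].' =>
      'Im Web [_2] ist dies das Thema [_1].',
);

package main;

# Translate the same message, with the same argument order, per language.
sub describe {
    my ( $lang, $topic, $web ) = @_;
    my $handle = My::L10N->get_handle($lang)
      or die "no lexicon for $lang";
    return $handle->maketext( 'This is topic [_1] on the web [_2].', $topic, $web );
}

print describe( 'en', 'WebHome', 'Main' ), "\n";
print describe( 'de', 'WebHome', 'Main' ), "\n";
```

The calling code passes $topic and $web in the same order for both languages; only the lexicon entry decides where each one lands in the sentence.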

See UserInterfaceInternationalisation for guidelines for writing message strings that can be translated.

Testing your fixes

Don't forget to test your code across a number of different I18N areas:
  • Page (WikiWord) and web names with I18N characters
  • Page contents with I18N characters - usually not a problem
  • Attachments with I18N characters in the filename, or in the name of the topic/web that contains the attachment
  • Searching for I18N characters - especially if external programs are used
  • Sorting to include I18N characters - whether using internal Perl code or external programs

If there is a valid locale that works within Perl, most things should 'just work' once you have fixed the regexes. However, on Windows and other platforms where locales are broken in Perl terms, you will only be able to do I18N for page contents and page/web names.

Example of code that needs fixing

Taking the original UpdateInfoPlugin as an example (now fixed...) - this used \w to match a WikiWord (which is actually incorrect anyway, as it will match non-WikiWords!), when it should use the relevant WikiWord regex via the plugin API. You frequently find that you end up fixing other bugs when adding I18N support, because it forces you to look closely at the regexes.


Discussion

Any comments on how the I18N documentation is written or could be improved
Topic revision: r6 - 04 Oct 2011, CrawfordCurrie
The copyright of the content on this website is held by the contributing authors, except where stated elsewhere. See Copyright Statement.