
Chapter 16 - Miscellaneous Routines

UNIX Systems Programming for SVR4
David A. Curry
 Copyright © 1996 O'Reilly & Associates, Inc.

Internationalization
For years, UNIX used the ASCII character set. ASCII, being the American Standard Code for Information Interchange, works great in the United States. But in England, where the monetary symbol is '£,' a non-ASCII character, a problem arises. In countries that use diacritical marks with their letters, e.g., â, ç, ì, õ, and ü, the problem is even worse. And in countries like Japan, where the character set is not even remotely Latin in origin, ASCII is completely hopeless.
In recent years, as UNIX has spread throughout the world, so has interest in internationalizing it. All programs should handle the local country's character set, whatever that is. Programs that print dates and times should print them in the commonly accepted format of the local country. Programs that print formatted numbers should use the proper character to mark the decimal point, and so forth.
Internationalization is too complex a topic to cover in this book. Instead, we present only a few of the many functions available for internationalization.
Programs using the functions described in this section must be linked with the -lintl library.
Defining the Locale
A locale defines the characteristics of the environment, from an internationalization standpoint, that a program is operating in. The “UNIX” locale is named “C.” Other locales generally use a two-character name, usually the ISO standard two-letter abbreviation for the country name. For example, “de” is the German locale, “fr” is the French locale, and “ja” is the Japanese locale.
The setlocale function sets a program's locale for any of several different categories:
    #include <locale.h>
    char *setlocale(int category, const char *locale);
The locale parameter contains the name of the locale; this will be used by the internationalization functions to look at various databases contained in the subdirectory of the same name in /usr/lib/locale. If locale contains the empty string, the value will be taken from environment variables. If locale is NULL, the current locale will be returned and no changes made.
The category parameter must be one of the following:
LC_CTYPE
Affects the behavior of the character type functions such as isdigit and tolower.
LC_NUMERIC
Affects the decimal point character and the thousands digit separator character for formatted input/output functions (scanf, printf, etc.) and string conversion functions (strtod, etc.).
LC_TIME
Affects the date and time formats delivered by ascftime, cftime, getdate, and strftime.
LC_COLLATE
Affects the sort order produced by strcoll and strxfrm (see below).
LC_MONETARY
Affects the monetary formatting information returned by localeconv (see below).
LC_MESSAGES
Affects the behavior of dgettext, gettext, and gettxt (not discussed in this book).
LC_ALL
A shorthand way to specify all of the above categories.
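If setlocale succeeds, it returns a pointer to a string naming the locale now in effect for category. If it fails, it returns NULL and the program's locale is not changed.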
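The following sketch shows the three uses of the locale parameter described above. The names init_locale and format_portable are hypothetical helpers invented for this example, and the code assumes a C library that provides snprintf. The first function adopts the locale named by the environment; the second saves the current LC_NUMERIC setting, switches to the “C” locale long enough to format a number with a period as the decimal point, and then restores the saved setting.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * init_locale (hypothetical helper): adopt the locale named by the
 * environment and return the name of the locale now in effect.
 */
const char *init_locale(void)
{
    /* The empty string means "take the locale from the environment". */
    if (setlocale(LC_ALL, "") == NULL) {
        /* The requested locale is unavailable; ask for the default. */
        setlocale(LC_ALL, "C");
    }

    /* A NULL locale queries the current setting without changing it. */
    return setlocale(LC_ALL, NULL);
}

/*
 * format_portable (hypothetical helper): format value with two decimal
 * places and a period as the decimal point, whatever the user's locale.
 * Returns 0 on success, -1 on failure.
 */
int format_portable(double value, char *buf, size_t bufsize)
{
    const char *current;
    char *saved;

    assert(buf != NULL && bufsize > 0);

    current = setlocale(LC_NUMERIC, NULL);
    if (current == NULL)
        return -1;

    /* Copy the name: a later setlocale call may overwrite the string. */
    saved = malloc(strlen(current) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, current);

    if (setlocale(LC_NUMERIC, "C") == NULL) {
        free(saved);
        return -1;
    }
    snprintf(buf, bufsize, "%.2f", value);

    /* Restore the locale that was in effect before the call. */
    setlocale(LC_NUMERIC, saved);
    free(saved);
    return 0;
}
```

A program would normally call init_locale once, near the top of main, before using any of the functions affected by the locale.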
Formatting Numbers
A number of factors affect formatting numbers in different countries. Aside from the obvious differences in monetary symbols, there are also differences in the character used for a decimal point (some countries use a period, others use a comma), the character used to separate thousands groups (some countries use a comma, others use a period), and so forth.
The localeconv function returns information about how to format numbers in the program's current locale:
    #include <locale.h>
    struct lconv *localeconv(void);
The function returns a pointer to a structure of type struct lconv:
    struct  lconv   {
        char    *decimal_point;
        char    *thousands_sep;
        char    *grouping;
        char    *int_curr_symbol;
        char    *currency_symbol;
        char    *mon_decimal_point;
        char    *mon_thousands_sep;
        char    *mon_grouping;
        char    *positive_sign;
        char    *negative_sign;
        char     int_frac_digits;
        char     frac_digits;
        char     p_cs_precedes;
        char     p_sep_by_space;
        char     n_cs_precedes;
        char     n_sep_by_space;
        char     p_sign_posn;
        char     n_sign_posn;
    };
The fields of this structure are:
decimal_point
The decimal point character used to format non-monetary quantities.
thousands_sep
The character used to separate groups of digits to the left of the decimal point in non-monetary quantities.
grouping
A string in which each byte is taken as an integer that indicates the number of digits comprising the current group in a formatted non-monetary quantity. Each integer is interpreted according to the following:
CHAR_MAX
No further grouping should be performed.
0
The previous element is to be used repeatedly for the remainder of the digits.
other
The value is the number of digits that comprise the current group. The next element is examined to determine the size of the next group of digits to the left of the current group.
int_curr_symbol
The international currency symbol applicable to the current locale.
currency_symbol
The local currency symbol applicable to the current locale.
mon_decimal_point
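The decimal point character to be used in formatting monetary quantities.
mon_thousands_sep
The character used to separate groups of digits to the left of the decimal point in formatted monetary quantities.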
mon_grouping
A string in which each byte is taken as an integer that indicates the number of digits comprising the current group in a formatted monetary quantity. Each integer is interpreted according to the rules described above.
positive_sign
The string used to indicate a non-negative formatted monetary quantity.
negative_sign
The string used to indicate a negative formatted monetary quantity.
int_frac_digits
The number of decimal places to the right of the decimal to display in internationally formatted monetary quantities.
frac_digits
The number of decimal places to the right of the decimal to display in locally formatted monetary quantities.
p_cs_precedes
Set to 1 or 0 to indicate whether the currency symbol precedes (1) or follows (0) the value for non-negative formatted monetary quantities.
p_sep_by_space
Set to 1 or 0 to indicate whether the currency symbol is (1) or is not (0) separated by a space from the value for a non-negative formatted monetary quantity.
n_cs_precedes
Set to 1 or 0 to indicate whether the currency symbol precedes (1) or follows (0) the value for negative formatted monetary quantities.
n_sep_by_space
Set to 1 or 0 to indicate whether the currency symbol is (1) or is not (0) separated by a space from the value for a negative formatted monetary quantity.
p_sign_posn
Indicates how to position the positive sign for a non-negative formatted monetary quantity, as follows:
0
Parentheses surround the quantity and currency symbol.
1
The sign string precedes the quantity and currency symbol.
2
The sign string follows the quantity and currency symbol.
3
The sign string immediately precedes the currency symbol.
4
The sign string immediately follows the currency symbol.
n_sign_posn
Indicates the positioning of the negative sign for a negative formatted monetary quantity. The possible values are as described above for p_sign_posn.
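The rules for the grouping and mon_grouping strings are easier to follow in code. The sketch below is one way to apply them; group_digits and group_for_locale are hypothetical helpers written for this example, not library functions, and the 128-byte work buffer is an arbitrary limit. The digit string is processed from right to left, a separator is emitted each time the current group fills, and the next element of the grouping string decides the size of the following group.

```c
#include <assert.h>
#include <limits.h>
#include <locale.h>
#include <string.h>

/*
 * group_digits (hypothetical helper): copy the digit string "digits"
 * (no sign, no decimal point) into out, inserting sep between groups
 * as directed by "grouping", a string in the struct lconv format.
 * Returns 0 on success, -1 if the result does not fit.
 */
int group_digits(const char *digits, const char *grouping,
                 const char *sep, char *out, size_t outsize)
{
    char tmp[128];                  /* result is built right to left */
    size_t pos = sizeof(tmp);
    size_t i = strlen(digits);
    size_t seplen = strlen(sep);
    size_t len;
    const char *g = grouping;
    int size, count = 0;

    assert(out != NULL);

    /* An empty string or CHAR_MAX means no grouping (size 0). */
    size = (*g == '\0' || *g == CHAR_MAX) ? 0 : *g;

    while (i > 0) {
        if (size != 0 && count == size) {
            /* The current group is full: emit a separator. */
            if (pos < seplen)
                return -1;
            pos -= seplen;
            memcpy(tmp + pos, sep, seplen);
            count = 0;
            /* A 0 element means "repeat the previous one": stay put. */
            if (g[1] != '\0')
                g++;
            size = (*g == CHAR_MAX) ? 0 : *g;
        }
        if (pos == 0)
            return -1;
        tmp[--pos] = digits[--i];
        count++;
    }

    len = sizeof(tmp) - pos;
    if (len + 1 > outsize)
        return -1;
    memcpy(out, tmp + pos, len);
    out[len] = '\0';
    return 0;
}

/* Group a non-monetary digit string using the current locale's rules. */
int group_for_locale(const char *digits, char *out, size_t outsize)
{
    struct lconv *lc = localeconv();

    return group_digits(digits, lc->grouping, lc->thousands_sep,
                        out, outsize);
}
```

With a grouping string of "\3" and a separator of ",", the digits 1234567 come out as 1,234,567. With "\3\2" they come out as 12,34,567. In the “C” locale, where grouping is the empty string, group_for_locale returns the digits unchanged.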
Collating Sequences
Functions such as strcmp compare strings byte by byte according to the numeric values of the characters. For ASCII text this is roughly alphabetical order, except that all uppercase letters sort before all lowercase letters. However, these functions do not produce the proper order for character sets other than ASCII. Thus, when working in an international environment, qsort cannot be used with strcmp to sort strings into the proper order.
The strcoll and strxfrm functions can be used instead to make these comparisons:
    #include <string.h>
    int strcoll(const char *s1, const char *s2);
    size_t strxfrm(char *dst, const char *src, size_t n);
The strcoll function compares strings s1 and s2. If s1 is less than s2, the function returns a value less than 0; if s1 is equal to s2, it returns 0; and if s1 is greater than s2, it returns a value greater than 0. The strings are interpreted in the program's locale for the LC_COLLATE category.
The strxfrm function transforms the string src, placing the result in dst. If strcmp is applied to two transformed strings, it will return a result with the same sign (less than, equal to, or greater than 0) as if strcoll had been applied to the original strings. No more than n bytes will be placed into dst, including the terminating null character. The function returns the length of the transformed string, not counting the terminating null character; if this value is greater than or equal to n, the contents of dst are undefined. If dst is null and n is 0, strxfrm simply returns that length, so a buffer one byte larger can be allocated to hold the transformed string.
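The strcoll function behaves as if it called strxfrm on s1 and s2 and then returned the result of comparing them with strcmp. If a large number of strings is to be compared against a single string for a match, or the same strings are compared repeatedly as in a sort, it is more efficient to call strxfrm once on each string and compare the transformed strings with strcmp yourself.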
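Both approaches can be sketched briefly. In the code below, compare_coll, sort_strings, and transform are hypothetical names chosen for this example. The first two sort an array of strings with qsort and strcoll; the third uses the two-call idiom to size and fill a buffer with strxfrm, returning a transformed copy that can be compared with strcmp as often as needed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison function for an array of char * elements. */
int compare_coll(const void *a, const void *b)
{
    const char *s1 = *(const char *const *)a;
    const char *s2 = *(const char *const *)b;

    return strcoll(s1, s2);
}

/* Sort strings into the order defined by the LC_COLLATE category. */
void sort_strings(char **strings, size_t count)
{
    qsort(strings, count, sizeof(strings[0]), compare_coll);
}

/*
 * transform (hypothetical helper): return a malloc'ed transformed copy
 * of src, or NULL on failure. The caller frees the result.
 */
char *transform(const char *src)
{
    size_t need;
    char *dst;

    assert(src != NULL);

    /* With a null dst and n of 0, strxfrm only reports the length. */
    need = strxfrm(NULL, src, 0) + 1;   /* +1 for the null character */

    dst = malloc(need);
    if (dst == NULL)
        return NULL;

    /* A return value of need or more would mean the result did not fit. */
    if (strxfrm(dst, src, need) >= need) {
        free(dst);
        return NULL;
    }
    return dst;
}
```

Neither function sets the locale itself; the program is expected to have called setlocale beforehand. In the “C” locale, both produce the same order as strcmp.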
As mentioned previously, there are many more internationalization-support functions. Functions and libraries are also available to help the programmer implement multilingual error messages, handle multi-byte characters (for languages such as Japanese), and so forth. For a complete discussion of the issues involved in internationalization and the functions provided to work around them, consult one of the several books devoted to the topic.
