
Unicode Multilingual Support

Photon is designed to handle international characters. Following the Unicode Standard (ISO/IEC 10646), Photon lets you create applications that can easily support the world's major languages and scripts.

Unicode is modeled on the ASCII character set, but uses a 16-bit encoding to support full multilingual text. There's no need for escape sequences or control codes when specifying any character in any language. Unicode treats all characters, whether alphabetic characters, ideographs, or symbols, in exactly the same way.

Photon supports other character set encodings, and provides library functions for translating to and from Unicode. PhAB's language database support makes it easy to change the language in your application. Add-on products are also available to support fonts and input for languages such as Japanese.

This chapter includes:

UTF-8 encoding

Formerly known as UTF-2, the UTF-8 (for "8-bit form") transformation format is designed to address the use of Unicode character data in 8-bit UNIX environments.
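To make the transformation concrete, here is a minimal sketch of how a single Unicode value is packed into one to four 8-bit units. It follows the standard UTF-8 bit layout rather than any Photon library routine, and the function name utf8_encode is an illustrative assumption.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point as UTF-8 bytes (standard bit layout)."""
    if code_point < 0 or code_point > 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate values cannot be encoded")
    if code_point < 0x80:
        # ASCII passes through unchanged as a single byte.
        return bytes([code_point])
    if code_point < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


print(utf8_encode(0x41))   # b'A'
print(utf8_encode(0xE8))   # b'\xc3\xa8' (the character è)
```

Because values below 0x80 are emitted unchanged, plain ASCII text is already valid UTF-8.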

Here are some of the main features of UTF-8:

Character set translation files

The Photon character set translation library provides support routines for converting to and from the native Unicode/UTF-8 character set.

The configuration file /usr/photon/translations/charsets details the character sets for which conversion support is installed. Each entry in this file describes a target character set, specifying:

Here's a sample entry for ISO 8859-1:

[ISO_8859-1:1987]
MIBenum     = 4
Alias       = iso-ir-100,ISO_8859-1,ISO-8859-1,latin1,csISOLatin1
Method      = 8bit
Table       = 8859-1.tab
Description = Western (Latin-1)
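As an illustration of the entry layout, the following sketch reads text shaped like the sample above into a dictionary. It is not the parser Photon uses; the function name parse_charsets is hypothetical, and the sketch assumes only the bracketed-name and key = value syntax visible in the sample (it makes no assumption about comments or other syntax the real file may allow).

```python
SAMPLE = """\
[ISO_8859-1:1987]
MIBenum     = 4
Alias       = iso-ir-100,ISO_8859-1,ISO-8859-1,latin1,csISOLatin1
Method      = 8bit
Table       = 8859-1.tab
Description = Western (Latin-1)
"""


def parse_charsets(text):
    """Return {entry name: {field: value}} for text shaped like the sample."""
    entries = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A bracketed line starts a new character set entry.
            current = {}
            entries[line[1:-1]] = current
        elif "=" in line and current is not None:
            key, value = (part.strip() for part in line.split("=", 1))
            # Aliases are a comma-separated list; other fields stay as strings.
            current[key] = value.split(",") if key == "Alias" else value
    return entries


entry = parse_charsets(SAMPLE)["ISO_8859-1:1987"]
print(entry["Method"], entry["Table"])   # 8bit 8859-1.tab
```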

New entries may be added to the file as required, provided that there's code support for the appropriate method. New entries for an existing method will require a new translation data file. Valid translation methods are:

7bit
Simple table lookup with a domain of 0x00 - 0x7F; the data file is a (binary) table of 128 wide-character codes.
8bit
Simple table lookup with a domain of 0x00 - 0xFF; the data file is a (binary) table of 256 wide-character codes.
sjis
Shift-JIS (8-bit) encoding; the data file is the 94x94 wide-character grid.
euc
Extended UNIX Code (EUC, 8-bit) encoding; the data file is the 94x94 wide-character grid.
internal
UTF-8 character copy scheme; no external resources are required.
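The table-lookup methods can be sketched as follows for the 8bit case. The on-disk layout is an assumption made only for illustration: this sketch treats the table as 256 unsigned 16-bit little-endian values, which may not match the actual format of Photon's .tab files. The function names are hypothetical, and the Latin-1 table is built in memory rather than read from a shipped file.

```python
import struct


def load_8bit_table(data):
    """Unpack a binary table of 256 wide-character codes.

    Assumption: each code is an unsigned 16-bit little-endian value.
    """
    if len(data) != 256 * 2:
        raise ValueError("expected 256 16-bit entries")
    return list(struct.unpack("<256H", data))


def to_utf8(encoded, table):
    """Translate 8-bit text to UTF-8 by looking each byte up in the table."""
    return "".join(chr(table[byte]) for byte in encoded).encode("utf-8")


# ISO 8859-1 maps every byte value to the Unicode value of the same number,
# so its table is simply 0..255.
latin1_table = load_8bit_table(struct.pack("<256H", *range(256)))

print(to_utf8(b"caf\xe9", latin1_table))   # b'caf\xc3\xa9'
```

A 7bit table would work the same way with 128 entries and a domain check on each input byte.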

Photon ships with a set of standard translations for common character sets, including the ISO 8859 alphabets, many DOS/IBM/Windows code pages, and a number of Japanese encodings (SJIS and EUC).

Keyboard tables

Photon ships with numerous keyboard tables (see /usr/photon/keyboard). By default Photon assumes you're using the 101-key US keyboard. If you'd like to use a different keyboard, specify the name of the corresponding keyboard table using the KBD environment variable.

For example, adding the following line to your sysinit.node file selects the German keyboard:

export KBD=de_DE_102.kbd


Note: If a keyboard table for your language isn't supplied, you can create a custom one.

Creating new keyboard tables

Binary keyboard files (*.kbd) are stored in the /etc/config/kbd directory. These files contain the capability definitions associated with a single keyboard type. The corresponding source definitions for all Photon-supported keyboard types are called keyboard definition files (*.kdef). They are contained in /usr/photon/keyboard.

To view or change the information in an existing keyboard (*.kbd) file, you must first convert it to text (*.kdef) using the kbcvt utility. To use a modified keyboard definition file, you'd compile it using the mkkbd utility and place the result in /etc/config/kbd.

To create a completely new keyboard table, start with a keyboard definition file (*.kdef) having most of the capabilities you require. Make a copy of it, modify the copy using any text editor, compile it using mkkbd, and place the result in /etc/config/kbd.

The sample.kdef file in the /usr/photon/keyboard directory contains comments describing the syntax of a keyboard definition.

The mkkbd utility optimizes the size of your keyboard definitions by removing unused key types, symbols, and compose sequences. An unused key type is one that doesn't have any keys assigned to it. An unused compose sequence is one that can't be typed on the keyboard. An unused symbol is an entry in the symbol table of a key type that is never referred to. Here's an example:

KeyType "MyKeyType" {
    Unmodified Sym #1
    <Shift> Sym #3
    }

Each key belonging to the type MyKeyType must specify three symbols, but symbol 2 will never be used. In this case, mkkbd removes the unused symbols from the key definitions and adjusts the key type accordingly.
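The effect of this optimization can be sketched in a few lines. This is not mkkbd's implementation; the data structures (a key type as a mapping from modifier state to a 1-based symbol index, and each key as a list of symbols) and the function name are assumptions chosen to mirror the example above.

```python
def prune_unused_symbols(key_type, keys):
    """Drop symbol slots that no modifier state refers to, then renumber."""
    # Symbol indexes actually referenced by the key type, e.g. [1, 3].
    used = sorted(set(key_type.values()))
    # Map old indexes to compact new ones, e.g. {1: 1, 3: 2}.
    renumber = {old: new for new, old in enumerate(used, start=1)}
    pruned_type = {modifier: renumber[index]
                   for modifier, index in key_type.items()}
    pruned_keys = {name: [symbols[index - 1] for index in used]
                   for name, symbols in keys.items()}
    return pruned_type, pruned_keys


my_key_type = {"Unmodified": 1, "<Shift>": 3}
my_keys = {"key_a": ["a", "(never used)", "A"]}

print(prune_unused_symbols(my_key_type, my_keys))
# ({'Unmodified': 1, '<Shift>': 2}, {'key_a': ['a', 'A']})
```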

Creating a keyboard table from a text file

The mkkbd utility compiles a text file containing keyboard definitions (*.kdef) and can create a binary keyboard table (*.kbd) file from each definition.

The input file is taken from standard input (stdin) or the file you specify through the -i option. If the input defines multiple keyboards, you can select the ones that will be converted to binary by matching keyboard names with one or more character patterns.


Caution: Always use single or double quotation marks when specifying a name pattern (e.g. "fr_*") to avoid shell expansion.

If you specify a directory for output files (with the -d or -k option), the binary files will be named according to the names given in the text file. Unnamed keyboards aren't converted. If you don't specify a directory, the first keyboard that matches a given pattern is converted and the output binary file is written to standard output (stdout). In this case, we recommend that you redirect the output to a file.

An unnamed keyboard is considered to match only if you don't specify any patterns. It's therefore possible to convert an unnamed keyboard to binary only if it's the first keyboard defined in your file.
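The selection rules described above can be modeled roughly as follows. This is a sketch of one reading of those rules, not mkkbd's code: it assumes the name patterns behave like shell-style wildcards, and that without an output directory only the first matching keyboard overall is converted. The function name and arguments are hypothetical.

```python
from fnmatch import fnmatchcase


def select_keyboards(names, patterns, output_dir=None):
    """Return the keyboards that would be converted, in file order.

    names      -- keyboard names as they appear in the file; None = unnamed
    patterns   -- name patterns given on the command line (may be empty)
    output_dir -- directory for output files, or None to write to stdout
    """
    def matches(name):
        if not patterns:
            # With no patterns, every keyboard matches, named or not.
            return True
        return name is not None and any(
            fnmatchcase(name, pattern) for pattern in patterns)

    selected = [name for name in names if matches(name)]
    if output_dir is not None:
        # Output files take their names from the definitions,
        # so unnamed keyboards are skipped.
        return [name for name in selected if name is not None]
    # Writing to stdout: only the first matching keyboard is converted.
    return selected[:1]


names = ["en_US_101", "fr_FR_102", "fr_CA_102"]
print(select_keyboards(names, ["fr_*"]))                    # ['fr_FR_102']
print(select_keyboards(names, ["fr_*"], output_dir="out"))  # ['fr_FR_102', 'fr_CA_102']
```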

The mkkbd utility will inform you about syntax errors and certain conflicts in your keyboard definition.

Converting a keyboard table into a text file

The kbcvt utility converts one or more binary keyboard tables (*.kbd) to text. If the binary file you want to convert is in the /etc/config/kbd directory and has a .kbd extension, you don't have to specify the filename extension on the command line. The output text is written to standard output (stdout).

Selecting an international keyboard

Keyboard tables are named according to the following convention:

la_CO_num.kbd

where:

la
A two-letter language code such as de for Deutsch, en for English, fr for Français, and so on.
CO
The international two-letter code for the country, such as CA for Canada, GB for Great Britain, and US for the United States.
num
The number of keys on the keyboard - usually 101 or 102.
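The naming convention is regular enough to check mechanically. The sketch below splits a table name into its three parts; the regular expression and the function name are illustrative assumptions, not part of Photon.

```python
import re

# la_CO_num.kbd: lowercase language, uppercase country, key count.
KBD_NAME = re.compile(r"^(?P<la>[a-z]{2})_(?P<co>[A-Z]{2})_(?P<num>\d+)\.kbd$")


def parse_kbd_name(filename):
    """Return (language, country, number of keys) for a keyboard table name."""
    match = KBD_NAME.match(filename)
    if match is None:
        raise ValueError("not in la_CO_num.kbd form: " + filename)
    return match.group("la"), match.group("co"), int(match.group("num"))


print(parse_kbd_name("de_DE_102.kbd"))   # ('de', 'DE', 102)
```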

The available tables are:

Language          Table
Belgian French    fr_BE_102.kbd
Canadian English  en_CA_101.kbd
Canadian French   fr_CA_102.kbd
Danish            da_DK_102.kbd
Dutch             nl_NL_102.kbd
French            fr_FR_102.kbd
German            de_DE_102.kbd
Italian           it_IT_102.kbd
Japanese          ja_JP_106.kbd
Norwegian         no_NO_102.kbd
Polish            pl_PL_102.kbd
Portuguese        pt_PT_102.kbd
Swiss French      fr_CH_102.kbd
Swiss German      de_CH_102.kbd
Spanish           es_ES_102.kbd
Swedish           se_SE_102.kbd
UK English        en_GB_102.kbd
US English        en_US_101.kbd

Composing international characters

Suppose you need to include a character that isn't in the standard ASCII table, for example è. You can do this by using a compose sequence, a series of key presses. All compose sequences used by Photon keyboards are contained in the /usr/photon/keyboard/compose.inc file.

The compose.inc file contains entries such as the following:

Sym [Agrave] Keys [combining_grave] 'A'
Sym [agrave] Keys [combining_grave] 'a'
Sym [Agrave] Keys '`' 'A'
Sym [Aacute] Keys [acute] 'A'
Sym [Aacute] Keys [combining_acute] 'A'
Sym [Aacute] Keys ''' 'A'
Sym [Acircumflex] Keys '^' 'A'
Sym [Acircumflex] Keys [combining_circumflex] 'A'

The name of the symbol combines the letter and the accent or diacritical mark. For example, Agrave is a capital A with the grave accent, and agrave is a small a with the grave accent.

In our example, we want to insert the è character. So we need to look for the compose sequence called egrave in the compose.inc file:

Sym [egrave] Keys [combining_grave] 'e'
Sym [egrave] Keys '`' 'e'
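The same lookup can be done mechanically. The following sketch parses entries of the form shown above and lists every key sequence that produces a given symbol. It handles only the Sym [...] Keys ... syntax visible in these excerpts, not the full compose.inc grammar, and the function name is hypothetical.

```python
import re

ENTRY = re.compile(r"^Sym\s+\[(\w+)\]\s+Keys\s+(.+)$")
# A key is either a named key in brackets or one character in single quotes.
KEY = re.compile(r"\[(\w+)\]|'(.)'")

SAMPLE = """\
Sym [egrave] Keys [combining_grave] 'e'
Sym [egrave] Keys '`' 'e'
Sym [Aacute] Keys ''' 'A'
"""


def parse_compose(text):
    """Return {symbol name: [key sequence, ...]} for the entries in text."""
    sequences = {}
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match is None:
            continue
        # Keep brackets on named keys so they stay distinct from characters.
        keys = tuple("[" + name + "]" if name else char
                     for name, char in KEY.findall(match.group(2)))
        sequences.setdefault(match.group(1), []).append(keys)
    return sequences


for keys in parse_compose(SAMPLE)["egrave"]:
    print(keys)
# ('[combining_grave]', 'e')
# ('`', 'e')
```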

There are two ways to get this character. The first entry may be interpreted as:

The second entry may be interpreted as:

PhAB multilingual applications

PhAB applications can be set up to operate in a language other than English. Using an appropriate translation file (if any are available for your system; see the /usr/photon/translations directory), you can set the ABLANG and ABLPATH environment variables in the sysinit.node file to select a language and provide the path to the translations directory.

For example, the following two lines would set the language to German, assuming the supporting translation file is available in the translations directory:

export ABLANG=de_DE
export ABLPATH=/usr/photon/translations

Note: The values you can enter for the ABLANG environment variable follow the same two-letter language (la) and country (CO) code conventions as the keyboard tables Photon currently supports. For a complete list of supported keyboard table names, see "Selecting an international keyboard" in this chapter.

