Unicode Multilingual Support

Photon is designed to handle international characters. Following the Unicode Standard (ISO/IEC 10646), Photon lets you create applications that can easily support the world's major languages and scripts.

Unicode is modeled on the ASCII character set, but uses a 16-bit encoding to support full multilingual text. There's no need for escape sequences or control codes when specifying any character in any language. Unicode also treats all characters (alphabetic characters, ideographs, and symbols) in exactly the same way.

Photon supports other character set encodings, and provides library functions for translating to and from Unicode. PhAB's language database support makes it easy to change the language in your application. Add-on products are also available to support fonts and input for languages such as Japanese.

This chapter includes:

UTF-8 encoding
Character set translation files
Keyboard tables
PhAB multilingual applications

UTF-8 encoding

Formerly known as UTF-2, the UTF-8 (for "8-bit form") transformation format is designed to address the use of Unicode character data in 8-bit UNIX environments.

Here are some of the main features of UTF-8:

Unicode characters from U+0000 to U+007F (ASCII set) map to UTF-8 bytes 00 to 7F (ASCII values).
ASCII values don't otherwise occur in a UTF-8 transformation, giving complete compatibility with historical filesystems that parse for ASCII bytes.
UTF-8 simplifies conversions to and from Unicode text.
The first byte indicates the number of bytes to follow in a multibyte sequence, allowing for efficient forward parsing.
Finding the start of a character from an arbitrary location in a byte stream is efficient, because you need to search at most four bytes backwards to find an easily recognizable initial byte. For example:
```
isInitialByte = ((byte & 0xC0) != 0x80);
```
UTF-8 is reasonably compact in terms of the number of bytes used for encoding.
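
The features above can be sketched in a few lines of C. This is only an illustration, not Photon library code: the function names utf8_encode16(), utf8_seq_len(), and utf8_char_start() are hypothetical, and the encoder assumes 16-bit Unicode values and doesn't treat surrogate values specially.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit Unicode value as UTF-8 (hypothetical helper).
 * Writes 1 to 3 bytes into out, which must hold at least 3 bytes,
 * and returns the number of bytes written. */
size_t utf8_encode16(uint16_t wc, unsigned char *out)
{
    assert(out != NULL);
    if (wc < 0x80) {                 /* ASCII: one byte, value unchanged */
        out[0] = (unsigned char)wc;
        return 1;
    }
    if (wc < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (wc >> 6));
        out[1] = (unsigned char)(0x80 | (wc & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (wc >> 12));
    out[1] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (wc & 0x3F));
    return 3;
}

/* Forward parsing: length of the sequence introduced by an initial
 * byte, or 0 if the byte is a continuation byte or not a valid
 * initial byte for a sequence of up to four bytes. */
size_t utf8_seq_len(unsigned char first)
{
    if (first < 0x80)           return 1;
    if ((first & 0xE0) == 0xC0) return 2;
    if ((first & 0xF0) == 0xE0) return 3;
    if ((first & 0xF8) == 0xF0) return 4;
    return 0;
}

/* Backward search: step back from pos over continuation bytes
 * (10xxxxxx) until an initial byte is found, and return its index. */
size_t utf8_char_start(const unsigned char *buf, size_t pos)
{
    assert(buf != NULL);
    while (pos > 0 && (buf[pos] & 0xC0) == 0x80)
        pos--;
    return pos;
}
```

For instance, with these helpers U+00E8 (è) encodes as the two bytes C3 A8, and the loop in utf8_char_start() uses the same initial-byte test shown in the list.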

Character set translation files

The Photon character set translation library provides support routines for converting to and from the native Unicode/UTF-8 character set.

The configuration file /usr/photon/translations/charsets details the character sets for which conversion support is installed. Each entry in this file describes a target character set, specifying:

the standard name for this translation
a list of valid aliases for this translation
a textual description
a translation method
a unique identifier code (the MIB enumeration code)
a data file holding translation table data.

Here's a sample entry for ISO 8859-1:

```
[ISO_8859-1:1987]
MIBenum     = 4
Alias       = iso-ir-100,ISO_8859-1,ISO-8859-1,latin1,csISOLatin1
Method      = 8bit
Table       = 8859-1.tab
Description = Western (Latin-1)
```
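
To show how the Alias list in an entry like this might be used, here's a minimal sketch that checks whether a requested character set name appears in a comma-separated alias list. The function alias_matches() is hypothetical and isn't part of the Photon translation library; the sketch also assumes that names are compared without regard to case and that the list contains no spaces, as in the sample entry.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Compare name with the first len bytes of item, ignoring case. */
static int same_name(const char *name, const char *item, size_t len)
{
    size_t i;

    if (strlen(name) != len)
        return 0;
    for (i = 0; i < len; i++) {
        if (tolower((unsigned char)name[i]) != tolower((unsigned char)item[i]))
            return 0;
    }
    return 1;
}

/* Return 1 if name is one of the comma-separated aliases, else 0.
 * Case-insensitive matching is an assumption of this sketch. */
int alias_matches(const char *aliases, const char *name)
{
    assert(aliases != NULL && name != NULL);
    while (*aliases != '\0') {
        size_t len = strcspn(aliases, ",");   /* length of this alias */

        if (same_name(name, aliases, len))
            return 1;
        aliases += len;
        if (*aliases == ',')
            aliases++;                        /* skip the separator */
    }
    return 0;
}
```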

New entries may be added to the file as required, provided that there's code support for the appropriate method. New entries for an existing method will require a new translation data file. Valid translation methods are:

7bit: Simple table lookup with a domain of 0x00 - 0x7F; the data file is a (binary) table of 128 wide-character codes.
8bit: Simple table lookup with a domain of 0x00 - 0xFF; the data file is a (binary) table of 256 wide-character codes.
sjis: Shift-JIS (8-bit) encoding; the data file is the 94x94 wide-character grid.
euc: Extended UNIX Code (EUC, 8-bit) encoding; the data file is the 94x94 wide-character grid.
internal: UTF-8 character copy scheme; no external resources are required.
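
As a rough illustration of the 8bit method, the following sketch converts a buffer in an 8-bit character set to UTF-8 by looking up each byte in a 256-entry table. It isn't the Photon implementation: translate_8bit() is a hypothetical name, and the sketch assumes that the table has already been loaded into memory as 16-bit values in host byte order (the on-disk layout of the data file isn't covered here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Translate n bytes of src to UTF-8 using a 256-entry lookup table
 * (hypothetical helper). Returns the number of bytes written to dst,
 * or (size_t)-1 if dst_size is too small to hold the result. */
size_t translate_8bit(const uint16_t table[256],
                      const unsigned char *src, size_t n,
                      unsigned char *dst, size_t dst_size)
{
    size_t used = 0;
    size_t i;

    assert(table != NULL && src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        uint16_t wc = table[src[i]];          /* simple table lookup */
        size_t need = (wc < 0x80) ? 1 : (wc < 0x800) ? 2 : 3;

        if (dst_size - used < need)
            return (size_t)-1;
        if (need == 1) {
            dst[used++] = (unsigned char)wc;
        } else if (need == 2) {
            dst[used++] = (unsigned char)(0xC0 | (wc >> 6));
            dst[used++] = (unsigned char)(0x80 | (wc & 0x3F));
        } else {
            dst[used++] = (unsigned char)(0xE0 | (wc >> 12));
            dst[used++] = (unsigned char)(0x80 | ((wc >> 6) & 0x3F));
            dst[used++] = (unsigned char)(0x80 | (wc & 0x3F));
        }
    }
    return used;
}
```

A 7bit translation would look the same with a 128-entry table and input bytes restricted to 0x00 - 0x7F.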

Photon ships with a set of standard translations for common character sets, including the ISO 8859 alphabets, many DOS/IBM/Windows code pages, and a number of Japanese encodings (SJIS and EUC).

Keyboard tables

Photon ships with numerous keyboard tables (see /usr/photon/keyboard). By default Photon assumes you're using the 101-key US keyboard. If you'd like to use a different keyboard, specify the name of the corresponding keyboard table using the KBD environment variable.

For example, adding the following line to your sysinit.node file selects the German keyboard:

```
export KBD=de_DE_102.kbd
```

You can create custom keyboard tables for languages that aren't covered by the supplied tables.

Creating new keyboard tables

Binary keyboard files (*.kbd) are stored in the /etc/config/kbd directory. These files contain the capability definitions associated with a single keyboard type. The corresponding source definitions for all Photon-supported keyboard types are called keyboard definition files (*.kdef). They are contained in /usr/photon/keyboard.

To view or change the information in an existing keyboard (*.kbd) file, you must first convert it to text (*.kdef) using the kbcvt utility. To use a modified keyboard definition file, compile it using the mkkbd utility and place the result in /etc/config/kbd.

To create a completely new keyboard table, start with a keyboard definition file (*.kdef) having most of the capabilities you require. Make a copy of it, modify the copy using any text editor, compile it using mkkbd, and place the result in /etc/config/kbd.

The sample.kdef file in the /usr/photon/keyboard directory contains comments describing the syntax of a keyboard definition.

The mkkbd utility optimizes the size of your keyboard definitions by removing unused key types, symbols, and compose sequences. An unused key type is one that doesn't have any keys assigned to it. An unused compose sequence is one that can't be typed on the keyboard. An unused symbol is an entry in the symbol table of a key type that is never referred to. Here's an example:

```
KeyType "MyKeyType" {
    Unmodified Sym #1
    <Shift> Sym #3
    }
```

Each key belonging to the type MyKeyType must specify three symbols, but symbol 2 will never be used. In this case, mkkbd removes the unused symbols from the key definitions and adjusts the key type accordingly.

Creating a keyboard table from a text file

The mkkbd utility compiles a text file containing keyboard definitions (*.kdef) and can create a binary keyboard table (*.kbd) file from each definition.

The input file is taken from standard input (stdin) or the file you specify through the -i option. If the input defines multiple keyboards, you can select the ones that will be converted to binary by matching keyboard names with one or more character patterns.

Always use single or double quotation marks when specifying a name pattern (e.g. "fr_*") to avoid shell expansion.

If you specify a directory for output files (with the -d or -k option), the binary files will be named according to the names given in the text file. Unnamed keyboards aren't converted. If you don't specify a directory, the first keyboard that matches a given pattern is converted and the output binary file is written to standard output (stdout). In this case, we recommend that you redirect the output to a file.

An unnamed keyboard is considered to match a pattern only if you don't specify any patterns. It's therefore possible to convert an unnamed keyboard to binary only if it's the first keyboard defined in your file.

The mkkbd utility will inform you about syntax errors and certain conflicts in your keyboard definition.

Converting a keyboard table into a text file

The kbcvt utility converts one or more binary keyboard tables (*.kbd) to text. If the binary file you want to convert is in the /etc/config/kbd directory and has a .kbd extension, you don't have to specify the filename extension on the command line. The output text is written to standard output (stdout).

Selecting an international keyboard

Keyboard tables are named according to the following convention:

la_CO_num.kbd

where:

la: A two-letter language code such as de for Deutsch, en for English, fr for Français, and so on.
CO: The international two-letter code for the country, such as CA for Canada, GB for Great Britain, and US for the United States.
num: The number of keys on the keyboard - usually 101 or 102.
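
If you need to work with these names in a program, the convention is easy to check. Here's a minimal sketch, assuming a hypothetical helper named parse_kbd_name() (it isn't a Photon function), that splits a table name into its language, country, and key-count parts:

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split a name of the form la_CO_num.kbd into its parts.
 * lang and country each receive two letters plus a terminating NUL.
 * Returns 1 on success, 0 if the name doesn't follow the convention. */
int parse_kbd_name(const char *name, char lang[3], char country[3],
                   int *num_keys)
{
    const unsigned char *p = (const unsigned char *)name;
    int n = 0;

    assert(name != NULL && lang != NULL && country != NULL && num_keys != NULL);

    /* la: two lowercase letters followed by an underscore */
    if (!islower(p[0]) || !islower(p[1]) || p[2] != '_')
        return 0;
    /* CO: two uppercase letters followed by an underscore */
    if (!isupper(p[3]) || !isupper(p[4]) || p[5] != '_')
        return 0;
    /* num: one or more decimal digits */
    p += 6;
    if (!isdigit(*p))
        return 0;
    while (isdigit(*p)) {
        n = n * 10 + (*p++ - '0');
        if (n > 9999)               /* reject implausible key counts */
            return 0;
    }
    /* The name must end with the .kbd extension */
    if (strcmp((const char *)p, ".kbd") != 0)
        return 0;

    memcpy(lang, name, 2);
    lang[2] = '\0';
    memcpy(country, name + 3, 2);
    country[2] = '\0';
    *num_keys = n;
    return 1;
}
```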

The available tables are:

Language	Table
Belgian French	`fr_BE_102.kbd`
Canadian English	`en_CA_101.kbd`
Canadian French	`fr_CA_102.kbd`
Danish	`da_DK_102.kbd`
Dutch	`nl_NL_102.kbd`
French	`fr_FR_102.kbd`
German	`de_DE_102.kbd`
Italian	`it_IT_102.kbd`
Japanese	`ja_JP_106.kbd`
Norwegian	`no_NO_102.kbd`
Polish	`pl_PL_102.kbd`
Portuguese	`pt_PT_102.kbd`
Swiss French	`fr_CH_102.kbd`
Swiss German	`de_CH_102.kbd`
Spanish	`es_ES_102.kbd`
Swedish	`se_SE_102.kbd`
UK English	`en_GB_102.kbd`
US English	`en_US_101.kbd`

Composing international characters

Suppose you need to include a character that isn't in the standard ASCII table, for example è. You can do this by using a compose sequence, a series of key presses. All compose sequences used by Photon keyboards are contained in the /usr/photon/keyboard/compose.inc file.

The compose.inc file contains entries such as the following:

```
Sym [Agrave] Keys [combining_grave] 'A'
Sym [agrave] Keys [combining_grave] 'a'
Sym [Agrave] Keys '`' 'A'
Sym [Aacute] Keys [acute] 'A'
Sym [Aacute] Keys [combining_acute] 'A'
Sym [Aacute] Keys ''' 'A'
Sym [Acircumflex] Keys '^' 'A'
Sym [Acircumflex] Keys [combining_circumflex] 'A'
```

The name of the symbol combines the letter and the accent or diacritical mark. For example, Agrave is a capital A with the grave accent, and agrave is a small a with the grave accent.

In our example, we want to insert the è character. So we need to look for the compose sequence called egrave in the compose.inc file:

```
Sym [egrave] Keys [combining_grave] 'e'
Sym [egrave] Keys '`' 'e'
```

There are two ways to get this character. The first entry may be interpreted as:

If your keyboard has a combining_grave key (standard keyboards don't) and an e key, you can compose an è character by pressing Alt (or whichever key is defined as the "Compose" key), followed by the combining_grave key, followed by e.
If your keyboard has a dead combining_grave key (a key that doesn't produce a character when pressed on its own) and an e key, you can compose an è character by pressing the dead combining_grave key followed by e.

The second entry may be interpreted as:

If your keyboard has a ` key and an e key, you can compose an è character by pressing Alt (or whichever key is defined as the "Compose" key), followed by the ` key, followed by e.
If your keyboard has a dead ` key and an e key, you can compose an è character by pressing the dead ` key followed by e.
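
Conceptually, a compose sequence is a lookup from a pair of keys to a resulting symbol. The following sketch models a few of the entries above as a small table; it's an illustration only and says nothing about how Photon stores compose sequences internally. The compose() function and compose_table are hypothetical, and the sketch assumes that key symbols are identified by their Unicode values (for example, U+0300 for combining_grave).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One compose sequence: two keys in order, and the symbol produced. */
struct compose_entry {
    uint16_t first;
    uint16_t second;
    uint16_t result;
};

/* A few entries modeled on compose.inc (values are Unicode code points). */
static const struct compose_entry compose_table[] = {
    { 0x0300, 'A', 0x00C0 },   /* [combining_grave] 'A' -> Agrave      */
    { 0x0060, 'A', 0x00C0 },   /* '`' 'A'               -> Agrave      */
    { 0x0300, 'e', 0x00E8 },   /* [combining_grave] 'e' -> egrave      */
    { 0x0060, 'e', 0x00E8 },   /* '`' 'e'               -> egrave      */
    { 0x005E, 'A', 0x00C2 },   /* '^' 'A'               -> Acircumflex */
};

/* Return the composed symbol for a two-key sequence, or 0 if the
 * sequence isn't in the table. */
uint16_t compose(uint16_t first, uint16_t second)
{
    size_t i;
    size_t count = sizeof compose_table / sizeof compose_table[0];

    assert(second != 0);   /* a sequence needs a second key */
    for (i = 0; i < count; i++) {
        if (compose_table[i].first == first &&
            compose_table[i].second == second)
            return compose_table[i].result;
    }
    return 0;
}
```

Both egrave entries lead to the same result, which mirrors the two ways of typing è described above.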

PhAB multilingual applications

PhAB applications can be set up to operate in a language other than English. Using an appropriate translation file (if any are available for your system; see the /usr/photon/translations directory), you can set the ABLANG and ABLPATH environment variables in the sysinit.node file to select a language and provide the path to the translations directory.

For example, the following two lines would set the language to German, assuming the supporting translation file is available in the translations directory:

```
export ABLANG=de_DE
export ABLPATH=/usr/photon/translations
```

The values you can enter for the ABLANG environment variable match the two-letter language (la) and country (CO) code conventions of the keyboard drivers Photon currently supports. For a complete list of supported keyboard table names, see "Selecting an international keyboard" in this chapter.