Omnis Technical Note TNSQ0013 September 2006

Mapping Character Sets

For Omnis Studio 4.1 or above.
By Dr Michael Hufschmidt

Problem
Since the days of the Tower of Babel there have been thousands of different languages in the world. As long as you stay within your native country or culture this isn't usually a problem. But as soon as you want to communicate with foreigners, you either have to use a common language or you need an interpreter. And to make it worse, the multitude of languages comes with a multitude of different character sets.

The same is true for programming: As long as you develop stand-alone programs you won't find any problems, but when you develop client/server solutions that communicate with a remote database, either all clients must speak the language of the database server or you will need an interpreter.

The most common problems are concerning different character sets: Letters are in most cases saved in one byte. Therefore, for a program there are 256 different characters available (code 000 to 255). In ASCII code only the first 128 characters are fixed, whereas National special characters have got an ASCII code over 128. There are many different assignments of these codes over 128 (= character set tables), for example, refer to:

www.iana.org/assignments/character-sets

The ultimate solution to any problems regarding character sets is to use Unicode. In Unicode each character is saved in up to 4 bytes, so that there are 4,294,967,296 different letters/characters available. For further details about Unicode and the character sets available, refer to:

www.unicode.org/charts

Since December 2005, Omnis Studio version 4.1 is available in a Unicode version so that Omnis can handle all characters in all languages correctly. When you are using the Non-Unicode version of Omnis Studio and want to store text with national special characters in a database you have to make sure that the clients and the server are using the same character set table. Otherwise national characters, for example, German umlauts such as ÄÖÜ and äöü won't be displayed correctly.

Omnis as an interpreter
But what can you do in the Non-Unicode version of Omnis-Studio and you need to communicate with a database that has set up a different character set? Don't worry, there is help available: The Data Access Modules (DAMs) in Omnis Studio can re-code the character set in a way that it fits to the character set of the server. For this the appropriate mapping tables have to be created and must be assigned as a property to the session object being used:

Do mySessionObj.$charmap.$assign(kSessionCharMapTable)
Do mySessionObj.$maptable.$assign('myMapTable')

There are always two map tables required: one for mapping the characters in communications from the client to the server (file name myMapTable.OUT), and a reciprocal table for the character mapping in communications from the server to the Omnis client (file name myMapTable.IN). Omnis provides you with a tool to create these tables, called the Charmap utility. You can find a detailed description of this mapping procedure in the "Omnis Programming" manual (edition November 2005), Chapter 9 "SQL Programming", section "Character Mapping", page 375. This manual is included as a PDF file on the Omnis product DVD.

A special case: Oracle
When using Oracle, the Omnis DAM is not communicating directly with the database server. In fact, on the Omnis client you have to have the Oracle clientware installed. The Omnis DAM communicates with the local Oracle clientware and this in turn communicates with the Oracle server.
The mapping of the character sets as described above can be done either by Omnis or in the Oracle clientware. The Oracle clientware however, stores the character set used either in an environment variable NLS_LANG and/or in the Windows Registry in the node:

HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOME0

The value of the environment variable will be preferred, and could contain values such as:

GERMAN_GERMANY.WE8ISO8859P1
or
GERMAN_GERMANY.WE8MSWIN1252

When you first install an Oracle server, by default, the first character set is ISO-8859-1 (Alias Latin-1), and when you install an Oracle client under Windows the second character set is usually set up as Windows-1252. Therefore, the character set problems as described here will typically happen.
Please note that you must map the character sets either by the Oracle client or by Omnis, but not both! Otherwise the special characters would be mapped twice, and that of course will lead to the unexpected results.

A special case: MySQL
With a normal installation of a MySQL server, by default, the character set is ISO-8859-1, like Oracle, but you can set up the character set each time you create a new table on the database, such as:

CREATE TABLE myTable ( ... ) ENGINE=MyISAM DEFAULT CHARSET=latin1;

With the MySQL command

SHOW CHARACTER SET;

You will receive a list of all character sets supported by any particular server, so that you can edit the SQL string to create a server table if necessary as follows:

CREATE TABLE myTable ( ... )ENGINE=MyISAM DEFAULT CHARSET=cp1250;

When MySQL is using the default character set ISO-8859-1 you must set the property

Do MySQL_SessionObj.$charmap.$assign(kSessionCharMapNative)

in the Omnis session object. Then .IN and .OUT map tables are not required anymore.