Omnis Technical Note TNXM0002 June 2009

Creating Unicode External Components

For Omnis Studio Unicode version
By Gary Ashford

Introduction
This Technote assumes familiarity with the Omnis External Component SDK and its example projects, which can be downloaded from this website. It also assumes familiarity with Microsoft Developer Studio/Visual Studio, the Windows application used to compile and link Omnis external components.

The following discusses the issues involved in building components that are compatible with the Unicode version of Omnis Studio. The screenshot shown is from a sample project, which can be downloaded via the link at the end of this page.

Unicode or Non-Unicode?
If you are designing a component to be used with both the Unicode and non-Unicode versions of Studio (for example, Omnis Studio 4.3.1), you should create separate Unicode targets, i.e. Unicode Debug and Unicode Release, to complement your existing Debug and Release targets.
In the Unicode targets, the following additional Preprocessor Definitions are required:

isunicode - Required by the Omnis external component interface
UNICODE - Required by Visual Studio to enable wide character support
_UNICODE - Required by the __T() macro (tchar.h) when creating Unicode literal strings

In this way, you can make use of conditional compilation statements (e.g. #ifdef isunicode) to handle Unicode-specific code whilst retaining the same source files for use by both the Unicode and non-Unicode targets.
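The fragment below is a minimal sketch of this approach, in which one source file serves both sets of targets. The compile-time check that all three macros are defined together is an assumption drawn from the list above. The SDK does not perform it. buildFlavour() is a hypothetical helper used only for illustration.

```cpp
#include <string>

// Assumed sanity check (not part of the SDK): a Unicode target should
// define isunicode, UNICODE and _UNICODE together.
#if defined(isunicode) && !(defined(UNICODE) && defined(_UNICODE))
#error "Unicode targets need isunicode, UNICODE and _UNICODE defined together"
#endif

// Hypothetical helper: the same source compiles into either target,
// and the preprocessor selects the branch.
std::string buildFlavour()
{
#ifdef isunicode
    return "Unicode";      // compiled into Unicode Debug/Unicode Release
#else
    return "non-Unicode";  // compiled into the original Debug/Release
#endif
}
```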
If you do not require the non-Unicode targets (for instance, if you are designing components for Studio 5 and later), the Debug and Release targets can be discarded. For a full list of the preprocessor definitions used by Omnis components, please refer to the sample project properties.
Note that Studio 5 supports Unicode only. Unicode components are therefore required even if the component itself will only ever handle non-Unicode data.

When building Unicode targets, you should ensure that the compiler handles the wchar_t data type correctly.
With Visual Studio 2008, set Project Settings->C/C++->Language->"Treat wchar_t as Built-in Type" to No (/Zc:wchar_t-).
With Xcode, add the following flag to "Other C++ Flags": -fshort-wchar
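These settings can also be guarded in code, so that a misconfigured Unicode target fails to compile rather than misbehaving at run time. The sketch below is an assumption and is not part of the SDK. It relies on MSVC defining _NATIVE_WCHAR_T_DEFINED only when wchar_t is a built-in type (/Zc:wchar_t). It also needs a C++11 compiler for static_assert. Visual Studio 2008 predates static_assert, so there an array-size typedef check would be needed instead. wcharSizeInBytes() is hypothetical.

```cpp
#include <cstddef>

#ifdef isunicode
// MSVC: wchar_t must not be a built-in type in Unicode component targets.
#  if defined(_MSC_VER) && defined(_NATIVE_WCHAR_T_DEFINED)
#    error "Set 'Treat wchar_t as Built-in Type' to No (/Zc:wchar_t-)"
#  endif
// Windows and Mac OS X: wchar_t must be 2 bytes (Xcode needs -fshort-wchar).
#  if defined(_WIN32) || defined(__APPLE__)
static_assert(sizeof(wchar_t) == 2,
              "wchar_t must be 2 bytes; with Xcode add -fshort-wchar");
#  endif
#endif

// Hypothetical helper reporting the wchar_t size this file was compiled with.
std::size_t wcharSizeInBytes()
{
    return sizeof(wchar_t);
}
```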

Handling Unicode Character data
Omnis Studio uses the Utf32 encoding for all character data exchanged with external components. The SDK also provides several helper functions, data types and classes for compatibility with, and conversion between, other encodings, notably Utf8 and Utf16. These are discussed below.

qchar datatype: When isunicode is defined, qchar is defined as unsigned long (4 bytes/Utf32). For non-Unicode targets, qchar defaults to unsigned char (1 byte). Most ECO... and GDI... component interface methods take qchar parameters.
qoschar datatype: This datatype corresponds to the encoding used by the operating system. qoschar is defined as unsigned short (2 bytes/Utf16) under Windows and Mac OS X, but as unsigned char (1 byte/Utf8) under Linux. For non-Unicode targets, qoschar defaults to char. Some external component library classes (notably strxxx) take qoschar arguments in their constructors; see QTEXT() below.
qbyte datatype: qbyte is always defined as unsigned char. It is used specifically for binary and one-byte-per-character data, so it can also hold Utf8 data if required.
QCHARLEN() & QOSCHARLEN() macros: These convert a supplied byte length to the corresponding qchar or qoschar character length respectively. Note that they do not operate on strings or arrays of characters directly. They simply divide the supplied value by 4 in the case of QCHARLEN(), or by 2 (or 1) in the case of QOSCHARLEN().
QBYTELEN() & QOSBYTELEN() macros: These convert a supplied character length to the corresponding Utf32 or Utf16/Utf8 byte length respectively. Like the macros above, they do not operate on strings or arrays of characters directly. They simply multiply the supplied value by 4 in the case of QBYTELEN(), or by 2 (or 1) in the case of QOSBYTELEN().
QTEXT() macro: This is useful for creating and supplying literal string values inside components. When _UNICODE is defined, QTEXT() prefixes the supplied literal with L using the ## token-pasting operator. This instructs the compiler to treat the resulting text as a wide string, i.e. a string of qoschars. QTEXT() can be used anywhere a qoschar* argument is required, for example:
str255 myString( QTEXT("Default Value") ); //call the qoschar* constructor
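The code below re-creates the types and macros described above in plain C++. It is illustrative only and is not the SDK's actual definitions. The names live in a hypothetical sketch namespace, and SKETCH_TEXT() stands in for QTEXT(). qchar is modelled as std::uint32_t so that it is 4 bytes on every platform, because unsigned long is 8 bytes on 64-bit Mac OS X and Linux. In a Unicode Windows build, sketch::charLen(40) gives 10 and sketch::osCharLen(40) gives 20. In a non-Unicode build, both give 40.

```cpp
#include <cstddef>
#include <cstdint>

namespace sketch {

// Character types as the note describes them (illustrative re-creation).
#ifdef isunicode
typedef std::uint32_t qchar;       // Utf32; the note says unsigned long (4 bytes)
#  ifdef __linux__
typedef unsigned char qoschar;     // Utf8 under Linux
#  else
typedef unsigned short qoschar;    // Utf16 under Windows and Mac OS X
#  endif
#else
typedef unsigned char qchar;       // non-Unicode: 1 byte
typedef char qoschar;
#endif
typedef unsigned char qbyte;       // always one byte

// Length conversions are plain arithmetic, like QCHARLEN()/QBYTELEN():
// they never look at the characters themselves.
inline std::size_t charLen(std::size_t bytes)   { return bytes / sizeof(qchar); }
inline std::size_t osCharLen(std::size_t bytes) { return bytes / sizeof(qoschar); }
inline std::size_t byteLen(std::size_t chars)   { return chars * sizeof(qchar); }
inline std::size_t osByteLen(std::size_t chars) { return chars * sizeof(qoschar); }

} // namespace sketch

// QTEXT()-style literal macro: under _UNICODE, L ## "abc" becomes L"abc".
#ifdef _UNICODE
#  define SKETCH_TEXT(s) L ## s
#else
#  define SKETCH_TEXT(s) s
#endif

// Number of characters in a string literal, excluding the terminating null.
template <typename CharT, std::size_t N>
std::size_t literalLength(const CharT (&)[N]) { return N - 1; }
```

Because the conversions are only division and multiplication, a natural use is sizing a buffer. For sketch::qchar buffer[64], sketch::charLen(sizeof(buffer)) is 64 in either build. L"..." literals have type wchar_t. With /Zc:wchar_t-, wchar_t becomes unsigned short on Windows, which is presumably why that setting lets QTEXT() literals be passed where qoschar* is expected.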

CHRconvFromOs class
(Defined in chrbasic.he)
This C++ class converts a string of qoschars to qchars. Use the constructor methods to assign the string of qoschars and the dataPtr() & len() methods to extract the Utf32 encoded string. See the chrbasic.he header file for details on other constructor types and utility methods provided by this class. Example:
qchar pDest[255];
qoschar *pSource = QTEXT("Column Headings");

CHRconvFromOs cfoString(pSource);
OMstrcpy(&pDest[0], cfoString.dataPtr()); //copy the converted string of qchars into pDest
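On Windows and Mac OS X, the conversion that CHRconvFromOs performs is essentially Utf16 to Utf32. The standalone sketch below shows that step using C++11 char16_t/char32_t strings rather than the SDK's qoschar/qchar types. utf16ToUtf32() is hypothetical. Replacing unpaired surrogates with U+FFFD is an assumption about error handling, not documented SDK behaviour.

```cpp
#include <cstddef>
#include <string>

// Hypothetical Utf16 -> Utf32 conversion (the job CHRconvFromOs does for
// Utf16 platforms). Surrogate pairs are combined into one code point.
std::u32string utf16ToUtf32(const std::u16string& in)
{
    std::u32string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t c = in[i];
        if (c >= 0xD800 && c <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: one supplementary character.
            c = 0x10000 + ((c - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (c >= 0xD800 && c <= 0xDFFF) {
            c = 0xFFFD;  // unpaired surrogate: substitute the replacement character
        }
        out.push_back(c);
    }
    return out;
}
```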

CHRconvToOs class: This class converts a string of qchars to a string of qoschars and is used in the same way as the CHRconvFromOs class, for example:

qchar *pString; //contains a null terminated string of qchars

CHRconvToOs ctoString(pString);
qoschar *osString = ctoString.dataPtr(); //convert to a string of qoschars
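The reverse direction, Utf32 to Utf16 as CHRconvToOs would produce on Windows and Mac OS X, can be sketched the same way. utf32ToUtf16() is hypothetical. Mapping out-of-range values and surrogate code points to U+FFFD is an assumed policy.

```cpp
#include <string>

// Hypothetical Utf32 -> Utf16 conversion. Characters above U+FFFF are
// split into a high/low surrogate pair.
std::u16string utf32ToUtf16(const std::u32string& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
            out.push_back(u'\uFFFD');  // not a valid Unicode scalar value
        } else if (c >= 0x10000) {
            c -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (c >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (c & 0x3FF))); // low surrogate
        } else {
            out.push_back(static_cast<char16_t>(c));  // Basic Multilingual Plane
        }
    }
    return out;
}
```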


CHRconvToAscii class: This class converts a string of qchars (contained inside a strxxx class) to a string of ASCII bytes. It assumes that the supplied string contains only ASCII-compatible characters.
CHRunicode class: This class contains utility functions for converting between various Unicode encodings. There are functions to convert to and from ASCII bytes and Utf8, as well as a function which returns the number of Utf8 bytes corresponding to any given Utf32 character.
CHRconvToUtf16 class: This class converts a string of Utf8 characters to Utf16.
CHRconvFromUtf16 class: This class converts a string of Utf16 characters to Utf8.
CHRconvToBytes class: This class converts a string of qchars to Utf8.
CHRconvFromBytes class: This class converts a string of Utf8 characters to qchars (Utf32).
CHRconvFromLatin1ApiBytes class: This class converts a string of ASCII bytes to qchars. Any extended characters encountered are assumed to be from the Latin1 (Windows) code page.
CHRconvToLatin1ApiBytes class: This class converts a string of qchars to ASCII bytes. Conversion to extended characters found in the Latin1 code page is also supported.
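Several of the classes above depend on two Utf8 operations. The first counts the bytes a Utf32 character needs, which the note says CHRunicode offers. The second encodes Utf32 as Utf8, which is what CHRconvToBytes does. The sketch below implements both in standard C++. The function names are hypothetical, and skipping invalid code points is an assumption made for brevity.

```cpp
#include <cstddef>
#include <string>

// Hypothetical: number of Utf8 bytes needed for one Utf32 character.
// Returns 0 for surrogates and values beyond U+10FFFF.
std::size_t utf8ByteCount(char32_t c)
{
    if (c < 0x80) return 1;
    if (c < 0x800) return 2;
    if (c >= 0xD800 && c <= 0xDFFF) return 0;
    if (c < 0x10000) return 3;
    if (c <= 0x10FFFF) return 4;
    return 0;
}

// Hypothetical Utf32 -> Utf8 encoder; invalid code points are skipped.
std::string utf32ToUtf8(const std::u32string& in)
{
    std::string out;
    for (char32_t c : in) {
        switch (utf8ByteCount(c)) {
        case 1:  // 0xxxxxxx
            out += static_cast<char>(c);
            break;
        case 2:  // 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
            break;
        case 3:  // 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
            break;
        case 4:  // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
            break;
        default:
            break;  // invalid input: skipped in this sketch
        }
    }
    return out;
}
```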
OMstr... functions
(Defined in omstring.h)
There are a number of Omnis string functions that mirror the standard C string functions. These operate on strings of qchars and are prefixed with OM to distinguish them from their ASCII counterparts, for example OMstrcpy, OMstrlen, OMstrncat and OMstrtok.
There are also functions to convert between character strings and integers: OMlongToString and OMstrtoul.

Setting Component Property Values
When handling the ECM_SETPROPERTY message sent to your component's attributeSupport method, use one of the EXTfldval.getChar() methods as normal when referring to character data. For example:

str255 mMyText;
mMyText = fval.getChar(); //to fetch property value directly into a str255 class member

or
qchar buffer[255]; qlong len;
fval.getChar(255, &buffer[0], len ); //to fetch property value as a string of Utf32 characters

Getting Component Property Values
When handling the ECM_GETPROPERTY message sent to your component's attributeSupport method, use one of the EXTfldval.setChar() methods as normal when referring to character data. For example:

fval.setChar(mMyText); //where mMyText is defined as str255
or
fval.setChar(&buffer[0], len); //where buffer is an array of qchars and len equals the number of characters

About Omnis Fonts
To display Unicode data in an Omnis field or in the method editor, you must choose a font that supports the required subset of Unicode. Unicode fonts typically support only a range of Unicode characters, and characters outside this range are displayed as square blocks.

For example, to display the Chinese text shown in the sample library, it was necessary to download and install the "HDZB_5.TTF" TrueType font and add it to Omnis using the Windows #WIWFONTS system class. The fontname field also has this font assigned to it so that the Chinese font name can be displayed. Because this font also supports Western characters, it can display the other Windows font names as well.

Similarly, to see Chinese text in the Omnis method editor, it is necessary to assign the "HDZB" font as the font used to show Omnis code, as shown.

Sample Project and Omnis Library
The following sample project is extended from the Generic4 example project supplied with the Omnis Component SDK and is supplied with a Visual Studio 2008 project file.
An Omnis Studio 5.0 library is also provided which loads this component (once built/copied into the Omnis\xcomp folder).
The sample component demonstrates getting/setting of a Unicode text property, displaying Unicode text and returning Unicode strings via a list variable sent to a custom $getfonts() method.

Sample Project (tnxm0002.zip)
Sample Library (GenericTest.lbs)
Converting External Components to Unicode (PDF, see chapter 4)

References
Sample Chinese Fonts: www.certifiedchinesetranslation.com/fonts/Chinese.html
External Component SDK and Documentation: Component SDK

Disclaimer: Omnis Software Ltd. ("Omnis") provides the information in this web page solely for informational purposes. Omnis does not recommend or endorse any third-party product or service mentioned herein and is not responsible for the content of external websites. Omnis makes no warranty of any kind with regard to any third-party product or service mentioned herein, including, but not limited to, any implied warranty of merchantability and fitness for a particular purpose. Omnis shall not be liable for any direct, indirect, general, incidental, special or consequential damages in connection with any third-party product or service mentioned herein.