OS/2 Warp 4 and up include APIs for Unicode support, referred to as the Universal Language Support (ULS) functions. Unfortunately, they are very poorly documented.
Consequently, I have embarked upon a series of projects to make the Unicode support in OS/2 more accessible to developers. These are available below.
- Updated API reference documentation
- An introductory programming guide
- A comprehensive list of all codepages supported by OS/2, which can be used with the ULS API conversion functions
- A REXX library which provides access to parts of the ULS API from REXX.
- Various sample programs illustrating how to use Unicode/ULS under OS/2.
Updated Programmer's Reference
The OS/2 Warp 4 Toolkit contained some cursory reference documentation (UNIAPI.INF) for ULS, but this is now severely outdated. Later versions of the Toolkit provide updated documentation in HTML format (UNIAPI.HTM), but it is woefully incomplete, very poorly formatted, and contains some blatant errors.
I have undertaken a major revamping of the ULS API reference documentation. My document is based on IBM's UNIAPI.HTM from the 4.52 Toolkit, but with many improvements:
- The HTML has been completely reformatted so as to be significantly more clear and readable. (Compatibility with older web browsers has nonetheless been preserved.)
- A section describing possible API return codes has been added.
- Several incorrect, incomplete, poorly-expressed or just plain misleading function descriptions have been fixed. In addition, important missing information has been added (such as the entire function description for UniStrToUcs(), which in the IBM version was actually an erroneous copy-paste of an entirely different function instead).
- Various clarifications have been made, and in some cases helpful comments have been added.
- Some of the sample code has been rewritten or replaced with code that is more illustrative of the function in question (and that
- Descriptions of the ULS keyboard functions and data types (which were inexplicably missing from the latest toolkit documentation) have been restored.
This document remains a work in progress, as I continue to find areas for improvement. See below for the change history.
The legal status of this document is a bit muddy. As it is a direct derivation of IBM's own documentation, I do not claim any particular rights over it. As far as I am concerned, it may be freely redistributed and/or modified. IBM's own legal terms regarding modifying or redistributing this documentation are unclear.
You can download the HTML documentation as a ZIP file, or read it online.
New Programmer's Guide
As a companion to the updated reference documentation, I have also written an introductory programming guide that describes how to properly use the Universal Language Support APIs.
This document is in OS/2 INF format. The IPF source is included, to make it easier for anyone who wants to suggest amendments (or for translation, if there's anyone that ambitious).
This document is entirely my own work, and may be redistributed freely.
- Browse the document online (HTML converted from the INF file)
- Download (84 kB ZIP file)
- Release history (plain text)
List of OS/2 Codepages
I have undertaken to create a comprehensive list of all codepages supported by OS/2. These codepages may be used in the ULS API conversion functions.
This table attempts to list every codepage, including aliases, known to modern OS/2 systems. Keep in mind that not all codepages may be available on all systems (depending on the installation options and/or operating system version).
Many of the codepages listed in this table are not available for use as system or PM codepages, and will not be listed by the WinQueryCpList() function; they can nonetheless be used with the ULS codepage conversion functions.
Explanation of Fields
- Lists the codepage number. ("OS2UGL" is a special codepage which has no number, and is thus identified by name.)
- A brief description of the codepage, including the language or character set standard(s) covered, and any additional information about the encoding format used.
Codepages prefixed with 'IBM' are encodings based on modifications of the standard DOS 8-bit ASCII layout, and are not recommended for cross-platform interchange.
Codepages prefixed with 'ISO' are official ISO standards, and may be used for interchange with other systems.
- Indicates the underlying encoding of Latin text on which the
codepage is based.
- "ASCII" indicates a PC codepage compatible with the 7-bit displayable ASCII character set.
- "EBCDIC" indicates an IBM mainframe (System 370/390/iSeries) codepage based on EBCDIC.
- "Other" indicates a codepage which is not byte-for-byte compatible with either ASCII or EBCDIC. (By definition this includes all fixed-width double-byte codepages.)
- Indicates the number of bytes used by the codepage to represent a single character. Fixed-width codepages will show a single integer value; variable-width codepages will show a range.
- Indicates whether or not the codepage may be used as an OS/2 process codepage (through the CODEPAGE setting in CONFIG.SYS). This does not take into account the system's COUNTRY setting, which may impose additional restrictions on which codepages may actually be used in this way.
- Indicates whether or not the codepage may be used as a display codepage within Presentation Manager (either through WinSetCp() or GPI font attributes).
- Additional information about the codepage.
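As a rough illustration of the fixed-width versus variable-width distinction described above, the following portable C sketch inspects UTF-8, a common variable-width encoding. It does not use the ULS API, and the function names utf8_sequence_length and utf8_char_count are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the ULS API): returns the number of
 * bytes in the UTF-8 sequence that starts with the given lead byte, or
 * 0 if the byte cannot start a valid sequence. */
size_t utf8_sequence_length(unsigned char lead)
{
    if (lead < 0x80)                  return 1;  /* 7-bit ASCII range   */
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  /* two-byte sequence   */
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  /* three-byte sequence */
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  /* four-byte sequence  */
    return 0;  /* continuation byte or invalid lead byte */
}

/* Hypothetical helper: counts the characters in a UTF-8 buffer of len
 * bytes. Returns (size_t)-1 if the buffer is truncated or malformed.
 * Only the lead byte and the 10xxxxxx form of the continuation bytes
 * are checked; this is not a full UTF-8 validator. */
size_t utf8_char_count(const unsigned char *buf, size_t len)
{
    size_t i = 0, count = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        size_t n = utf8_sequence_length(buf[i]);
        size_t k;

        if (n == 0 || n > len - i)
            return (size_t)-1;
        for (k = 1; k < n; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return (size_t)-1;
        }
        i += n;
        count++;
    }
    return count;
}
```

In a fixed-width single-byte codepage the character count always equals the byte count; in a variable-width encoding the two can differ, which is why a range is shown for such codepages.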
- Read online (72 kB HTML document)
- Download in OpenOffice 1.0 format (11 kB spreadsheet)
- Download in CSV format (11 kB tab-delimited ASCII file)
REXX Universal Language Support library — RXULS.DLL
REXX Universal Language Support (RxULS) provides a REXX interface to selected parts of the OS/2 Universal Language Support API (ULS).
Using RxULS, it becomes possible to do the following from REXX:
- Search or transform text strings according to locale-specific rules.
- Query locale information.
- Convert text strings from one codepage to another, including to or from Unicode encodings such as UTF-8 and UCS-2.
- Access Unicode-formatted clipboard text.
See the documentation for details.
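RxULS itself is called from REXX, and its actual function names are covered in its documentation. Purely to illustrate what a conversion between two Unicode encodings involves, here is a minimal portable C sketch that encodes UCS-2 text as UTF-8. It does not call the ULS API, assumes a C99 compiler (for stdint.h), and the function name ucs2_to_utf8 is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of ULS or RxULS): encodes count UCS-2
 * code units from src as UTF-8 into dst, which holds dst_size bytes.
 * Returns the number of bytes written, or (size_t)-1 if dst is too
 * small. No terminator is written. Values in the surrogate range are
 * not treated specially, since UCS-2 has no surrogate pairs. */
size_t ucs2_to_utf8(const uint16_t *src, size_t count,
                    unsigned char *dst, size_t dst_size)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < count; i++) {
        uint16_t c = src[i];
        size_t need = (c < 0x80) ? 1 : (c < 0x800) ? 2 : 3;

        if (need > dst_size - out)
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (unsigned char)c;
        } else if (need == 2) {
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xE0 | (c >> 12));
            dst[out++] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Each UCS-2 value needs between one and three bytes in UTF-8, so a caller that wants to be safe can allocate three output bytes per input character.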
RXULS.DLL with documentation, examples, and source.
CPMAP is a simple program that can display a complete character map of any OS/2 codepage (even those which are not available for use as system or PM codepages).
I originally wrote this program to help me create the list of codepages (above). However, it serves as a useful illustration of some of the Unicode APIs. (It can also be handy for debugging fonts.)
In some ways, this program is similar to Ken Borgendale's ShowCP program, although it evolved quite independently. Unlike ShowCP, CPMAP can display any installed codepage, not just a selected few. (Conversely, though, it lacks both print support and a glyph-details mode.)
For best results, CPMAP should be used in conjunction with a Unicode outline font. It defaults to using Times New Roman MT 30, which is included in recent versions of OS/2 (Warp Server for e-business and later), and is also available with some versions of Java 1.1.8.
CPMAP is made available under a BSD-style license; see the documentation for details.
- Viewing a single-byte codepage
- Viewing a multi-byte codepage
Program and source code included.
Unicode Clipboard Demonstration
This is a very simple program that demonstrates how to implement support for the "text/unicode" clipboard format used by the Mozilla family of applications.
The user interface consists only of an MLE (editor) control. If you paste text that was copied from Mozilla (or some other program that supports the same format) into the MLE, it will be converted from UCS-2 (Unicode) into the current codepage. Conversely, text which you copy from the MLE will be converted from the current codepage into UCS-2 format.
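The two conversion directions described above can be sketched in portable C as follows. This is not the code from the demonstration program, which would rely on the ULS conversion functions and the actual process codepage. As a stand-in for "the current codepage", the sketch assumes ISO 8859-1 (Latin-1), whose 256 byte values coincide with the first 256 Unicode code points. The function names and the '?' substitution character are assumptions made for this example, and a C99 compiler is assumed for stdint.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SUBSTITUTE_CHAR '?'   /* assumed replacement for unmappable text */

/* Paste direction (hypothetical helper): converts count UCS-2 code
 * units to Latin-1 bytes. dst must hold at least count bytes. Returns
 * the number of characters that had no Latin-1 equivalent and were
 * replaced with SUBSTITUTE_CHAR. */
size_t ucs2_to_latin1(const uint16_t *src, size_t count, unsigned char *dst)
{
    size_t i, substituted = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < count; i++) {
        if (src[i] <= 0xFF) {
            dst[i] = (unsigned char)src[i];
        } else {
            dst[i] = SUBSTITUTE_CHAR;
            substituted++;
        }
    }
    return substituted;
}

/* Copy direction (hypothetical helper): converts count Latin-1 bytes
 * to UCS-2 code units. Every Latin-1 byte has a UCS-2 equivalent, so
 * this direction cannot fail. dst must hold at least count code units. */
void latin1_to_ucs2(const unsigned char *src, size_t count, uint16_t *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < count; i++)
        dst[i] = src[i];
}
```

The asymmetry is the point of the sketch: converting into Unicode is lossless, whereas converting out of Unicode into a smaller codepage has to decide what to do with characters that the target codepage cannot represent.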
No other functionality (loading, saving, printing, etc.) is supported. This program is intended to demonstrate clipboard support, and nothing more.
The source code is included, along with a Makefile for the IBM C Compiler (version 3.x). It may be considered public domain code, and may be freely used for any purpose, commercial or otherwise.