Attribute names and identifiers

Every character attribute defined by ULS has a symbolic identifier, which stands for an integer value. Most attributes also have an attribute name, which is a human-readable UniChar string.

The UniQueryAttr function may be used to obtain the integer value of the identifier associated with a specified attribute name.

Character attributes are grouped into several different categories:

Character type

These attributes indicate the type of character (lowercase, uppercase, punctuation, whitespace, and so on). Their names all start with a lowercase alphabetic character.

There are three groups of character type attributes.

POSIX (XPG/4) types. Their symbolic identifiers all start with CT_.
Extended (Unicode) types. Their symbolic identifiers all start with C3_.
Win32 compatibility types. These attributes are equivalent to POSIX types, and do not have names; their symbolic identifiers all start with C1_.

Character set

These attributes indicate character sets (linguistic or symbolic) to which a character may belong. Their names all start with an underscore character ("_"), and their symbolic identifiers all begin with CHS_.

BiDi

These attributes are used by Unicode for determining character orientation (writing direction). Their names all start with a number-sign character ("#"), and their symbolic identifiers start with C2_.

Codepage

These attributes indicate the subset of common codepages, within the OS/2 Universal Glyph List, for which a character is valid. They do not have names; their symbolic identifiers all begin with CCP_.

All of the defined attributes are listed below. Note that all attribute names must be specified in lowercase.
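
The naming rules above can be sketched in C as a small helper that guesses the category of an attribute name from its first character. This is only an illustration: classify_attr_name and the AttrCategory type are hypothetical and not part of ULS, and plain char strings are used here for simplicity where ULS uses UniChar strings.

```c
#include <ctype.h>
#include <stddef.h>

/* Categories of named attributes, following the naming rules above.
 * Codepage (CCP_) and Win32 (C1_) attributes have no names, so they
 * cannot be reached through a name at all. */
typedef enum {
    ATTR_CAT_INVALID,   /* empty, contains uppercase, or unknown prefix */
    ATTR_CAT_CTYPE,     /* starts with a lowercase letter: CT_ or C3_   */
    ATTR_CAT_CHARSET,   /* starts with '_': CHS_                        */
    ATTR_CAT_BIDI       /* starts with '#': C2_                         */
} AttrCategory;

/* Hypothetical helper (not a ULS function): classify an attribute name
 * by its leading character. It does not check that the name is actually
 * one of the defined attributes. */
static AttrCategory classify_attr_name(const char *name)
{
    const char *p;

    if (name == NULL || name[0] == '\0')
        return ATTR_CAT_INVALID;

    /* Attribute names must be specified in lowercase. */
    for (p = name; *p != '\0'; ++p)
        if (isupper((unsigned char)*p))
            return ATTR_CAT_INVALID;

    if (name[0] == '_')
        return ATTR_CAT_CHARSET;
    if (name[0] == '#')
        return ATTR_CAT_BIDI;
    if (islower((unsigned char)name[0]))
        return ATTR_CAT_CTYPE;

    return ATTR_CAT_INVALID;
}
```

With this helper, "alpha" is classified as a character type name, "_greek" as a character set name, and "#left" as a BiDi name, while "Alpha" is rejected because it is not lowercase.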

Character type attributes

The following are the standard POSIX attributes. Multiple attributes may be combined with the bitwise OR operator in calls to UniQueryChar.

┌──────────────────────────────────────────────────────────────────────────────┐
│NAME           IDENTIFIER             DESCRIPTION                             │
├──────────────────────────────────────────────────────────────────────────────┤
│alnum          CT_ALNUM               Alphabetic and numeric characters       │
├──────────────────────────────────────────────────────────────────────────────┤
│alpha          CT_ALPHA               Letters and linguistic marks            │
├──────────────────────────────────────────────────────────────────────────────┤
│ascii          CT_ASCII               Standard ASCII character                │
├──────────────────────────────────────────────────────────────────────────────┤
│blank          CT_BLANK               Space and tab characters                │
├──────────────────────────────────────────────────────────────────────────────┤
│cntrl          CT_CNTRL               Control and format characters           │
├──────────────────────────────────────────────────────────────────────────────┤
│digit          CT_DIGIT               Digits 0 through 9                      │
├──────────────────────────────────────────────────────────────────────────────┤
│graph          CT_GRAPH               All except controls and space           │
├──────────────────────────────────────────────────────────────────────────────┤
│lower          CT_LOWER               Lower case alphabetic character         │
├──────────────────────────────────────────────────────────────────────────────┤
│number         CT_NUMBER              Integral numbers between 0 and 9        │
├──────────────────────────────────────────────────────────────────────────────┤
│print          CT_PRINT               Everything except control characters    │
├──────────────────────────────────────────────────────────────────────────────┤
│punct          CT_PUNCT               Punctuation marks                       │
├──────────────────────────────────────────────────────────────────────────────┤
│space          CT_SPACE               Whitespace and line-breaking characters │
├──────────────────────────────────────────────────────────────────────────────┤
│symbol         CT_SYMBOL              Symbol                                  │
├──────────────────────────────────────────────────────────────────────────────┤
│upper          CT_UPPER               Upper case alphabetic character         │
├──────────────────────────────────────────────────────────────────────────────┤
│xdigit         CT_XDIGIT              Hexadecimal digits (0-9, a-f, A-F)      │
└──────────────────────────────────────────────────────────────────────────────┘
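
Combining POSIX attributes with bitwise OR can be sketched as follows. The DEMO_CT_* values are placeholders chosen for this example; the real CT_* constants come from the ULS headers and may differ. demo_char_type is a hypothetical ASCII-only stand-in for a real character lookup, and the sketch treats a combined mask as "any of these attributes", which is an assumption about how such a mask is interpreted.

```c
/* Placeholder bit values for illustration only; not the real CT_* values. */
#define DEMO_CT_UPPER  0x0001u
#define DEMO_CT_LOWER  0x0002u
#define DEMO_CT_DIGIT  0x0004u
#define DEMO_CT_SPACE  0x0008u

/* Hypothetical stand-in for a character-type lookup (ASCII only). */
static unsigned demo_char_type(unsigned ch)
{
    if (ch >= 'A' && ch <= 'Z')
        return DEMO_CT_UPPER;
    if (ch >= 'a' && ch <= 'z')
        return DEMO_CT_LOWER;
    if (ch >= '0' && ch <= '9')
        return DEMO_CT_DIGIT;
    if (ch == ' ' || ch == '\t' || ch == '\n' || ch == '\r')
        return DEMO_CT_SPACE;
    return 0u;
}

/* Nonzero if the character has at least one attribute set in mask. */
static int demo_has_any(unsigned ch, unsigned mask)
{
    return (demo_char_type(ch) & mask) != 0u;
}
```

For example, demo_has_any('a', DEMO_CT_UPPER | DEMO_CT_LOWER) is nonzero because the letter matches one of the two combined attributes, while the same mask applied to '7' yields zero.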

The following are the extended (Unicode) attributes. Those without names may only be used as bitmasks for the UNICTYPE data structure and the UniQueryStringType function.

┌──────────────────────────────────────────────────────────────────────────────┐
│NAME           IDENTIFIER             DESCRIPTION                             │
├──────────────────────────────────────────────────────────────────────────────┤
│               C3_ALPHA               Alphabetic character                    │
├──────────────────────────────────────────────────────────────────────────────┤
│diacritic      C3_DIACRITIC           Diacritic mark                          │
├──────────────────────────────────────────────────────────────────────────────┤
│fullwidth      C3_FULLWIDTH           Full-width variant                      │
├──────────────────────────────────────────────────────────────────────────────┤
│halfwidth      C3_HALFWIDTH           Half-width variant                      │
├──────────────────────────────────────────────────────────────────────────────┤
│hiragana       C3_HIRAGANA            Hiragana character                      │
├──────────────────────────────────────────────────────────────────────────────┤
│ideograph      C3_IDEOGRAPH           Kanji/Han character                     │
├──────────────────────────────────────────────────────────────────────────────┤
│kashida        C3_KASHIDA             Arabic tatweel (elongation character)   │
├──────────────────────────────────────────────────────────────────────────────┤
│katakana       C3_KATAKANA            Katakana character                      │
├──────────────────────────────────────────────────────────────────────────────┤
│nonspacing     C3_NONSPACING          Non-spacing mark                        │
├──────────────────────────────────────────────────────────────────────────────┤
│nsdiacritic    C3_NSDIACRITIC         Non-spacing diacritic                   │
├──────────────────────────────────────────────────────────────────────────────┤
│nsvowel        C3_NSVOWEL             Non-spacing vowel                       │
├──────────────────────────────────────────────────────────────────────────────┤
│               C3_SYMBOL              Symbol                                  │
├──────────────────────────────────────────────────────────────────────────────┤
│vowelmark      C3_VOWELMARK           Vowel mark                              │
└──────────────────────────────────────────────────────────────────────────────┘

The following Win32 compatibility attributes are defined.

┌─────────────────────────────────────┐
│IDENTIFIER     POSIX EQUIVALENT      │
├─────────────────────────────────────┤
│C1_UPPER       CT_UPPER              │
├─────────────────────────────────────┤
│C1_LOWER       CT_LOWER              │
├─────────────────────────────────────┤
│C1_DIGIT       CT_DIGIT              │
├─────────────────────────────────────┤
│C1_SPACE       CT_SPACE              │
├─────────────────────────────────────┤
│C1_PUNCT       CT_PUNCT              │
├─────────────────────────────────────┤
│C1_CNTRL       CT_CNTRL              │
├─────────────────────────────────────┤
│C1_BLANK       CT_BLANK              │
├─────────────────────────────────────┤
│C1_XDIGIT      CT_XDIGIT             │
├─────────────────────────────────────┤
│C1_ALPHA       CT_ALPHA              │
└─────────────────────────────────────┘
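
When porting code that uses the Win32 compatibility identifiers, the table above can be kept as a small lookup. The sketch below works on identifier names as strings only, for example in a porting or documentation tool; CompatPair and posix_equivalent are hypothetical names, not part of ULS.

```c
#include <stddef.h>
#include <string.h>

/* One row of the Win32-to-POSIX equivalence table above. */
typedef struct {
    const char *win32_id;
    const char *posix_id;
} CompatPair;

static const CompatPair compat_table[] = {
    { "C1_UPPER",  "CT_UPPER"  },
    { "C1_LOWER",  "CT_LOWER"  },
    { "C1_DIGIT",  "CT_DIGIT"  },
    { "C1_SPACE",  "CT_SPACE"  },
    { "C1_PUNCT",  "CT_PUNCT"  },
    { "C1_CNTRL",  "CT_CNTRL"  },
    { "C1_BLANK",  "CT_BLANK"  },
    { "C1_XDIGIT", "CT_XDIGIT" },
    { "C1_ALPHA",  "CT_ALPHA"  }
};

/* Hypothetical helper: return the name of the POSIX identifier that is
 * equivalent to a Win32 compatibility identifier, or NULL if the name
 * is not in the table. */
static const char *posix_equivalent(const char *win32_id)
{
    size_t i;

    if (win32_id == NULL)
        return NULL;

    for (i = 0; i < sizeof compat_table / sizeof compat_table[0]; ++i)
        if (strcmp(compat_table[i].win32_id, win32_id) == 0)
            return compat_table[i].posix_id;

    return NULL;
}
```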

Character set attributes

These attributes represent classes of character, and must be tested individually.

┌──────────────────────────────────────────────────────────────────────────────┐
│NAME           IDENTIFIER             DESCRIPTION                             │
├──────────────────────────────────────────────────────────────────────────────┤
│_apl           CHS_APL                APL character                           │
├──────────────────────────────────────────────────────────────────────────────┤
│_arabic        CHS_ARABIC             Arabic character                        │
├──────────────────────────────────────────────────────────────────────────────┤
│_arrow         CHS_ARROW              Arrow character                         │
├──────────────────────────────────────────────────────────────────────────────┤
│_bengali       CHS_BENGALI            Bengali character                       │
├──────────────────────────────────────────────────────────────────────────────┤
│_bopomofo      CHS_BOPOMOFO           Bopomofo character                      │
├──────────────────────────────────────────────────────────────────────────────┤
│_box           CHS_BOX                Box or line drawing character           │
├──────────────────────────────────────────────────────────────────────────────┤
│_currency      CHS_CURRENCY           Currency symbol                         │
├──────────────────────────────────────────────────────────────────────────────┤
│_cyrillic      CHS_CYRILLIC           Cyrillic character                      │
├──────────────────────────────────────────────────────────────────────────────┤
│_dash          CHS_DASH               Dash character                          │
├──────────────────────────────────────────────────────────────────────────────┤
│_devanagari    CHS_DEVANAGARI         Devanagari character                    │
├──────────────────────────────────────────────────────────────────────────────┤
│_dingbat       CHS_DINGBAT            Dingbat                                 │
├──────────────────────────────────────────────────────────────────────────────┤
│_fraction      CHS_FRACTION           Fraction value                          │
├──────────────────────────────────────────────────────────────────────────────┤
│_greek         CHS_GREEK              Greek character                         │
├──────────────────────────────────────────────────────────────────────────────┤
│_gujarati      CHS_GUJARATI           Gujarati character                      │
├──────────────────────────────────────────────────────────────────────────────┤
│_gurmukhi      CHS_GURMUKHI           Gurmukhi character                      │
├──────────────────────────────────────────────────────────────────────────────┤
│_hanguel       CHS_HANGUEL            Hangul Jamo character                   │
├──────────────────────────────────────────────────────────────────────────────┤
│_hebrew        CHS_HEBREW             Hebrew character                        │
├──────────────────────────────────────────────────────────────────────────────┤
│_hiragana      CHS_HIRAGANA           Hiragana character set                  │
├──────────────────────────────────────────────────────────────────────────────┤
│_katakana      CHS_KATAKANA           Katakana character set                  │
├──────────────────────────────────────────────────────────────────────────────┤
│_lao           CHS_LAO                Laotian character                       │
├──────────────────────────────────────────────────────────────────────────────┤
│_latin         CHS_LATIN              Latin character                         │
├──────────────────────────────────────────────────────────────────────────────┤
│_linesep       CHS_LINESEP            Line separator                          │
├──────────────────────────────────────────────────────────────────────────────┤
│_math          CHS_MATH               Math symbol                             │
├──────────────────────────────────────────────────────────────────────────────┤
│_punctstart    CHS_PUNCTSTART         Punctuation start                       │
├──────────────────────────────────────────────────────────────────────────────┤
│_punctend      CHS_PUNCTEND           Punctuation end                         │
├──────────────────────────────────────────────────────────────────────────────┤
│_tamil         CHS_TAMIL              Tamil character                         │
├──────────────────────────────────────────────────────────────────────────────┤
│_telegu        CHS_TELEGU             Telugu character                        │
├──────────────────────────────────────────────────────────────────────────────┤
│_thai          CHS_THAI               Thai character                          │
├──────────────────────────────────────────────────────────────────────────────┤
│_userdef       CHS_USERDEF            User-defined character                  │
└──────────────────────────────────────────────────────────────────────────────┘

BiDi attributes

These attributes represent classes of character, and must be tested individually.

┌──────────────────────────────────────────────────────────────────────────────┐
│NAME           IDENTIFIER             DESCRIPTION                             │
├──────────────────────────────────────────────────────────────────────────────┤
│#arabicnum     C2_ARABICNUMBER        Arabic numbers                          │
├──────────────────────────────────────────────────────────────────────────────┤
│#blocksep      C2_BLOCKSEPARATOR      Block separator                         │
├──────────────────────────────────────────────────────────────────────────────┤
│#commonsep     C2_COMMONSEPARATOR     Common separator                        │
├──────────────────────────────────────────────────────────────────────────────┤
│#euronum       C2_EUROPENUMBER        European number                         │
├──────────────────────────────────────────────────────────────────────────────┤
│#eurosep       C2_EUROPESEPARATOR     European separator                      │
├──────────────────────────────────────────────────────────────────────────────┤
│#euroterm      C2_EUROPETERMINATOR    European terminator                     │
├──────────────────────────────────────────────────────────────────────────────┤
│#left          C2_LEFTTORIGHT         Left to right text orientation          │
├──────────────────────────────────────────────────────────────────────────────┤
│#mirrored      C2_MIRRORED            Symmetrical text orientation            │
├──────────────────────────────────────────────────────────────────────────────┤
│#neutral       C2_OTHERNEUTRAL        Other neutral                           │
├──────────────────────────────────────────────────────────────────────────────┤
│#right         C2_RIGHTTOLEFT         Right to left text orientation          │
├──────────────────────────────────────────────────────────────────────────────┤
│#whitespace    C2_WHITESPACE          Whitespace                              │
└──────────────────────────────────────────────────────────────────────────────┘

Codepage attributes

These attributes are only used as bitmasks for the UNICTYPE data structure and the UniQueryStringType function.

┌─────────────────────────────────────┐
│IDENTIFIER     DESCRIPTION           │
├─────────────────────────────────────┤
│CCP_437        US PC                 │
├─────────────────────────────────────┤
│CCP_850        Multilingual PC       │
├─────────────────────────────────────┤
│CCP_SYMB       PostScript Symbol     │
├─────────────────────────────────────┤
│CCP_1252       Windows Latin 1       │
├─────────────────────────────────────┤
│CCP_1250       Windows Latin 2       │
├─────────────────────────────────────┤
│CCP_1251       Windows Cyrillic      │
├─────────────────────────────────────┤
│CCP_1254       Windows Turkish       │
├─────────────────────────────────────┤
│CCP_1257       Windows Baltic        │
└─────────────────────────────────────┘
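
Testing a codepage bitmask can be sketched as follows. DemoCharType is a simplified stand-in for UNICTYPE that holds only a codepage mask (the field name is an assumption), and the DEMO_CCP_* bit values are placeholders for this example; the real CCP_* constants come from the ULS headers and may differ.

```c
/* Placeholder bit values for illustration only; not the real CCP_* values. */
#define DEMO_CCP_437   0x0001u
#define DEMO_CCP_850   0x0002u
#define DEMO_CCP_1252  0x0008u

/* Simplified stand-in for the UNICTYPE structure: only the codepage
 * mask is modelled here. */
typedef struct {
    unsigned short codepage;
} DemoCharType;

/* Nonzero if the character is valid in every codepage named in mask. */
static int demo_in_all_codepages(const DemoCharType *ct, unsigned mask)
{
    return (ct->codepage & mask) == mask;
}

/* Nonzero if the character is valid in at least one codepage in mask. */
static int demo_in_any_codepage(const DemoCharType *ct, unsigned mask)
{
    return (ct->codepage & mask) != 0u;
}
```

The two helpers differ only in the comparison: an "all" test requires every requested bit to be set, while an "any" test requires a nonzero intersection.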
