International Language Support (INTL)

cf) Firebird 2.5 Release Notes

This release includes several improvements that tighten and enhance Firebird's handling of international language environments.

Default COLLATION Attribute for a Database

Databases of ODS 11.2 and higher can now optionally be created with a default collation associated with the default character set. For details, please see Default COLLATION Attribute for a Database in the DDL chapter.

ALTER CHARACTER SET Command

DDL syntax has been introduced to enable the default collation for a character set to be set at database level. For details, please see ALTER CHARACTER SET Command in the DDL chapter.

Connection Strings & Character Sets

The database connection (DPB) area of the API can now interoperate with the character set and/or code page of server and client, avoiding the problems that previously could occur when file names contained non-ASCII characters.

Refer to the topic Connection Strings & Character Sets in the chapter Changes to the Firebird API and ODS. Even if you are not normally interested in the API, this topic will be a worthwhile read if you have been bothered with such issues.

Other Improvements

Introducer Syntax Usage

The use of introducer syntax, i.e., prefixing an underscore to a character set name to force the succeeding text literal to be transliterated to that character set, has caused problems in situations where a single SQL statement involves more than one character set. The actual problems differ from version to version, showing up as transliteration errors, malformed string errors or some other kind of unexpected behaviour.

Problems could occur in two different usage scenarios:

One query employs the introducer syntax while another query performs a select from MON$STATEMENTS

Introducer syntax was used in a PSQL module

To enable a workaround for such problems, it is now possible to replace the literal string with the hexadecimal representation of its bytes in the character set named by the introducer. For example:

   select _dos850 '123áé456' from rdb$database

may be transformed to

   select _dos850 X'313233A082343536' from rdb$database
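The hex literal above can be derived mechanically. The following is a minimal Python sketch, not part of Firebird: the helper name to_hex_literal is hypothetical, and it assumes that Python's "cp850" codec corresponds to the character set Firebird calls DOS850.

```python
def to_hex_literal(text: str, codec: str) -> str:
    """Encode text in the given code page and format the bytes as X'..'."""
    # Raises UnicodeEncodeError if a character does not exist in the code page.
    raw = text.encode(codec)
    return "X'" + raw.hex().upper() + "'"


# Assumption: Python's "cp850" codec matches Firebird's DOS850 character set.
literal = to_hex_literal("123\u00e1\u00e9456", "cp850")  # the string '123áé456'
print("select _dos850 " + literal + " from rdb$database")
```

Each ASCII digit maps to one byte (31 through 36), while the two accented letters map to the single bytes A0 and 82 of the DOS850 code page.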

Malformed UNICODE_FSS Characters Disallowed

Tracker reference CORE-1600.

Malformed characters are no longer allowed in data for UNICODE_FSS columns.

Repair Switches for Malformed Strings

New restore switches were added to the gbak utility code for the purpose of repairing malformed UNICODE_FSS data and metadata by restoring a backup of the affected database. Details are in the gbak section of the Utilities chapter.

Numeric Sort Attributes

Tracker reference CORE-1945.

For UNICODE collations only, a custom attribute NUMERIC-SORT has been enabled for specifying the order by which to sort numerals.

Format & Usage

 NUMERIC-SORT={0 | 1}

The default, 0, sorts numerals in alphabetical order. For example:

   1
   10
   100
   2
   20

1 sorts numerals in numerical order. For example:

   1
   2
   10
   20
   100

Example

   create collation unicode_num for utf8 from unicode 'NUMERIC-SORT=1';
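The difference between the two orderings can be illustrated outside the database. The Python sketch below is only an analogy and does not reproduce the algorithm used by the UNICODE collation: the helper numeric_sort_key is hypothetical and simply compares runs of digits by their integer value.

```python
import re

values = ["1", "10", "100", "2", "20"]


def numeric_sort_key(text):
    """Split text into digit and non-digit runs; digit runs compare as integers."""
    parts = [p for p in re.split(r"(\d+)", text) if p]
    # Tag each run so that digit runs and text runs are never compared directly.
    return [(0, int(p), "") if p.isdecimal() else (1, 0, p) for p in parts]


alphabetical = sorted(values)                    # comparable to NUMERIC-SORT=0
numerical = sorted(values, key=numeric_sort_key)  # comparable to NUMERIC-SORT=1

print(alphabetical)
print(numerical)
```

The first list reproduces the alphabetical order shown for the default setting, the second the numerical order shown for NUMERIC-SORT=1.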

Character Sets and Collations

UNICODE_CI_AI

Tracker reference CORE-824.

UNICODE_CI_AI: case-insensitive, accent-insensitive collation added for UTF8.

WIN_1258

Tracker reference CORE-2185.

Added alias WIN_1258 for WIN1258 character set, for consistency with other WIN* character sets.

SJIS and EUCJ Character Sets

Tracker reference CORE-2103.

Strings in SJIS and EUCJ character sets are now verified for well-formedness.
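As an illustration of what a well-formedness check involves, the Python sketch below rejects byte strings that do not form complete multi-byte sequences. It relies on Python's own "shift_jis" and "euc_jp" codecs, which are assumed here to be close enough to Firebird's SJIS and EUCJ character sets for demonstration purposes; the function is_well_formed is hypothetical and is not Firebird's validator.

```python
def is_well_formed(raw: bytes, codec: str) -> bool:
    """Return True if raw decodes cleanly in the given multi-byte codec."""
    try:
        raw.decode(codec, errors="strict")
        return True
    except UnicodeDecodeError:
        return False


good_sjis = "\u65e5\u672c\u8a9e".encode("shift_jis")  # the text '日本語'
truncated_sjis = good_sjis[:-1]                        # last trail byte cut off

print(is_well_formed(good_sjis, "shift_jis"))
print(is_well_formed(truncated_sjis, "shift_jis"))
print(is_well_formed(b"\xa4", "euc_jp"))  # lone lead byte without its trail byte
```

A string whose final character lacks its trail byte is the typical case such a check catches.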

Character set GB18030

Tracker reference CORE-2636.

GB18030 is a Chinese national standard describing the language and character support required of software in China. The character set has been activated from ICU.