International Language Support (INTL)
cf) Firebird 2.5 Release Notes
This release includes several improvements that tighten and extend Firebird's handling of international language environments.
Default COLLATION Attribute for a Database
Databases of ODS 11.2 and higher can now optionally be created with a default collation associated with the default character set. For details, please see Default COLLATION Attribute for a Database in the DDL chapter.
ALTER CHARACTER SET Command
DDL syntax has been introduced to enable the default collation for a character set to be set at database level. For details, please see ALTER CHARACTER SET Command in the DDL chapter.
Connection Strings & Character Sets
The database parameter buffer (DPB) used for API connections can now interoperate with the character set and/or code page of the server and the client. This avoids the problems that previously could occur when file names contained non-ASCII characters.
Refer to the topic Connection Strings & Character Sets in the chapter Changes to the Firebird API and ODS. Even if you are not normally interested in the API, this topic will be a worthwhile read if you have been bothered with such issues.
Other Improvements
Introducer Syntax Usage
Introducer syntax, i.e., prefixing a text literal with an underscore and a character set name to force the literal to be transliterated to that character set, has caused problems in situations where a single SQL statement involves more than one character set. The actual problems differ from version to version, showing up as transliteration errors, malformed string errors or simply unexpected behaviour.
Problems could occur in two different usage scenarios:
One query uses the introducer syntax while another query performs a select from MON$STATEMENTS
A PSQL module uses the introducer syntax
To enable a workaround for such problems, it is now possible to replace the literal string with the hex representation of its bytes, as encoded in the character set named by the introducer. For example:
select _dos850 '123áé456' from rdb$database
may be transformed to
select _dos850 X'313233A082343536' from rdb$database
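The hex form can be produced on the client before the statement is submitted. The following is a minimal Python sketch of that transformation, not part of Firebird itself. The helper name to_hex_literal is hypothetical, and pairing Firebird's DOS850 with Python's cp850 codec is an assumption that would need checking for other character sets.

```python
def to_hex_literal(text: str, introducer: str, codec: str) -> str:
    """Build `_<introducer> X'<hex>'` from text encoded with a Python codec.

    `introducer` is the Firebird character set name; `codec` is the Python
    codec assumed to match it (hypothetical pairing, verify per charset).
    """
    raw = text.encode(codec)              # bytes in the target character set
    return f"_{introducer} X'{raw.hex().upper()}'"


# "\u00e1" is a-acute and "\u00e9" is e-acute, as in the example above.
literal = to_hex_literal("123\u00e1\u00e9456", "dos850", "cp850")
print(f"select {literal} from rdb$database")
```

Under these assumptions the printed statement matches the transformed query shown above, because the two accented characters occupy bytes A0 and 82 in code page 850.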
Malformed UNICODE_FSS Characters Disallowed
Tracker reference CORE-1600.
Malformed characters are no longer allowed in data for UNICODE_FSS columns.
Repair Switches for Malformed Strings
New restore switches have been added to the gbak utility for repairing malformed UNICODE_FSS data and metadata by restoring a backup of the affected database. Details are in the gbak section of the Utilities chapter.
Numeric Sort Attributes
Tracker reference CORE-1945.
For UNICODE collations only, a custom attribute NUMERIC-SORT has been enabled for specifying the order by which to sort numerals.
Format & Usage
NUMERIC-SORT={0 | 1}
The default, 0, sorts numerals in alphabetical order. For example:
1 10 100 2 20
A value of 1 sorts numerals in numerical order. For example:
1 2 10 20 100
Example
create collation unicode_num for utf8 from unicode 'NUMERIC-SORT=1';
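The difference between the two settings can be imitated outside the database. The Python sketch below only reproduces the two orderings listed above for plain digit strings; it is an illustration, not how Firebird or ICU implements the collation, and the helper name numeric_key is hypothetical.

```python
import re

values = ["10", "2", "100", "1", "20"]


def numeric_key(s: str):
    """Split into digit and non-digit runs; digit runs compare by value."""
    parts = re.findall(r"\d+|\D+", s)
    return [(0, int(p), "") if p.isdigit() else (1, 0, p) for p in parts]


# Comparable to NUMERIC-SORT=0: character-by-character comparison.
alphabetical = sorted(values)
# Comparable to NUMERIC-SORT=1: digit sequences compared as numbers.
numerical = sorted(values, key=numeric_key)

print(" ".join(alphabetical))
print(" ".join(numerical))
```

The first line printed corresponds to the default ordering and the second to the ordering with NUMERIC-SORT=1.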
Character Sets and Collations
UNICODE_CI_AI
Tracker reference CORE-824.
UNICODE_CI_AI: case-insensitive, accent-insensitive collation added for UTF8.
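As a rough illustration of what case-insensitive, accent-insensitive comparison means, the Python sketch below compares strings after removing combining marks and folding case. It is an approximation for explanation only: the real collation is more elaborate than this, and the helper name ci_ai_key is hypothetical.

```python
import unicodedata


def ci_ai_key(s: str) -> str:
    """Approximate a case- and accent-insensitive comparison key."""
    decomposed = unicodedata.normalize("NFD", s)   # split base letters from accents
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()                     # ignore case differences


# "R\u00e9sum\u00e9" is "Resume" with acute accents on both e's.
print(ci_ai_key("R\u00e9sum\u00e9") == ci_ai_key("RESUME"))
```

Under such a collation the two strings compare as equal, whereas a case-sensitive or accent-sensitive collation would treat them as different.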
WIN_1258
Tracker reference CORE-2185.
Added alias WIN_1258 for WIN1258 character set, for consistency with other WIN* character sets.
SJIS and EUCJ Character Sets
Tracker reference CORE-2103.
Strings in SJIS and EUCJ character sets are now verified for well-formedness.
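To show what a well-formedness check involves, the Python sketch below tests whether a byte sequence decodes cleanly. It assumes that Python's shift_jis and euc_jp codecs are close enough to Firebird's SJIS and EUCJ for illustration; the server's validation rules may differ in detail, and the helper name is_well_formed is hypothetical.

```python
def is_well_formed(raw: bytes, codec: str) -> bool:
    """Return True if `raw` decodes without error in the given codec."""
    try:
        raw.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


# A complete two-byte hiragana character in Shift_JIS.
print(is_well_formed(b"\x82\xa0", "shift_jis"))
# A lead byte with its trail byte missing: a malformed string.
print(is_well_formed(b"\x82", "shift_jis"))
```

A string that fails this kind of check, such as a multi-byte character cut off after its lead byte, is the sort of input the verification is meant to catch.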
Character set GB18030
Tracker reference CORE-2636.
GB18030 is a Chinese national standard describing the language and character support required for software in China. It has been activated from ICU.