[Firebird Wiki]

Edit of Firebird 2.0 INTL (未整理)

  • ホーム
  • 差分
  • バックアップ
  • リロード
  • 一覧
  • 検索
  • 最終更新
  • ヘルプ
  • ログイン

  1. Firebird 2.0 INTL (未整理)
  2. Edit of Firebird 2.0 INTL (未整理)
  • テキスト整形のルールを表示する

    • 新規
    • ソース
    • 名前変更
    • ホーム
    • 一覧
    • 検索
    • 最終更新
    • バックアップ
    • リンク元
    • ログ
    • ヘルプ
    • 最終更新のRSS
    Founded by anonymous
    Powered by PukiWiki Advance v 2.0.2. Processing time: 0.846 sec.
    Original Theme Design by PukiWiki Plus! Team.
    PukiWiki Advance HTML 5

    Edit of Firebird 2.0 INTL (未整理)[TOC]

    undefined
    差分
    New Features for Text Data
    
    Table of Contents
    
    New String Functions
    New INTL Interface for Non-ASCII Character Sets
    New String Functions
    
    LOWER()
    TRIM()
    New String Size Functions
    Two new string functions were added:
    
    LOWER()
    
    A. dos Santos Fernandes
    
    LOWER() returns the input argument converted to all lower-case characters.
    
    Example
    
    isql -q -ch dos850
    
    SQL> create database 'test.fdb';
    SQL> create table t (c char(1) character set dos850);
    SQL> insert into t values ('A');
    SQL> insert into t values ('E');
    SQL> insert into t values ('Á');;
    SQL> insert into t values ('É');
    SQL>
    C      LOWER
    ====== ======
    A      a
    E      e
    Á      á
    É      é
     
    TRIM()
    
    A. dos Santos Fernandes
    
    TRIM trims characters (default: blanks) from the left and/or right of a string.
    
    Syntax Pattern
    
    TRIM <left paren> [ [ <trim specification> ] [ <trim character> ]
      FROM ] <value expression> <right paren>
    
       <trim specification> ::=  LEADING  | TRAILING  | BOTH
    
       <trim character> ::= <value expression>
     
    Rules
    
    If <trim specification> is not specified, BOTH is assumed.
    
    If <trim character> is not specified, ' ' is assumed.
    
    If <trim specification> and/or <trim character> is specified, FROM should be specified.
    
    If <trim specification> and <trim character> is not specified, FROM should not be specified.
    
    Examples
    
    A)
    
      select
        rdb$relation_name,
        trim(leading 'RDB$' from rdb$relation_name)
      from rdb$relations
        where rdb$relation_name starting with 'RDB$';
     
    B)
    
      select
        trim(rdb$relation_name) || ' is a system table'
      from rdb$relations
        where rdb$system_flag = 1;
     
    New String Size Functions
    
    A. dos Santos Fernandes
    
    Three new functions will return information about the size of strings:
    
    BIT_LENGTH returns the length of a string in bits
    
    CHAR_LENGTH/CHARACTER_LENGTH returns the length of a string in characters
    
    OCTET_LENGTH returns the length of a string in bytes
    
    Syntax Pattern
    
    These three functions share a similar syntax pattern, as follows.-
    
    <length function> ::=
    { BIT_LENGTH | CHAR_LENGTH | CHARACTER_LENGTH | OCTET_LENGTH } ( <value expression> <)
     
    Example
    
      select
        rdb$relation_name,
        char_length(rdb$relation_name),
        char_length(trim(rdb$relation_name))
      from rdb$relations;
     
    New INTL Interface for Non-ASCII Character Sets
    
    A. dos Santos Fernandes
    
    Architecture
    Enhancements
    New Character Sets and Collations Implemented
    Character Set Bug Fixes
    A feature of Firebird 2 is the introduction of a new interface for international character sets. Originally described by N. Samofatov, the new interface features a number of enhancements that have been implemented by me.
    
    Architecture
    
    Firebird allows character sets and collations to be declared in any character field or variable declaration. The default character set can also be specified at database create time, to cause every CHAR/VARCHAR declaration that doesn't specifically included a CHARACTER SET clause to use it.
    
    At attachment time you can specify the character set that the client is to use to read strings. If no "client" (or "connection") character set is specified, character set NONE is assumed.
    
    Two special character sets, NONE and OCTETS, can be used in declarations. However, OCTETS cannot be used as a connection character set. The two sets are similar, except that the space character of NONE is ASCII 0x20, whereas the space character OCTETS is 0x00. NONE and OCTETS are "special" in the sense that they do not follow the rule that other charsets do regarding conversions.
    
    With other character sets, conversion is performed as CHARSET1->UNICODE->CHARSET2.
    
    With NONE/OCTETS the bytes are just copied: NONE/OCTETS->CHARSET2 and CHARSET1->NONE/OCTETS.
    
    Enhancements
    
    Enhancements include:
    
    Well-formedness checks
    
    Some character sets (especially multi-byte) do not accept just any string. Now, the engine verifies that strings are well-formed when assigning from NONE/OCTETS and when strings sent by the client (the statement string and parameters).
    
    Uppercasing
    
    In FB 1.5.X only ASCII characters are uppercased in a character set's default (binary) collation order, which is used if no collation is specified.
    
    For example,
    
      isql -q -ch dos850
      SQL> create database 'test.fdb';
      SQL> create table t (c char(1) character set dos850);
      SQL> insert into t values ('a');
      SQL> insert into t values ('e');
      SQL> insert into t values ('á');
      SQL> insert into t values ('é');
      SQL>
      SQL> select c, upper(c) from t;
    
      C      UPPER
      ====== ======
           a      A
           e      E
           á      á
           é      é
     
    In FB 2.0 the result is:
    
      C      UPPER
      ====== ======
           a      A
           e      E
           á      Á
           é      É
     
    Maximum String Length
    
    In FB 1.5.X the engine does not verify the logical length of multi-byte character set (MBCS) strings. Hence, a UNICODE_FSS field takes three times as many characters as the declared field size, three being the maximum length of one UNICODE_FSS character).
    
    This has been retained for compatibility for legacy character sets. However, new character sets (UTF8, for example) do not inherit this limitation.
    
    sqlsubtype and Attachment Character Set
    
    When the character set of a CHAR or VARCHAR column is anything but NONE or OCTETS and the attachment character set is not NONE, the sqlsubtype member of an XSQLVAR pertaining to that column now contains the attachment (connection) character set number instead of the column's character set.
    
    Enhancements for BLOBs
    
    Several enhancements have been added for text BLOBs.
    
    COLLATE clauses for BLOBs
    
    A DML COLLATE clause is now allowed with BLOBs.
    
    Example
    
      select blob_column from table
        where blob_column collate unicode = 'foo';
     
    Full equality comparisons between BLOBs
    
    Comparison can be performed on the entire content of a text BLOB.
    
    Character set conversion for BLOBs
    
    Conversion between character sets is now possible when assigning to a BLOB from a string or another BLOB
    
    INTL Plug-ins
    
    Character sets and collations are installed using a manifest file.
    
    The manifest file should be put in the $rootdir/intl with a .conf extension. It is used to locate character sets and collations in the libraries. If a character set/collation is declared more than once, it is not loaded and the error is reported in the log.
    
    The symbol $(this) is used to indicate the same directory as the manifest file and the library extension should be omitted.
    
    Example of a Section from fbintl.conf
    
       <intl_module fbintl>
           filename    $(this)/fbintl
       </intl_module>
    
       <charset ISO8859_1>
           intl_module    fbintl
           collation    ISO8859_1
           collation    DA_DA
           collation    DE_DE
           collation    EN_UK
           collation    EN_US
           collation    ES_ES
           collation    PT_BR
           collation    PT_PT
       </charset>
    
       <charset WIN1250>
           intl_module    fbintl
           collation    WIN1250
           collation    PXW_CSY
           collation    PXW_HUN
           collation    PXW_HUNDC
       </charset>
     
    New Character Sets/Collations
    
    UTF8 character set
    
    The UNICODE_FSS character set has a number of problems: it's an old version of UTF8 that accepts malformed strings and does not enforce correct maximum string length. In FB 1.5.X UTF8 is an alias to UNICODE_FSS.
    
    Now, UTF8 is a new character set, without the inherent problems of UNICODE_FSS.
    
    UNICODE collations (for UTF8)
    
    UCS_BASIC works identically to UTF8 with no collation specified (sorts in UNICODE code-point order). The UNICODE collation sorts using UCA (Unicode Collation Algorithm).
    
    Sort order sample:
    
      isql -q -ch dos850
      SQL> create database 'test.fdb';
      SQL> create table t (c char(1) character set utf8);
      SQL> insert into t values ('a');
      SQL> insert into t values ('A');
      SQL> insert into t values ('á');
      SQL> insert into t values ('b');
      SQL> insert into t values ('B');
      SQL> select * from t order by c collate ucs_basic;
    
      C
      ======
      A
      B
      a
      b
      á
    
      SQL> select * from t order by c collate unicode;
    
      C
      ======
      a
      A
      á
      b
      B
     
    Brazilian collations
    
    Two case-insensitive/accent-insensitive collations were created for Brazil: WIN_PTBR (for WIN1252) and PT_BR (for ISO8859_1).
    
    Sort order and equality sample:
    
      isql -q -ch dos850
      SQL> create database 'test.fdb';
      SQL> create table t (c char(1) character set iso8859_1 collate pt_br);
      SQL> insert into t values ('a');
      SQL> insert into t values ('A');
      SQL> insert into t values ('á');
      SQL> insert into t values ('b');
      SQL> select * from t order by c;
    
      C
      ======
      A
      a
      á
      b
    
      SQL> select * from t where c = 'â';
    
      C
      ======
      a
      A
      â
     
    Drivers
    
    New character sets and collations are implemented through dynamic libraries and installed in the server with a manifest file in the intl subdirectory. For an example, see fbintl.conf.
    
    Not all implemented character sets and collations need to be listed in the manifest file. Only those listed are available and duplications are not loaded.
    
    Adding More Character Sets to a Database
    
    For installing additional character sets and collations into a database, the character sets and collations should be registered in the database's system tables (rdb$character_sets and rdb$collations). The file misc/intl.sql, in your Firebird 2 installation, is a script of stored procedures for registering and unregistering them.
    
    New Character Sets and Collations Implemented
    
    ES_ES_CI_AI for ISO8859_1 Character Set
    
    A. dos Santos Fernandes
    
    Spanish language case- and accent-insensitive collation for ISO8859_1 character set.
    
    KOI8-R
    
    O. Loa, A. Karyakin
    
    Russian language character set and dictionary collation.
    
    KOI8-U
    
    O. Loa, A. Karyakin
    
    Ukrainian language character set and dictionary collation.
    
    WIN1257_LV
    
    O. Loa, A. Karyakin
    
    Latvian dictionary collation.
    
    WIN1257_LT
    
    O. Loa, A. Karyakin
    
    Lithuanian dictionary collation.
    
    WIN1257_EE
    
    O. Loa, A. Karyakin
    
    Estonian dictionary collation.
    
    UTF8
    
    A. dos Santos Fernandes
    
    Unicode 4.0 support with UTF8 character set and collations UCS_BASIC and UNICODE.
    
    Brazilian collations
    
    A. dos Santos Fernandes, P. H. Albanez
    
    Collation PT_BR for ISO8859_character set
    
    Collation WIN_PTBR for WIN1252 character set
    
    Bosnian Collation
    
    F. Hasovic
    
    New Bosnian language collation BS_BA was added for WIN1250 character set.
    
    Czech Collations
    
    I. Prenosil, A. dos Santos Fernandes
    
    WIN_CZ: case-insensitive Czech language collation for WIN1250 character set
    
    WIN_CZ_CI_AI: case-insensitive, accent-insensitive Czech language collation for WIN1250 character set
    
    Vietnamese Character Set
    
    Nguyen The Phuong, A. dos Santos Fernandes
    
    Charset WIN1258 for Vietnamese language.
    
    Polish Collation
    
    Jaroslaw Glowacki, A. dos Santos Fernandes
    
    Added new collation ISO_PLK for ISO8859_2 charset (Polish language).
    
    Character Set Bug Fixes
    
    A. dos Santos Fernandes
    
    The following bugs related to character sets and collations were fixed:
    
    SF #1073212   An Order By on a big column with a COLLATE clause would terminate the server.
    
    SF #939844   A query in a UNICODE database would throw a GDS Exception if it was longer than 263 characters.
    
    SF #977785   Wrong character lengths were being returned from some multi-byte character sets (UTF-8, East-Asian charsets).
    
    SF #536243   A correct result is now returned when the UPPER() function is applied to a UNICODE_FSS string.
    
    SF #942726   UPPER did not convert aacute to Aacute for ISO8859_1
    
    SF #544630   Some problems were reported when connecting using UNICODE.
    
    SF #540547   Some problems involving concatenation, numeric fields and character set were fixed.
    
    Unregistered bug   A query could produce different results, depending on the presence of an index, when the last character of the string was the first character of a compression pair.
    
    Unregistered bug   SUBSTRING did not work correctly with a BLOB in a character set.
    
    Unregistered bug   Pattern matching with multi-byte BLOBs was being performed in binary mode.
    
    Unregistered bug   Connecting with a multi-byte character set was unsafe if the database had columns using a different character set.
    絵文字
    文字色
    ヒント
    色指定は、最初に選択した色が文字色、次に選択した色が背景色になります。

    選択範囲を処理後は、その範囲が選択したままになっています。
    続けて文字を入力する場合は、[ → ]キーでカーソルを移動してから入力してください。

    [ ☺ ] ボタンは、絵文字パレットを開きます。
    [ ⏎ ] ボタンは、改行を入れます。
    [ &# ] ボタンは、選択文字列を数値文字参照に変換します。
    [ ]は、このページの送信前の編集内容をフラッシュします。