Data Manipulation Language (DML)

cf) Firebird 2.5 Release Notes

RegEx Search Support using SIMILAR TO

A new SIMILAR TO predicate is introduced to support regular expressions. The predicate's function is to verify whether a given SQL-standard regular expression matches a string argument. It is valid in any context that accepts Boolean expressions, such as WHERE clauses, CHECK constraints and PSQL IF() tests.

Syntax Patterns

similar predicate> ::=
	<value> [ NOT ] SIMILAR TO <similar pattern> [ ESCAPE <escape character> ]
similar pattern> ::= <character value expression: regular expression>
regular expression> ::=
     <regular term>
   | <regular expression> <vertical bar> <regular term>
regular term> ::=
     <regular factor>
   | <regular term> <regular factor>
regular factor> ::=
     <regular primary>
   | <regular primary> <asterisk>
   | <regular primary> <plus sign>
   | <regular primary> <question mark>
   | <regular primary> <repeat factor>
repeat factor> ::=
   <left brace> <low value> [ <upper limit> ] <right brace>
upper limit> ::= <comma> [ <high value> ]
low value> ::= <unsigned integer>
high value> ::= <unsigned integer>
regular primary> ::=
     <character specifier>
   | <percent>
   | <regular character set>
   | <left paren> <regular expression> <right paren>
character specifier> ::=
     <non-escaped character>
   | <escaped character>
regular character set> ::=
     <underscore>
   | <left bracket> <character enumeration>... <right bracket>
   | <left bracket> <circumflex> <character enumeration>... <right bracket>
   | <left bracket> <character enumeration include>... <circumflex> <character enumeration exclude>...
       <right bracket>
character enumeration include> ::= <character enumeration>
character enumeration exclude> ::= <character enumeration>
character enumeration> ::=
     <character specifier>
   | <character specifier> <minus sign> <character specifier>
   | <left bracket> <colon> <character class identifier> <colon> <right bracket>
character specifier> ::=
     <non-escaped character>
   | <escaped character>
character class identifier> ::=
     ALPHA
   | UPPER
   | LOWER
   | DIGIT
   | SPACE
   | WHITESPACE
   | ALNUM
Note
non-escaped character> is any character except <left bracket>, <right bracket>, <left paren>, <right paren>, <vertical bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>, <underscore>, <percent>, <question mark>, <left brace> and <escape character>.
escaped character> is the <escape character> succeeded by one of <left bracket>, <right bracket>, <left paren>, <right paren>, <vertical bar>, <circumflex>, <minus sign>, <plus sign>, <asterisk>, <underscore>, <percent>, <question mark>, <left brace> or <escape character>.

Table 10.1. Character class identifiers

Identifier Description Note ALPHA All characters that are simple latin letters (a-z, A-Z) Includes latin letters with accents when using accent-insensitive collation UPPER All characters that are simple latin uppercase letters (A-Z) Includes lowercase latters when using case-insensitive collation LOWER All characters that are simple latin lowercase letters (a-z) Includes uppercase latters when using case-insensitive collation DIGIT All characters that are numeric digits (0-9) SPACE All characters that are the space character (ASCII 32) WHITESPACE All characters that are whitespaces (vertical tab (9), newline (10), horizontal tab (11), carriage return (13), formfeed (12), space (32)) ALNUM All characters that are simple latin letters (ALPHA) or numeric digits (DIGIT)

Usage Guide

Return true for a string that matches <regular expression> or <regular term>:

     <regular expression> <vertical bar> <regular term>

   'ab' SIMILAR TO 'ab|cd|efg'   -- true
   'efg' SIMILAR TO 'ab|cd|efg'  -- true
   'a' SIMILAR TO 'ab|cd|efg'    -- false

Match zero or more occurrences of <regular primary>: <regular primary> <asterisk>

   '' SIMILAR TO 'a*'     -- true
   'a' SIMILAR TO 'a*'    -- true
   'aaa' SIMILAR TO 'a*'  -- true

Match one or more occurrences of <regular primary>: <regular primary> <plus sign>

   '' SIMILAR TO 'a+'     -- false
   'a' SIMILAR TO 'a+'    -- true
   'aaa' SIMILAR TO 'a+'  -- true

Match zero or one occurrence of <regular primary>: <regular primary> <question mark>

   '' SIMILAR TO 'a?'     -- true
   'a' SIMILAR TO 'a?'    -- true
   'aaa' SIMILAR TO 'a?'  -- false

Match exact <low value> occurrences of <regular primary>: <regular primary> <left brace> <low value> <right brace>

   '' SIMILAR TO 'a{2}'     -- false
   'a' SIMILAR TO 'a{2}'    -- false
   'aa' SIMILAR TO 'a{2}'   -- true
   'aaa' SIMILAR TO 'a{2}'  -- false

Match <low value> or more occurrences of <regular primary>: <regular primary> <left brace> <low value> <comma> <right brace>:

   '' SIMILAR TO 'a{2,}'     -- false
   'a' SIMILAR TO 'a{2,}'    -- false
   'aa' SIMILAR TO 'a{2,}'   -- true
   'aaa' SIMILAR TO 'a{2,}'  -- true

Match <low value> to <high value> occurrences of <regular primary> <regular primary> <left brace> <low value> <comma> <high value> <right brace>:

   '' SIMILAR TO 'a{2,4}'       -- false
   'a' SIMILAR TO 'a{2,4}'      -- false
   'aa' SIMILAR TO 'a{2,4}'     -- true
   'aaa' SIMILAR TO 'a{2,4}'    -- true
   'aaaa' SIMILAR TO 'a{2,4}'   -- true
   'aaaaa' SIMILAR TO 'a{2,4}'  -- false

Match any (non-empty) character: <underscore>

   '' SIMILAR TO '_'    -- false
   'a' SIMILAR TO '_'   -- true
   '1' SIMILAR TO '_'   -- true
   'a1' SIMILAR TO '_'  -- false

Match a string of any length (including empty strings): <percent>

   '' SIMILAR TO '%'         -- true
   'az' SIMILAR TO 'a%z'     -- true
   'a123z' SIMILAR TO 'a%z'  -- true
   'azx' SIMILAR TO 'a%z'    -- false

Group a complete <regular expression> to use as one single <regular primary> as a sub-expression: <left paren> <regular expression> <right paren>

   'ab' SIMILAR TO '(ab){2}'    -- false
   'aabb' SIMILAR TO '(ab){2}'  -- false
   'abab' SIMILAR TO '(ab){2}'  -- true

Match a character identical to one of <character enumeration>: <left bracket> <character enumeration>... <right bracket>

   'b' SIMILAR TO '[abc]'    -- true
   'd' SIMILAR TO '[abc]'    -- false
   '9' SIMILAR TO '[0-9]'    -- true
   '9' SIMILAR TO '[0-8]'    -- false

Match a character not identical to one of <character enumeration>: <left bracket> <circumflex> <character enumeration>... <right bracket>

   'b' SIMILAR TO '[^abc]'    -- false
   'd' SIMILAR TO '[^abc]'    -- true

Match a character identical to one of <character enumeration include> but not identical to one of <character enumeration exclude>: <left bracket> <character enumeration include>... <circumflex> <character enumeration exclude>...

   '3' SIMILAR TO '[[:DIGIT:]^3]'    -- false
   '4' SIMILAR TO '[[:DIGIT:]^3]'    -- true

Match a character identical to one character included in <character class identifier>. Refer to the table of Character Class Identifiers, above. May be used with <circumflex> to invert the logic (see above): <left bracket> <colon> <character class identifier> <colon> <right bracket>

   '4' SIMILAR TO '[[:DIGIT:]]'  -- true
   'a' SIMILAR TO '[[:DIGIT:]]'  -- false
   '4' SIMILAR TO '[^[:DIGIT:]]' -- false
   'a' SIMILAR TO '[^[:DIGIT:]]' -- true

Examples

create table department (

 number numeric(3) not null,
 name varchar(25) not null,
 phone varchar(14)
 check (phone similar to '\([0-9]{3}\) [0-9]{3}\-[0-9]{4}' escape '\')

);

insert into department

 values ('000', 'Corporate Headquarters', '(408) 555-1234');

insert into department

 values ('100', 'Sales and Marketing', '(415) 555-1234');

insert into department

 values ('140', 'Field Office: Canada', '(416) 677-1000');

insert into department

 values ('600', 'Engineering', '(408) 555-123'); -- check constraint violation

select * from department

 where phone not similar to '\([0-9]{3}\) 555\-%' escape '\';

Hex Literal Support

Support for hexadecimal numeric and binary string literals has been introduced.

Syntax Patterns

numeric hex literal> ::=
   { 0x | 0X } <hexit> [ <hexit>... ]
binary string literal> ::=
   { x | X } <quote> [ { <hexit> <hexit> }... ] <quote>
digit> ::=
   0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
hexit> ::=
   <digit> | A | B | C | D | E | F | a | b | c | d | e | f
Numeric Hex Literals

The number of <hexit> in the string cannot exceed 16.

If the number of <hexit> is greater than eight, the constant data type is a signed BIGINT. If it is eight or less, the data type is a signed INTEGER.

Tip

That means 0xF0000000 is -268435456 and 0x0F0000000 is 4026531840.

Binary String Literals

The resulting string is defined as a CHAR(n/2) CHARACTER SET OCTETS, where n is the number of <hexit>.

Examples

select 0x10, cast('0x0F0000000' as bigint)

 from rdb$database;

select x'deadbeef'

 from rdb$database;

New UUID Conversion Functions

Adriano dos Santos Fernandes

Tracker references CORE-1656 and CORE-1682.

Two new built-in functions, UUID_TO_CHAR and CHAR_TO_UUID, enable conversion between a UUID in the form of a CHAR(16) OCTETS string and the RFC4122-compliant form.

CHAR_TO_UUID()

The function CHAR_TO_UUID() converts the CHAR(32) ASCII representation of a UUID (XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX) to the CHAR(16) OCTETS representation, optimized for storage.

Syntax Model

CHAR_TO_UUID( <string> )

Example

select char_to_uuid('93519227-8D50-4E47-81AA-8F6678C096A1')

 from rdb$database;

UUID_TO_CHAR()

The function UUID_TO_CHAR() converts a CHAR(16) OCTETS UUID (as returned by the GEN_UUID() function) to the CHAR(32) ASCII representation (XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX).

Syntax Model

UUID_TO_CHAR( <string> )

Example

select uuid_to_char(gen_uuid())

 from rdb$database;

SOME_COL = ? OR ? IS NULL Predication

By popular request, particularly from Delphi programmers, support has been implemented for a predication that is able to “OR” test both the equivalence between a column and a parameter and whether the value passed to the parameter is NULL. This construct is often desired as a way to avoid the need to prepare one query to request a filtered result set and another for the same query without the filter.

Users of Delphi and other programming interfaces that apply client-side object names to parameters wanted the ability for the DSQL engine to recognise a usage like the following:

 WHERE col1 = :param1 OR :param1 IS NULL

At the API level, the language interface translates the query to

 WHERE col1 = ? OR ? IS NULL

That presented two problems:

While the programmer treated the parameter :param1 as though it were a single variable with two references, the API could not: it is presented with two parameters

The second parameter is of an unknown data type and the program has no way to assign to it

What was needed to solve this problem was to introduce a new data type to handle the “? is NULL” condition and teach Firebird to do the right thing when it received such a request.

The implementation works like this. To resolve the first problem, the request must supply two parameters (for our Delphi example):

 WHERE col1 = :param1 OR :param2 IS NULL

If “param1” is not NULL, the language interface is required to assign the correct value for the first parameter, set the XSQLVAR.sqlind to NOT NULL and leave XSQLVAR.sqldata NULL.

If “param2” is NULL, the language interface is required to set the XSQLVAR.sqlind of both parameters to NULL and leave the XSQLVAR.sqldata NULL.

In other words, for the parameter ( ? ) in ? IS NULL

: XSQLVAR.sqlind should be set in accordance with NULL/NON-NULL state of the parameter. This is the type of parameter that is described by the new constant SQL_NULL.

The XSQLVAR.sqldata of a SQL_NULL type of parameter should always be passed by the client application as a NULL pointer and should never be accessed.

NULL Specified in the Output Set

When NULL is specified as an output constant (select NULL from ...), it continues to be described as CHAR(1), rather than by SQL_NULL. That may change in a future version.

Extension to LIST() Function

A string expression is now allowed as the delimiter argument of the LIST() function.

Example

SELECT

 DISCUSSION_ID,
 LIST(COMMMENT, ASCII_CHAR(13))

FROM COMMENTS

 GROUP BY DISCUSSION_ID;

Extension to DATEADD and DATEDIFF() Functions

The WEEK unit was introduced for functions DATEADD and DATEDIFF.

MILLISECOND, SECOND, MINUTE and HOUR units are no longer invalid units to use with DATE arguments.

BIN_NOT() Function Added

Completing the set of built-in binary functions added in v.2.1, new function BIN_NOT() returns the result of a bitwise NOT operation performed on its argument.

Syntax Pattern

 BIN_NOT( <number> )

Example

select bin_not(flags) from x;

Write to Temporary Tables in a Read-Only Database

(V.2.5.1) Write operations to global temporary tables in a read-only database are enabled.

Optimizer Improvements

Changes in optimizer logic that address recognised problems include:

CROSS JOIN Logic (D. Yemanov) When a CROSS JOIN involved an empty table, the optimizer had no special logic to detect that the query was futile and return the empty set immediately. That shortcut logic has now been implemented.

Tracker reference CORE-2200.

Note

The same change was implemented in V.2.1.2.

Derived Tables (A. dos Santos Fernandes) The limit on the number of contexts available when using derived tables has been raised.

Tracker reference CORE-2029.

Timing of DEFAULT Evaluation (A. dos Santos Fernandes) Under rare conditions, the early evaluation of a DEFAULT value or expression defined for a column might give rise to a confused situation regarding the evaluation of an input value supplied for that column. The possibility was addressed by deferring the evaluation of DEFAULT and not actually performing the evaluation at all unless it was actually necessary.

Tracker reference CORE-1842.

Index Use for NOT IN Searches (A. dos Santos Fernandes) Better performance has been achieved for the NOT IN predicate by enabling the use of an index.

Tracker reference CORE-1137.

Undo Log Memory Consumption (D. Yemanov) Excessive memory consumption by the Undo log after a lengthy series of updates in a single transaction has been avoided.

Tracker reference CORE-1477.

Other Improvements

Other changes to smooth out the little annoyances include:

FREE_IT Error Detection (A. dos Santos Fernandes) Previously, a UDF declared with FREE_IT would crash if the pointer returned had not been allocated by the ib_util_malloc() function. Now, such a condition is detected, an exception is thrown and the pointer remains in its allocated state.

Tracker reference CORE-1937.

“Expression evaluation not supported” message improved (C. Valderrama) A number of secondary GDS codes were introduced to provide more details about an operation that fails with an “Expression evaluation not supported” exception, for example:

 'Argument for @1 in dialect 1 must be string or numeric'
 'Strings cannot be added to or subtracted from DATE or TIME types'
 'Invalid data type for subtraction involving DATE, TIME or TIMESTAMP types'
  etc.

These detailed messages follow the GDS code for the isc_expression_eval_err (expression evaluation not supported) error in the status vector.

Tracker reference CORE-1799.