Parses a character string based on Java Regular Expression patterns as defined in java.util.regex.pattern.
Columns are based on match groups defined in the regex-pattern. Each group defines a column, and the groups are processed from left to right. Failure to match produces a NULL value result: If the regular expression does not match the the string passed as the first parameter, NULL is returned.
REGEX_LOG_PARSE (sourceString, regexPattern)
Where regexPattern is a constant string which is a regular expression.
The returned columns will be COLUMN1 through COLUMNn, where n is the number of groups in the regular expression. The columns will be of type char(m) where m is the length of sourceString.
The following code returns three extracts from the string 'abcde111fghij22klm'; one for each capture group:
SELECT trim(t.r.COLUMN1) col1, trim(t.r.COLUMN2) col2, trim(t.r.COLUMN3) col3 from
(values (REGEX_LOG_PARSE('abcde111fghij22klm', '([^0-9]*)1*([^0-9]*)2*([^0-9]*)'))) t(r);
Returning the following result:
'COL1','COL2','COL3'
'abcde','fghij','klm'
The following code returns three columns:
SELECT cast(trim(t.r.COLUMN1) as int) "Amount",
trim(t.r.COLUMN2) "Item",
cast(trim(t.r.COLUMN3) as date) "Ship Date"
from
(
values
(
REGEX_LOG_PARSE
(
'445 light bulbs should be in the basket. The order will ship on 2014-05-04.',
'^(\d+) (\S+\s\S+) should be in the basket. The order will ship on (\S+).'
)
)
) t(r);
which returns the following result:
'Amount','Item','Ship Date'
'445','light bulbs','2014-05-04'
For full details on Regex, see java.util.regex.pattern
Pattern | Description |
---|---|
[xyz] | Find single character of: x, y or z |
[^abc] | Find any single character except: x, y, or z |
[r-z] | Find any single character between r-z |
[r-zR-Z] | Find any single character between r-z or R-Z |
^ | Start of line |
$ | End of line |
\A | Start of string |
\z | End of string |
. | Any single character |
\s | Find any whitespace character |
\S | Find any non-whitespace character |
\d | Find any digit |
\D | Find any non-digit |
\w | Find any word character (letter, number, underscore) |
\W | Find any non-word character |
\b | Find any word boundary |
(...) | Capture everything enclosed |
(x|y) | Find x or y (also works with symbols such as \d or \s) |
x? | Find zero or one of x (also works with symbols such as \d or \s) |
x* | Find zero or more of x (also works with symbols such as \d or \s) |
x+ | Find one or more of x (also works with symbols such as \d or \s) |
x{3} | Find exactly 3 of x (also works with symbols such as \d or \s) |
x{3,} | Find 3 or more of x (also works with symbols such as \d or \s) |
x{3,6} | Find between 3 and 6 of x (also works with symbols such as \d or \s) |