REGEX_LOG_PARSE

Parses a character string based on Java Regular Expression patterns as defined in java.util.regex.pattern.

Columns are based on match groups defined in the regex-pattern. Each group defines a column, and the groups are processed from left to right. Failure to match produces a NULL value result: If the regular expression does not match the the string passed as the first parameter, NULL is returned.

Syntax

REGEX_LOG_PARSE (sourceString, regexPattern)

Where regexPattern is a constant string which is a regular expression.

The returned columns will be COLUMN1 through COLUMNn, where n is the number of groups in the regular expression. The columns will be of type char(m) where m is the length of sourceString.

Example 1

The following code returns three extracts from the string 'abcde111fghij22klm'; one for each capture group:

SELECT trim(t.r.COLUMN1) col1, trim(t.r.COLUMN2) col2, trim(t.r.COLUMN3) col3 from
(values (REGEX_LOG_PARSE('abcde111fghij22klm', '([^0-9]*)1*([^0-9]*)2*([^0-9]*)'))) t(r);

Returning the following result:

'COL1','COL2','COL3'
'abcde','fghij','klm'

Example 2

The following code returns three columns:

  • one, labeled "Amount", with one or more digits
  • one, labeled "Item", with one or more non-whitespace characters, followed by whitespace, followed by one or more non-whitespace characters
  • one, labeled "Ship Date", with one or more non-whitespace characters SELECT cast(trim(t.r.COLUMN1) as int) "Amount", trim(t.r.COLUMN2) "Item", cast(trim(t.r.COLUMN3) as date) "Ship Date" from ( values ( REGEX_LOG_PARSE ( '445 light bulbs should be in the basket. The order will ship on 2014-05-04.', '^(\d+) (\S+\s\S+) should be in the basket. The order will ship on (\S+).' ) ) ) t(r); which returns the following result: 'Amount','Item','Ship Date' '445','light bulbs','2014-05-04'

Quick Regex Reference

For full details on Regex, see java.util.regex.pattern

Pattern Description
[xyz] Find single character of: x, y or z
[^abc] Find any single character except: x, y, or z
[r-z] Find any single character between r-z
[r-zR-Z] Find any single character between r-z or R-Z
^ Start of line
$ End of line
\A Start of string
\z End of string
. Any single character
\s Find any whitespace character
\S Find any non-whitespace character
\d Find any digit
\D Find any non-digit
\w Find any word character (letter, number, underscore)
\W Find any non-word character
\b Find any word boundary
(...) Capture everything enclosed
(x|y) Find x or y (also works with symbols such as \d or \s)
x? Find zero or one of x (also works with symbols such as \d or \s)
x* Find zero or more of x (also works with symbols such as \d or \s)
x+ Find one or more of x (also works with symbols such as \d or \s)
x{3} Find exactly 3 of x (also works with symbols such as \d or \s)
x{3,} Find 3 or more of x (also works with symbols such as \d or \s)
x{3,6} Find between 3 and 6 of x (also works with symbols such as \d or \s)