REGEX_LOG_PARSE

Parses a character string based on Java Regular Expression patterns as defined in java.util.regex.pattern.

Columns are based on match groups defined in the regex-pattern. Each group defines a column, and the groups are processed from left to right. Failure to match produces a NULL value result: If the regular expression does not match the the string passed as the first parameter, NULL is returned.

Syntax

REGEX_LOG_PARSE (sourceString, regexPattern)

*where regexPattern* is a constant string which is a regular expression.

The returned columns will be COLUMN1 through COLUMNn, where n is the number of groups in the regular expression. The columns will be of type char(m) where m is the length of sourceString.

Example 1

The following code returns two columns with zero or more of [0-9] of the string ‘abcde111fghij22klm’

SELECT trim(t.r.COLUMN1) col1, trim(t.r.COLUMN2) col2, trim(t.r.COLUMN3) col3 from
(values (REGEX_LOG_PARSE('abcde111fghij22klm', '([^0-9]*)1*([^0-9]*)2*([^0-9]*)'))) t(r);

which returns the following result:

'COL1','COL2','COL3'
'abcde','fghij','klm'

Example 2

The following code returns three columns:

  • one, labeled “Amount,” with one or more digits
  • one, labeled “Item,” with one or more non-whitespace characters, followed by whitespace, followed by one or more non-whitespace characters
  • one, labeled “Ship Date,” with one or more non-whitespace characters sql SELECT cast(trim(t.r.COLUMN1) as int) "Amount", trim(t.r.COLUMN2) "Item", cast(trim(t.r.COLUMN3) as date) "Ship Date" from ( values ( REGEX_LOG_PARSE ( '445 light bulbs should be in the basket. The order will ship on 2014-05-04.', '^(\d+) (\S+\s\S+) should be in the basket. The order will ship on (\S+).' ) ) ) t(r); which returns the following result: 'Amount','Item','Ship Date' '445','light bulbs','2014-05-04'

Quick Regex Reference

For full details on Regex, see java.util.regex.pattern [xyz] Find single character of: x, y or z [^abc] Find any single character except: x, y, or z [r-z] Find any single character between r-z [r-zR-Z] Find any single character between r-z or R-Z ^ Start of line $ End of line \A Start of string \z End of string . Any single character \s Find any whitespace character \S Find any non-whitespace character \d Find any digit \D Find any non-digit \w Find any word character (letter, number, underscore) \W Find any non-word character \b Find any word boundary (…) Capture everything enclosed (x| y) Find x or y (also works with symbols such as \d or \s) x? Find zero or one of x (also works with symbols such as \d or \s) x* Find zero or more of x (also works with symbols such as \d or \s) x+ Find one or more of x (also works with symbols such as \d or \s) x{3} Find exactly 3 of x (also works with symbols such as \d or \s) x{3,} Find 3 or more of x (also works with symbols such as \d or \s) x{3,6} Find between 3 and 6 of x (also works with symbols such as \d or \s)