VARIABLE_COLUMN_LOG_PARSE

VARIABLE_COLUMN_LOG_PARSE splits an input string (its first argument, ) into fields separated by a delimiter character or delimiter string. Thus it handles comma-separated values or tab-separated values. It can be combined with FIXED_COLUMN_LOG_PARSE to handle something like maillog, where some fields are fixed-length and others are variable-length.

VARIABLE_COLUMN_LOG_PARSE is deprecated

VARIABLE_COLUMN_LOG_PARSE is deprecated because:

  • When a parsing error is encountered, VARIABLE_COLUMN_LOG_PARSE drops the row and does not retain the fields. If you want to preserve the row and parse the well-formatted fields it is always recommended to invoke the mid-pipeline parser.
  • VARIABLE_COLUMN_LOG_PARSE only supports CSV style parsing.

Instead of using VARIABLE_COLUMN_LOG_PARSE we recommend using the ECD Parser UDX which supports mid-pipeline parsing using any of the ECD parsers (including CSV, JSON, Key Value, XML).

VARIABLE_COLUMN_LOG_PARSE Syntax

VARIABLE_COLUMN_LOG_PARSE(
 <character-expression>, <columns>, <delimiter-string>
 [ , <escape-string>, <quote-string> ] )
 <columns> := <number of columns> | <list of columns>
 <number of columns> := <numeric value expression>
 <list of columns> := '<column description>[, ...]'
 <column description> := <identifier> TYPE <data type> [ NOT NULL ]
 <delimiter string> := <character-expression>
 <escape-string> := <character-expression>
 <quote-string> := '<begin quote character> [ <end quote character> ]')

Note: Parsing of binary files is not supported.

The arguments <escape-string> and <quote-string> are optional. Specifying an <escape-string> allows the value of a field to contain an embedded delimiter. As a simple example, if the specified a comma, and the <escape-string> specified a backslash, then an input of “a,b' would be split into two fields “a” and “b”, but an input of “a,b” would result in a single field “a,b”.

Since SQLstream supports Unicode character literals, a tab can also be a delimiter, specified using a Unicode escape, such as u&'\0009', which is a string consisting only of a tab character.

Specifying a <quote-string> is another way to hide an embedded delimiter. The <quote-string> should be a one or two character expression: the first is used as the <begin quote character> character; the second, if present, is used as the character. If only one character is supplied, it is used as both to begin and to end quoted strings. When the input includes a quoted string, that is, a string enclosed in the characters specified as <quote-string>, then that string appears in one field, even if it contains a delimiter.

Note that the <begin quote character> and <end quote character> are single characters and can be different. The <begin quote character> can be used to start and end the quoted string, or the can start the quoted string and the used to end that quoted string.

When a list of columns <list of columns> is supplied as the second parameter <columns>, the column specifications (<column description>) for types DATE, TIME, and TIMESTAMP support a format parameter allowing the user to specify exact time component layout. The parser uses the Java class java.lang.SimpleDateFormat to parse the strings for those types. The Date and Time Patterns topic gives a full description of timestamp format strings, with examples.

The following is an example of a column definition with a format string:

"name" TYPE TIMESTAMP 'dd/MMM/yyyy:HH:mm:ss'

By default, the output columns are named COLUMN1, COLUMN2, COLUMN3, etc., each of SQL data type VARCHAR(1024)

See also REGEX_LOG_PARSE.