URI Parse UDX

The URI Parse UDX parses decorated URIs into name-value pairs. URIParse is called with an input stream and the name of the column containing the decorated URIs. The SQL function declaration’s output rowtype determines which input columns and which parsed named values are placed in the output stream.

  • Parsed values are copied to output columns specified with the same case-sensitive name as the supplied parameter. Output columns whose names do not exactly match any parameter value are discarded.
  • Pass-through input: Each input column whose name exactly matches (case-sensitive) a specified output column name is copied as a “pass-through”.
  • Input columns that do not exactly match output columns are discarded.

The URI Parse UDX is implemented externally in a Java JAR file, located in the SQLSTREAM_HOME/plugin directory. This external JAR file is loaded into SQLstream s-Server using CREATE JAR syntax.

The following options are passed as arguments to the SQL function invocation:

Parameter Description
inputRows The input stream of rows
parseColName (Optional) Name (case-sensitive) of the column to be parsed.If this parameter is not supplied, then the first non-ROWTIME column is used.
querySplit The regular expression to be used for splitting the request and decoration. request refers everything before the first querySplit, usually a question mark, “?”, while decoration refers to everything after the first querySplit, usually a question mark, “?”
paramSplit The regular expression to be used for splitting the param-value strings from decoration; paramSplit is usually the character ampersand, “&”, or semicolon, “;”
pairSplit The regular expression to be used for splitting the param/value pairs from param-value strings.— pairSplit is usually the character equals, “=”, or colon, “:”.
results Name of the output stream of rows

Note: Because the xxxxSplit arguments are used as Java regular expressions (regex), be sure to escape any split character that is also a regex metacharacter. For example, the querySplit is usually a question mark, “?”. Since “?” is the conditional-match regex metacharacter, it must be escaped, “?”, when passed in from the SQL.

Consider a stream named “s” with a single varchar column named “Message” where the contents of the column contain:

http://sqlstream.com/index.jsp?fruit=<value>&id=<value>&state=<value>.

Creating a function to call uriParse

The following code calls the function uriParse by creating a function called “uriParseStream” and then defining what the URI will look like:

CREATE FUNCTION "uriParseStream"(
inputRows CURSOR,
parseColName VARCHAR(32),
querySplit VARCHAR(3),
paramSplit VARCHAR(3),
pairSplit VARCHAR(3))
RETURNS TABLE( "fruit" varchar(16),
"id"int,
"state" varchar(2)
)
LANGUAGE JAVA
PARAMETER STYLE SYSTEM DEFINED JAVA
NO SQL
EXTERNAL NAME 'class com.sqlstream.plugin.uriparse.UriParse.parseUri';

Use the “uriParseStream” UDX

You then call the function through the following code:

SELECT STREAM "fruit", "id" ,"state"
FROM STREAM(test.uriParseStream(
     CURSOR( SELECT STREAM '\?' || MESSAGE AS "message" FROM s), -- input rows
     'message', -- parseColName
     '\?',      -- querySplit
     '\&',      -- paramSplit
     '='))     -- pairSplit.
;

Note: When a table function returns an infinite result, it needs to be wrapped with a STREAM(…) marker when invoked. s-Server will return an error otherwise

You can test the function by opening another terminal window and entering a statement along the following lines:

INSERT INTO test.s (MESSAGE)
VALUES
('http://sqlstream.com/index.jsp?fruit=apple&id=6789&state=NY'),
('http://sqlstream.com/index.jsp?fruit=orange&id=6889&state=CA'),
('http://sqlstream.com/index.jsp?fruit=nectarine&id=6784&state=FL');

which would return columns for a, b, and c:

fruit id state
apple 6789 NJ
pear 6791 TX
nectarine 6784 FL

and so on.