CREATE FUNCTION

SQLstream provides a number of built-in functions, and also allows users to extend s-Server’s capabilities by means of user-defined functions (UDFs) and user-defined transformations (UDXs). You define both using CREATE FUNCTION. Functions must be declared within schemas, but can be added to a SQL path.

Values passed to (or returned from) a user-defined function or transformation must be exactly the same data types as the corresponding parameter definitions. In other words, implicit casting is not allowed in passing parameters to (or returning values from) a user-defined function or a user-defined transform.

User-defined functions and transformations may be invoked using either the fully-qualified name <schema>.<function_name> or by the function name alone if it exists in the current SQL path (see SET PATH and CURRENT_PATH).

This topic contains information on the following subtopics:

SQLstream functions can be implemented as pure SQL, or can reference external functions defined in Java, Python or C+ (native) modules. The table below shows which types of function are supported:

Type of Function SQL Java Python C++
User Defined Function (UDF) Yes Yes No Yes
User Defined Transform (UDX) No Yes No No

Creating User-Defined Functions (UDFs)

A user-defined function can implement complex calculations or interact with an external system, taking zero or more scalar parameters and returning a scalar result. UDFs operate like built-in functions such as FLOOR() or LOWER(). For each occurrence of a user-defined function within a SQL statement, that UDF is called once per row with scalar parameters (constants or column values in that row).

User-defined functions may be defined either directly in SQL, or externally with Java classes.

Note: any Java class referenced by a UDF must be explicitly defined as part of a package (using a package clause in the source file for the Java class).

SQL UDFs

The syntax for a SQL user-defined function is:

CREATE FUNCTION <function_name> ( <parameter_list> )
RETURNS <data type>
LANGUAGE SQL
[ SPECIFIC <specific_function_name> ]
[ [NOT] DETERMINISTIC ]
CONTAINS SQL | READS SQL DATA | MODIFIES SQL DATA
[ RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT ]
RETURN <SQL-defined function body>
  • SPECIFIC assigns a specific function name that is unique within the database. Note that the regular function name does not need to be unique (two or more functions may share the same name, as long as they are distinguishable by their parameter list). The specific name may not be used to invoke the function, but is useful when dropping a function (see DROP FUNCTION when the regular function name is not unique.
  • DETERMINISTIC / NOT DETERMINISTIC indicates whether a function will always return the same result for a given set of parameter values. This may be used internally in query optimization.
  • READS SQL DATA and MODIFIES SQL DATA indicate whether the function potentially reads or modifies SQL data, respectively. If a function attempts to read data from tables, streams, or views without READS SQL DATA being specified, or insert to a stream or modify a table without MODIFIES SQL DATA being specified, an exception will be raised.
  • RETURNS NULL ON NULL INPUT and CALLED ON NULL INPUT indicate whether the function is defined as returning null if any of its parameters are null. If left unspecified, the default is CALLED ON NULL INPUT.

A SQL-defined function body consists only of a single RETURN statement. SQLstream only supports SQL functions that return a single scalar result; there is no support for UDXs written using SQL.

First, we create a new schema to handle our extensions and set the schema and path to point to it:

CREATE SCHEMA geolocate_sql;
SET SCHEMA 'geolocate_sql';
SET PATH 'geolocate_sql';

Next, we create a function that allows us to extract the fractional portion of a degree measurement. Here we can see that the body of a SQL UDF is always RETURN <expression>:

CREATE FUNCTION get_fraction( degrees DOUBLE )
    RETURNS DOUBLE
    CONTAINS SQL
    RETURN degrees - FLOOR(degrees)
;

Next, we use get_fraction to build a UDF that extracts minutes from a degree measurement. This shows how one UDF can call another:

CREATE FUNCTION get_minutes( degrees DOUBLE )
    RETURNS INTEGER
    CONTAINS SQL
    RETURN CAST( (get_fraction(degrees) * 60.0) as INTEGER )
;

With both of these functions in hand, we can build yet another UDF to extract seconds:

CREATE FUNCTION get_seconds( degrees DOUBLE )
    RETURNS INTEGER
    CONTAINS SQL
    RETURN CAST( ((get_fraction(degrees) * 3600.0) - ((get_minutes(degrees) * 60)) )
                  as INTEGER )
;

Now that we have our extraction functions in place, let’s create one more function that brings them all together to convert from degrees in decimal format to a fully specified string indicating degrees, minutes, and seconds:

CREATE FUNCTION to_dms( degrees DOUBLE )
    RETURNS VARCHAR(20)
    CONTAINS SQL
    RETURN
        ( CAST( CAST(degrees AS INTEGER) AS VARCHAR(20) ) ||
        'd ' ||
        CAST( get_minutes(degrees) AS VARCHAR(20) ) ||
        'm ' ||
        CAST( get_seconds(degrees) AS VARCHAR(20) ) ||
        's' )
;

Java User-Defined Functions

s-Server supportd the SQL/OLB standard for User-Defined Functions (UDFs), also known as “external routines”. Farrago supports this standard; please refer to the External Routines section at http://farrago.sourceforge.net/design/UserDefinedTypesAndRoutines.html.

The syntax for Java UDF is:

CREATE FUNCTION <function_name> ( <parameter_list> )
RETURNS <built-in scalar type>
LANGUAGE JAVA
PARAMETER STYLE [SYSTEM DEFINED] JAVA
[ SPECIFIC <specific_function_name> ]
[ [NOT] DETERMINISTIC ]
NO SQL
[ RETURNS NULL ON NULL INPUT | CALLED ON NULL INPUT ]
EXTERNAL NAME ['class <fully qualified static method name>'
| '<qualified jar name>:<fully qualified static method name>']

Any parameters to a Java-defined external function must be specified using built-in data types. While the Java function can return a value, changes to any values passed to the function as parameters only exist internal to the Java function. They are not passed back to the calling SQL.

  • SPECIFIC assigns a specific function name that is unique within the database. Note that the regular function name does not need to be unique (two or more functions may share the same name, as long as they are distinguishable by their parameter list). The specific name may not be used to invoke the function, but is useful when dropping a function (see DROP FUNCTION when the regular function name is not unique.
  • DETERMINISTIC / NOT DETERMINISTIC indicates whether a function will always return the same result for a given set of parameter values. This may be used internally in query optimization.
  • RETURNS NULL ON NULL INPUT and CALLED ON NULL INPUT indicate whether the function is defined as returning null if any of its parameters are null. If left unspecified, the default is CALLED ON NULL INPUT.
  • EXTERNAL NAME is used to define where SQLstream s-Server should look for the Java class implementing the function. Java-defined external functions must be located either in a class file on SQLstream s-Server’s Java classpath, or in an external Java archive (JAR) file loaded into the system using CREATE JAR. In the latter case, the qualified jar name is the name given to the jar as part of CREATE FUNCTION. If the jar name was not defined in the current schema, then the fully qualified . format must be used.

The Java class containing the method must live inside a package.

The following EXTERNAL NAME clauses will work:

EXTERNAL NAME 'class org.apache.commons.lang.StringUtils.reverse'  -- as long as the class is in the classpath

EXTERNAL NAME '"MyJarName":foo.bar.VarbinaryFunctions.toUTF8String' -- as long as the JAR is in the current schema

EXTERNAL NAME '"FOO_SCHEMA"."MyJarName:foo.bar.VarbinaryFunctions.toUTF8String'

but the following declaration will raise an error when you try to invoke the function:

EXTERNAL NAME '"FOO_SCHEMA":VarbinaryFunctions.toUTF8String'   -- invalid because there is no package name included

Example of Java User-Defined Functions

CREATE JAR "getFraction"
    LIBRARY 'file:/home/aspen/getFraction.jar'
    OPTIONS(0);

-- This code loads the installed Java JAR file

CREATE FUNCTION get_fraction(degrees DOUBLE )
RETURNS DOUBLE
LANGUAGE JAVA
NO SQL
EXTERNAL NAME 'getFraction:com.sqlstream.examples.geolocate.Degree.getFraction'
;

Note that the above examples only cover the SQL statement CREATE FUNCTION for these functions. To fully define a Java UDF, you need to create and define a Java class (such as com.sqlstream.examples.geolocate.Degree) using CREATE JAR. See the topic GeoLocation: An Example of User-Defined Functions in the s-Server Integration Guide for full examples covering the entire process.

C++ UDFs

The syntax for a creating a C UDF is:

CREATE FUNCTION <function_name> ( <parameter_list> )
RETURNS <data type>
LANGUAGE C
PARAMETER STYLE GENERAL
NOSQL
EXTERNAL NAME '<sharedObjectLibrary>:<external name of method>';

For example:

create or replace function xor(i boolean, j boolean) 
returns boolean 
language c 
parameter style general 
no sql 
external name 'plugin/libsampleUdfs.so:xorOp'; 

The <sharedObjectLibrary> can be relative to the s-Server working directory (usually $SQLSTREAM_HOME) or absolute. The <external name of method> is defined in C++ code. To read more about how to create and reference shared library methods in C++ please see Writing a C++ UDF.

Creating User-Defined Transformations (UDXs)

User Defined Transforms (UDXs) are simply User Defined Functions (UDFs) that either return a table / finite relation or they return a stream / streaming relation instead of simply returning a scalar value.

A user-defined transform (UDX) is an advanced type of user-defined function that reads a set or stream of rows and returns a set or a stream of rows. Its input arguments can be scalars or cursors. A cursor is an object representing a query, from which the UDX can read the result of upstream processing. The query can be relational (finite) or streaming (infinite).

Although UDXs are registered in SQLstream using the CREATE FUNCTION SQL statement, SQLstream does not support the body (the procedural logic) of a UDX being specified in SQL. It has to be defined in an external language. SQLstream supports UDXs written in Java, Python or C++; development of custom UDXs requires knowledge of the UDX framework as it applies to each of these languages. So this topic contains information on the following:

A UDX is declared by a sql CREATE FUNCTION statement, much like a user-defined function.

The salient differences are:

  • A scalar input is declared as a parameter with a scalar type, just as for a UDF.
  • A subquery input, whether relational or streaming, is declared using the CURSOR type.
  • The rowtype of a cursor is not stated. Indeed, the rowtype of a cursor is dynamic, and is bound when the UDX is applied to a query.
  • In practical use, a UDX is written in Java to expect specific rowtypes. There is no way to state this constraint in the DDL.
  • The output of a UDX is a virtual table or stream. The output columns and types are specified in a RETURNS TABLE clause.
  • When a UDX has an input CURSOR c, the output rowtype can use c.* to mean all the columns from that input, in the same order as in the catalog.
  • A UDX can take any number of relational inputs, and can process their rows in any order (fetching all of one first, and then moving on to the next, or interleaving fetches). Of course if the inputs are streams the UDX should make best efforts to poll each of the streams concurrently to avoid losing data.
  • UDX output is not tied to input processing. A UDX can emit rows as needed - either many outputs per input, or one output per many inputs.

A UDX may be implemented by a Java class (which must be located in a java package), by a Python module, or by a C shared library object.

UDXs, Streams, and ROWTIMEs

A stream always has a ROWTIME column. Therefore, a cursor based on a streaming query will have a ROWTIME, but a cursor based on a relational query will not. As a result, the SQL that calls the UDX differs:

   CURSOR(SELECT STREAM....) has a rowtime
   CURSOR(SELECT....) has no rowtime.

The declaration of a UDX states its output rowtype, although with the “c.*” construct the output can depend dynamically on the input type. The rowtype of an input cursor is always dynamic.

A UDX is required to set the value of every output column, matching its declared data type. ROWTIME is special:

  • It occurs only in streaming context;
  • It is never null and never decreases;
  • The UDX may set it (often by copying the ROWTIME from an input cursor); OR
  • The UDX may skip it, which means the system will set it to the current stream time.

Java UDXs

The syntax for creating a Java UDX is:

CREATE FUNCTION <function_name> ( <parameter_list> )
RETURNS ( <result columns> )
LANGUAGE JAVA
PARAMETER STYLE [SYSTEM DEFINED] JAVA
[ SPECIFIC <specific_function_name> ]
NO SQL
EXTERNAL NAME ['class <fully qualified static method name>'
| '<qualified jar name>:<fully qualified static method name>']

This example shows the declaration and use of a UDX that has three inputs: one scalar integer value and two query expressions. Also note that in this example the output is fixed at 3 columns.

CREATE OR REPLACE FUNCTION crossRef(bias INTEGER, c1 CURSOR, c2 CURSOR)
 RETURNS TABLE(leftVal INTEGER, rightVal INTEGER, position DECIMAL(9,2))
 LANGUAGE JAVA
 PARAMETER STYLE SYSTEM DEFINED JAVA
 NO SQL
 EXTERNAL NAME 'class com.sqlstream.udxsample.UdxSample.crossRef';

And the UDX is referenced like this:

SELECT STREAM res.position
FROM TABLE(crossRef( 33, CURSOR(SELECT STREAM * FROM transmitStream),
CURSOR(SELECT STREAM * FROM recieveStream))) AS res;

For more examples and information about how to build the Java methods that implement UDXs please see Writing a Java UDX.