Writing a C++ UDF

This topic contains information on the following:

SQLstream’s support for C++ UDFs may provide better performance than UDFs written in Java. The C++ framework also allows you to benefit from many pre-existing native libraries - for example in the fields of Mathematics, Statistics, Geographics and more.

Using C++ UDFs, the SQLstream user can define a functor (class) that takes a result register and a few parameters as input. It also provides the operator() method and the INSTALL_UDF macro which allows the developer to create the appropriate calculator instruction that invokes that functor and writes the generated output to the Result Register. The functor can have field variables, if it needs to cache helper variables or maintain storage for returned array types.

Prerequisites

Please ensure that the SQLstream C++ SDK has been installed. See Installing the C++ SDK.

Creating a C++ UDF

The following examples represent the method of creating the C++ component for these UDFs. You can find all this codein the sampleUdfs.cpp file.

All these examples and more are contained in a single C++ file sampleUdfs.cpp, which can be found in the tarball above.

Include / using

At the top of each C++ UDF module, include at least these lines

#include "sqlstream/Udf.h" 
using namespace sqlstream; 
using fennel::SqlState; 

Simple UDF implementing the boolean XOR operator

Now define a functor for each UDF. There may be many functors in a single .cpp file.

// simple UDF that takes two booleans and returns a boolean
class XorOp {
public:
    inline void operator()(CalculatorContext context, ResultRegister<bool> &resultReg, bool x, bool y) {
        resultReg = x ^ y;
    }
};

INSTALL_UDF(xorOp, XorOp) 

NOTE:

  • The INSTALL_UDF macro is used to export the UDF, so that it can be installed through SQL.
  • Parameters used are:
    • <external name to use in sql CREATE FUNCTION>
    • <name of class>.
  • Valid result and parameter types and their SQL equivalents are:
    • int8_t (TINYINT)
    • int16_t (SMALLINT)
    • int32_t (INTEGER)
    • int64_t (BIGINT,DECIMAL or TIMESTAMP)
    • double (FLOAT or DOUBLE)
    • bool (BOOLEAN),char_t (CHAR)
    • varchar_t (VARCHAR)
    • binary_t (BINARY)
    • varbinary_t(VARBINARY)
  • For CHAR, VARCHAR, BINARY and VARBINARY, SQLstream supports “reference”, which returns a value without copying it. While using refrence, ensure that no transient data is passed as an input. It should only be used for persistant data. Alternatively, data can be maintained within field of a class using “=” operator.

Templated UDF that supports multiple types

template<typename TYPE>
class AddOp {
    // uses gcc builtin to return true on overflow
    bool doAdd(TYPE x, TYPE y, TYPE &result) {
        if constexpr(std::is_integral<TYPE>::value) {
            return __builtin_add_overflow(x, y, &result);
        } else {
            // otherwise should be DOUBLE
            result = x + y;
            return std::isnan(result);
        }
    }
public:
    inline void operator()(CalculatorContext context, ResultRegister<TYPE> &resultReg, TYPE x, TYPE y) {
        TYPE result;
        if (doAdd(x, y, result)) {
            // log to trace stream
            context.logInfo("this is an info message zzzzzz");
            context.logWarning("this is a warning message zzzzzz");
            context.logSevere("this is a severe message zzzzzz");
            // throw an exception
            context.throwException(SqlState::instance().code22003());
        } else {
            resultReg = result;
        }
    }
};
// Install once for each type supported
INSTALL_UDF(addLongs, AddOp<int64_t>)
INSTALL_UDF(addInts, AddOp<int32_t>)
INSTALL_UDF(addShorts, AddOp<int16_t>)
INSTALL_UDF(addTinys, AddOp<int8_t>)
INSTALL_UDF(addDoubles, AddOp<double>)

NOTES:

  • In this example we have used INSTALL_UDF 5 times to export type-specific versions of the UDF.
  • If there is an overflow, you will see log messages in the /var/log/sqlstream/Trace.log.0 file or ALL_TRACE stream.

Returning NULL explicitly

class NegateLongOrNullOp {
public:
    inline void operator()(CalculatorContext context, ResultRegister<int64_t> &result, int64_t x) {
        if (x == 0) {
            result.toNull(); // this is how you explicitly return NULL - result will always be NULL if any arguments are NULL
        } else {
            result = -x;
        }
    }
};
INSTALL_UDF(negateLong, NegateLongOrNullOp)

Concatenate two strings and return VARCHAR

class ConcatOp {
    // variables declared as fields persist between calls.
    std::string temp;
public:
    inline void operator()(CalculatorContext context, ResultRegister<varchar_t> &result, varchar_t x, varchar_t y) {
        temp.assign(x.data, x.size);
        temp.append(y.data, y.size);
        // For CHAR, VARCHAR, BINARY and VARBINARY, we support "reference",
        // which returns a value without copying it. Note, should only pass data
        // that pas been passed in or data declared as fields.
        // otherwise use the "=" operator.
        result.reference(temp.data(), temp.size());
    }
};
INSTALL_UDF(concat, ConcatOp)

Compiling the C++ UDF shared object library

To build the example sampleUdfs.cpp use the script supplied in the tarball:

./build.sh

This will create build/plugin/libsampleUdfs.so

Installing a C++ UDF

After creating a library plugin, the corresponding SQL function would be created as follows. These examples are from the install.sql script in the SDK tarball, and they assume that your sampleUdfs.so object library has been copied (deployed) to the plugin folder under $SQLSTREAM_HOME. You may choose any convenient location either relative to s-Server’s working directory (usually $SQLSTREAM_HOME) or use an absolute path. Then invoke sqlline –run=install.sql (or whatever your install script is called).

Install XorOp

CREATE OR REPLACE SCHEMA sampleUdf; 
SET SCHEMA 'sampleUdf'; 
SET PATH 'sampleUdf'; 

create or replace function xor(i boolean, j boolean) 
returns boolean 
language c 
parameter style general 
no sql 
external name 'plugin/libsampleUdfs.so:xorOp'; 

The SQL function name xor is mapped to the xorOp C++ functor.

Install AddOp

create or replace function addLongs(i bigint, j bigint) 
returns bigint 
language c 
parameter style general 
no sql 
external name 'plugin/libsampleUdfs.so:addLongs'; 

Calling a C++ UDF

Call each function SQL code like this:

SELECT STREAM a, b, add(A,B) as "Total" FROM myStream S; 

Input

A B
21 45
12 73

Output

A B Total
21 45 66
12 73 85