Transforming Data

This section lists some pre-defined User Defined Functions (UDFs) and User Defined Transforms (UDX), and then introduces how to build and use such routines.

Implementing UDFs and UDXs

The main way to transform data in s-Server is through SQL. See the SQL Reference guide for more detail. For more sophisticated transformations, you can define custom User Defined Transformations and User Defined Functions. This section discusses the creation of these.

UDXs operate in the middle of processing a stream. They use the JDBC API to exchange data with SQLstream s-Server while stream processing is underway. You call a UDX with one or more streaming arguments, and the UDX returns a streaming result. To use that resulting stream, downstream operations or listeners must SELECT from it. (Code sections appear in the topic Writing an ECD Plugin).

For a full UDX code example, see the topic Table Lookup UDX in this guide.

UDXs are similar to SQL functions (normal UDFs). The difference is that UDXs read data from a streaming query and return streaming results, whereas UDFs return scalar results.

UDFs and UDXs provided with s-Server

UDF / UDX Description
AesEncryptDecrypt UDX Performs Advanced Encryption Standard (AES) Encryption compatible with the MySQL AES_ENCRYPT and AES_DECRYPT functions.
AvroFormatterUDX Format data to AVRO.
GeoIPFunctions UDF Lets you determine the country, region, city, latitude, and longitude associated with IP addresses worldwide, along with the postal code, and metro code for US IP addresses. It does so using the MaxMind GeoIp City database. See (https://www.maxmind.com/en/geoip2-city) for more details.
GoogleGeoCode UDF Uses the Google Map webservice (either the Google GeoCode API or Google Maps for Business Users) to determine the longitude and latitude of an address, or look up the nearest address using a longitude and latitude. This process is known as “geocoding”.
Group Rank UDX. Sorts groups of rows. (GroupRank.jar)
Kalman Filter UDX Allows customers to run Kalman filters on streams of sensor data. A Kalman filter is a technique for sharpening the measurements produced by blurry sensors.
Linear Interpolation UDX
Quadratic Interpolation UDX
These allow you to reconstruct missing rows in a stream.
Matched Filter UDX Allows you to evaluate a template against a signal, giving a correlation coefficient for how close the match was at any point within the signal.
Parser UDX Lets you call s-Server's predefined parsers in a function.
Table Lookup UDX Looks up external DB entries which match one or more stream columns (stream-table JOIN) (TableLookup.jar)
Throttle UDX Allows you to slow down data flows; this can be very helpful during development and testing to allow a developer to monitor data as it flows through s-Server.
URI Parse UDX Parses decorated URI into name=value parameter pairs, streams values in named columns. (UriParse.jar)
XML Parse UDX Parses XML column values into name=value pairs using XPath, streams values in named columns (XmlParse.jar).

See also the UDF/UDX descriptions, discussions, and examples in the topic SQLstream Software Development Kit (SDK) in the Integration Guide.

User-defined Routines and Prepared Statements

When a query using one or more user-defined routines (UDFs, UDXs, etc.) will be executed many times, it should be made inside a PreparedStatement. This advice applies as well to any metadata query and to non-expedited Inserts. Multiple executions without using a PreparedStatement raises the possibility of leaking a class in PermGen space. Using a PreparedStatement for such a query thus minimizes the need to enlarge PermGen space, which is set by default at 64MB but can be changed using the

-xx MaxPermSize=<size>

flag on the start-server command, where can be 64m, 128m, 256m, etc. The "m" is necessary. See also the topic JDBC Driver in this guide for discussions and links pertaining to SQLstream's StreamingPreparedStatement and StreamingStatement extensions to the standard PreparedStatement interface.

Control Streams

The TableLookup UDX implements stream-table join by looking up entries in an external table that match one or more key columns in the input stream. TableLookup supports a control stream with commands enabling you to suspend the stream-table join, flush the cache, and then resume the stream-table join. After resuming, the cache is repopulated with fresh values from the lookup table, some of which may have changed from the earlier state of the cache.