Using the SQLstream JDBC driver, you can federate (link together) multiple instances of s-Server with fine-grained topology. You do so using DDL similar to that used for a SQL/MED connection to other databases. See the topic Reading Data from RDBMS Sources in the Integrating Guavus SQLstream with Other Systems guide for more details.
Once you federate multiple instances of s-Server, you can query from and insert into streams and tables in the federated instances of s-Server. This means that you can perform pipeline operations such as parsing, stream enrichment through stream-table joins, filtering and aggregation over these federated instances of s-Server. These operations are described in the section Automatic Distribution of Streaming Workloads.
Federation allows for complex topologies declared entirely in SQL. See the section Scale-up & Scale-Out of Streaming Applications.
Setting up federation requires three steps:
To install the JDBC driver, you create a foreign data wrapper along the following lines:
CREATE OR REPLACE FOREIGN DATA WRAPPER CLIENT_JDBC LIBRARY 'class com.sqlstream.aspen.namespace.jdbc.AspenMedJdbcForeignDataWrapper' LANGUAGE JAVA OPTIONS (DRIVER_CLASS 'com.sqlstream.jdbc.Driver');
For more detail, see the topic CREATE FOREIGN DATA WRAPPER in the Streaming SQL Reference Guide.
To create a federation server, you create a data server using the same options as a MED/JDBC foreign data server. See the subtopic Foreign Server Definition in this guide for details on the options defined below.
CREATE OR REPLACE SERVER "my-federation-server" FOREIGN DATA WRAPPER CLIENT_JDBC OPTIONS( URL 'jdbc:vjdbc:sdp://remote-host:5570', USER_NAME 'sa', PASSWORD '', SCHEMA_NAME 'remote-schema-name' );
For more detail, see the topic CREATE SERVER in the Streaming SQL Reference Guide.
In order to access data from the federated server, you need to create a special kind of stream called a foreign stream. Options in the stream specify the file format, character separators, whether or not a header should be written, and so on, as well as any options specific to the format type.
Like all streams, foreign streams must be created within a schema. The example below creates and sets a schema called “FederationData”, and creates a foreign stream called “my-federation-stream”.
As of version 5.2, you need to specify the name of the remote stream with the “STREAM_NAME” option.
In addition to a column list, this foreign stream sets two options:
CREATE OR REPLACE SCHEMA "FederationData"; SET SCHEMA '"FederationData"'; CREATE OR REPLACE FOREIGN STREAM "my-federation-stream" ("recNo" INTEGER, "ts" TIMESTAMP NOT NULL, "accountNumber" INTEGER, "loginSuccessful" BOOLEAN, "sourceIP" VARCHAR(32), "destIP" VARCHAR(32), "customerId" INTEGER,) SERVER "my-federation-server" --set options for foreign s-Server OPTIONS ("SCHEMA_NAME" 'remote-schema-name', "STREAM_NAME" 'remote-stream');
Once you have set up the foreign data wrapper and server above, you query streams and tables in the foreign s-Server as if they were on the local server, using the SCHEMA_NAME as a qualifier, as in the following code:
SELECT STREAM * FROM "FederationData"."my-federation-stream";
For the code above, this data will be pulled from “remote-schema-name”.“remote-stream” on the remote s-Server by the foreign data wrapper.
Note: You cannot insert into a stream on the remote s-Server. You also cannot create or drop streams and tables in the remote schema.
This section describes how streaming pipelines such as parsing and stream enrichment through stream-table joins, filtering, and, in particular, aggregations, can be distributed across federated s-Server instances in a cluster of compute nodes.
You can use message queue middlewares such as Apache Kafka or AWS Kinesis to distribute raw payloads across multiple s-Server instances. This allows you to perform those analytics in a distributed manner. For more information on using Kafka and Kinesis with s-Server, see the topics Reading from Kafka, Writing to Kafka, Reading from Kinesis, and Writing to Kinesis
Streaming transformations such as parsing and enrichment tend to be stateless. Such transformations can be distributed across nodes easily.
Distributing stateful transformations such as aggregations is slightly more difficult, but still straightforward. The following section describe techniques that you can use to distribute aggregations across federated s-Server instances.
An application may aggregate data by either
You can easily distribute these aggregation workloads across multiple nodes in a cluster using the simple SQL scripts described in the following sections.
For tumbling windows, you can use rollup aggregation, partitioned parallel aggregation, or both. You can do a parallelizing GROUP BY operation in two ways:
For sliding windows, you need to use Partitioned Parallel Aggregation.
The following represents aggregating four instances of s-Server into one RollupAggView on another instance of s-Server.
In partitioned parallel aggregation, you aggregate data from the same partitions on four different nodes. The diagram below illustrates views on each node’s partition being aggregated into an aggregated view of partitions on all four nodes. For each node, the views named Partition1View, for example, are aggregated into a view called PartitionUnion1View. In the last step, the four aggregated views are combined using a UNION ALL statement into AllPartitionsView.
The diagram below shows how a streaming application can be distributed across multiple nodes in a cluster. The distribution of queries across nodes achieves the following objectives: