Configuring External Stream Sinks

External Stream sinks make use of s-Server's Extensible Common Data framework. This framework allows you to read and write rows of data in a range of formats over a range of input/output systems, including the file system, network sockets, AMQP, Amazon Kinesis, and Kafka. All data is sent as a string in CSV, XML, or JSON format.

Using the File System as an External Stream Sink

To write streaming data to the file system, you need two pieces of information:

  • The directory in which the file resides.
  • A pattern for the file's name. Here, you enter part of the file's name, such as output, csv, or log. No quotation marks are necessary.

| Option | Description |
| --- | --- |
| DIRECTORY | Directory in which file resides. |

Using a Network Socket as an External Stream Sink

To write line, CSV, XML, or JSON data over a network socket, you need to configure the socket connection. You may want to consult with whoever has set up the application with which StreamLab will connect over the socket. Network sockets initiate a connection over TCP or UDP. Connections can be initiated by remote applications or by StreamLab itself. To tell StreamLab to listen to a remote host, use the Remote Host and Remote Port fields. Connections can optionally be made using IPv6.

| Name | Description |
| --- | --- |
| Remote Host | Hostname to send rows to or receive rows from. You can override this to 'LOCALHOST' to listen to only local connections, or a specific IP address, such as <>. |
| Socket uses TCP? | Whether the socket uses TCP (True) or UDP (False). Default is false (UDP). |
| Skip Header | True or false; defaults to false. Specifies if the parser should skip the first row of input files. Usually, you would do this if the first row of data contains column headers. |
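
The TCP/UDP and IPv6 choices described above map onto standard socket parameters. The following Python sketch shows that mapping only; it makes no claim about how StreamLab itself opens the connection, and the function name `open_sink_socket` is hypothetical.

```python
import socket


def open_sink_socket(uses_tcp=False, use_ipv6=False):
    """Create an unconnected socket matching the 'Socket uses TCP?' setting.

    uses_tcp defaults to False because the documented default is UDP.
    """
    family = socket.AF_INET6 if use_ipv6 else socket.AF_INET
    kind = socket.SOCK_STREAM if uses_tcp else socket.SOCK_DGRAM
    return socket.socket(family, kind)


# Default settings give a UDP (datagram) socket over IPv4.
with open_sink_socket() as sock:
    print(sock.family, sock.type)
```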

Using MongoDB as an External Stream Sink

Options for Writing to a MongoDB Collection

| Option | Definition |
| --- | --- |
| URL | Fully qualified URL starting with mongodb:// and including, at minimum, a host name (or IP address or UNIX domain socket). The URL can also include a username and password (these are passed to the MongoDB instance) and a port number. See for more information. |

Using AMQP as an External Stream Sink

To write to an External Stream sink over AMQP, you need to configure the AMQP connection. AMQP stands for Advanced Message Queuing Protocol, and is an open standard for messaging middleware. For more information, see You may want to consult with whoever has set up AMQP in your environment.

AMQP 0.9.1 vs 1.0

There are distinct differences between the way AMQP up to 0.9.1 works and the way AMQP 1.0 works. Roughly, AMQP 1.0 provides only a network wire-level protocol for the exchange of messages, while up to 0.9.1, AMQP employed a system of exchanges, queues, and bindings to exchange messages. As a result of this change, you configure StreamLab for AMQP differently for 1.0 than for versions up to 0.9.1.

| Name | Description |
| --- | --- |
| AMQP URL | Required. Connection URL for AMQP legacy server. This includes the server's hostname, user, password, port and so on. This will differ depending on whether you are using AMQP 1.0 or a legacy version. |

AMQP_LEGACY (AMQP protocol Version 0.9, e.g., RabbitMQ)


amqp://<username>:<password>@<clientid>/<profile>?brokerlist='tcp://<hostname>:<portNumber>'&[ <optionsList> ]



AMQP10 (AMQP protocol Version 1.0) - connectionfactory.localhost:





Single quotes must be doubled for SQL formatting. You may need to do additional URL escaping depending on your AMQP implementation. The site offers an example of formatting a connection URL.
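
As a rough illustration of the quoting rule, the following Python sketch fills in the AMQP_LEGACY URL template and doubles the single quotes so the result can sit inside a SQL string literal. The function names and the sample values (guest, broker.example.com, and so on) are hypothetical and not part of s-Server. The optional options list is omitted, and any extra URL escaping your AMQP implementation needs is not handled.

```python
def build_legacy_amqp_url(username, password, client_id, profile, hostname, port):
    """Fill in the AMQP_LEGACY URL template (without the optional options list)."""
    return (
        f"amqp://{username}:{password}@{client_id}/{profile}"
        f"?brokerlist='tcp://{hostname}:{port}'"
    )


def sql_quote(value):
    """Wrap a value in single quotes for SQL, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


# Hypothetical sample values, for illustration only.
url = build_legacy_amqp_url("guest", "secret", "clientid", "default",
                            "broker.example.com", 5672)
print(url)
print(sql_quote(url))
```

The second line printed is the form to paste into a SQL options clause: each single quote around the broker list appears twice.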

| Option Name | Description |
| --- | --- |
| Partition Expression | You should only use this if DESTINATION includes "{PARTITION}". This should be a dk.brics regular expression, such as <0-3>. |
| Acknowledge Mode | Optional. Acknowledgment mode that ECDA communicates to the AMQP 1.0 server. Options are AUTO, MANUAL, or NONE; defaults to AUTO. Details on these modes are available at amqp-inbound-ack |

Roughly, AUTO asks the container on the AMQP server to acknowledge once the message is completely delivered. MANUAL means that delivery tags and channels are provided in message headers, and NONE means no acknowledgments.

Using Amazon Kinesis as an External Stream Sink

To write to an External Stream sink over Amazon Kinesis, you need to configure the Amazon Kinesis connection.

| Option Name | Description |
| --- | --- |
| Kinesis Stream Name | Required. Name of Kinesis stream to write to. No default. |
| Kinesis Application Name | Identifies client in CloudWatch (defaults to sqlstream). |
| | Must point to a credential file on the s-Server file system with the following format: |

aws_access_key_id = xxx
aws_secret_access_key = yyy

This defaults to blank, which goes to ~/.aws/credentials.

You need to have an AWS profile set up, and a configuration file stored on your system, in order to read from or write to Kinesis. See Setting Up an AWS Profile Path in the topic Reading from Kinesis Streams in the SQLstream Integration Guide.

| Option | Definition |
| --- | --- |
| AWS Profile Name | Optional. Profile name to use within credentials file. Defaults to default. |
| Initial Position | LATEST for latest or TRIM_HORIZON for earliest. Defaults to LATEST. |
| Socket Timeout | If set, overrides the Kinesis socket timeout. Defaults to -1. |
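
To show how a profile name selects a key pair, here is a minimal Python sketch. It assumes the credentials file follows the usual AWS layout, in which each profile is an INI-style section such as `[default]`; the section header is not shown in the format above and is added here as an assumption. The `read_profile` helper and the sample text are hypothetical, and this is not how s-Server itself loads credentials.

```python
import configparser

# Sample credentials text; the [default] header is an assumption (see lead-in).
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = xxx
aws_secret_access_key = yyy
"""


def read_profile(credentials_text, profile_name="default"):
    """Return (access key id, secret access key) for the named profile."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    section = parser[profile_name]  # raises KeyError if the profile is missing
    return section["aws_access_key_id"], section["aws_secret_access_key"]


print(read_profile(SAMPLE_CREDENTIALS))
```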

Using Kafka as an External Stream Sink

To write line, CSV, XML, or JSON data over Kafka, you need to configure the connection to Kafka. Kafka is an open-source, real-time publish-subscribe messaging framework. See for more details. You may want to consult with whoever has set up the Kafka messaging system in your environment.

To connect with Kafka, you need two pieces of information:

  • The name and port of the Kafka broker (this defaults to localhost:9092, but the sink will not work if a Kafka broker is not listening at this location).
  • The Kafka topic name to which you are writing.

The other configuration details below help manage the starting point for reading Kafka topics as well as the amount of data fed to StreamLab.

Foreign Stream Options for Writing to Kafka

Some of the following options, which are passed to the Kafka broker, are described in more detail at Where appropriate, information in this table is drawn from this page.

Options shown in lower case are case sensitive; they must be lowercase and double-quoted. Options shown in upper case may be entered either unquoted in any case, or double-quoted in upper case. So TOPIC, Topic and topic are all valid; "TOPIC" is valid but "topic" is not.
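
The quoting rule above can be expressed as a small check. The following Python sketch is only an illustration of the rule as stated; it is not how s-Server parses options, and the function name is hypothetical.

```python
def is_valid_option_spelling(written, canonical):
    """Check how an option name is written against the quoting rule.

    written   -- the option as typed, including any double quotes
    canonical -- the option name as documented, e.g. TOPIC or bootstrap.servers
    """
    quoted = len(written) >= 2 and written.startswith('"') and written.endswith('"')
    name = written[1:-1] if quoted else written
    if canonical.isupper():
        # Upper-case options: unquoted in any case, or quoted in upper case only.
        return name == canonical if quoted else name.upper() == canonical
    # Lower-case options: must be double-quoted and match exactly.
    return quoted and name == canonical


for spelling in ["TOPIC", "Topic", "topic", '"TOPIC"', '"topic"']:
    print(spelling, is_valid_option_spelling(spelling, "TOPIC"))
```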

Some options may not apply to all versions of the Kafka plugin. Each option below is marked as 'Kafka10', 'Legacy' or 'All':

  • 'Kafka10' - the option only applies to the Kafka10 plugin and KAFKA10_SERVER
  • 'Legacy' - the option only applies to the legacy adapter for Kafka up to 0.8, and KAFKA_SERVER
  • 'All' - the option applies to both plugin versions.

| Option Name | Adapter version | Description |
| --- | --- | --- |
| TOPIC | All | Kafka topic. |
| "bootstrap.servers" | Kafka10 | hostname:port[,hostname:port] of the Kafka broker(s). Defaults to localhost:9092. Used for getting metadata (topics, partitions and replicas). Actual socket connections are established using the information from this metadata. Use commas to separate multiple brokers. |
| SEED_BROKERS | Legacy | A comma-separated list of broker identifiers in the format "<broker_host_name>:<port>". Defaults to "localhost". |
| "" | Legacy | hostname:port of the Kafka broker. Defaults to localhost:9092. Used for getting metadata (topics, partitions and replicas). Actual socket connections are established using the information from this metadata. Use commas to separate multiple brokers. |
| "key.serializer" | Kafka10 | Names a serializer class for keys. If no class is given, Kafka uses value.serializer. |
| "key.serializer.class" | Legacy | Names a serializer class for keys. If no class is given, Kafka uses serializer.class. |
| "value.serializer" | Kafka10 | Names a serializer class for values. |
| "serializer.class" | Legacy | Names a serializer class for values. The default encoder takes a byte[] and returns the same byte[]. |
| "partitioner.class" | All | Fully qualified Java classname of Kafka partitioner. Defaults to com.sqlstream.aspen.namespace.kafka.KafkaOutputSink$RoundRobinPartitioner |
| "compression.type" | Kafka10 | If desired, specifies the final compression type for a given topic. Defaults to 'none'. Possible values: 'snappy', 'gzip'. |
| "" | All | Producers refresh metadata to see if a new leader has been elected. This option specifies the amount of time to wait before refreshing. |
| "" | All | When request.required.acks is enabled, this lets you specify how long the broker should try to bundle the specified number of messages before sending back an error to the client. |
| "send.buffer.bytes" | All | Socket write buffer size. |
| "" | All | Using this option, you can specify a string to help you identify the application making calls to the Kafka server. |
| "" | Kafka10 | This is the transaction ID used by the KafkaWriter instance for the given foreign stream. Each foreign stream should use a unique to publish messages to the topic using transactions. Transactions are used only if Kafka brokers are v0.11.2. These support transactional semantics. Note: You need to create a separate foreign stream definition for each pump that inserts (publishes) messages to a given topic. Each of these foreign streams needs to use a unique "" for itself. The foreign stream option "", defined below, needs to match the name of the pump that inserts into the foreign stream. (New in s-Server version 6.0.1) If you set = 'auto', when a pump begins running, s-Server automatically sets to '<fully_qualified_pump_name>_Pump', where <fully_qualified_pump_name> is the name of the pump that instantiates the sink. |
| "" | Kafka10 | Deprecated since version 6.0.1. Fully qualified name of the pump that will process rows for this foreign stream. You must set in order to use this option. s-Server uses this setting to determine the mode in which the pump instance itself is running (Leader or Follower) when you configure the Kafka adapter to run in High Availability (HA) mode. The needs to be a fully qualified pump name of the format <catalog_name>.<schema_name>.<pump_name>. For example: 'LOCALDB.mySchema.ProcessedOrdersPump' |
| "" | All | In cases where batch.size has not been reached, number of milliseconds that the Kafka producer will wait before batching sends. Defaults to '100' (in milliseconds). |
| "batch.size" | All | Number of messages sent as a batch. Defaults to '1000'. |
| "kafka.producer.config" | Kafka10 | Lets you specify the name of a properties file that contains a set of Kafka producer configurations. For example, you could use such a file to set all the properties needed for an SSL/SASL connection that the producer will invoke. Kafka offers a wide range of config properties. For details, see Kafka documentation at Note: In some cases, you can use these property files to override Foreign Stream options. For example, the setting for bootstrap.servers will override the Foreign Stream option "bootstrap.servers". This can be helpful in dynamic environments (AWS, Docker, Kubernetes and so on) where you do not know the specific locations of the Kafka brokers until runtime. |
| "security.protocol" | Kafka10 | Protocol used to communicate with brokers. Valid values are: PLAINTEXT, SSL, SASL_PLAINTEXT, SASL_SSL. |
| "" | Kafka10 | The maximum amount of time in ms that the transaction coordinator will wait for a transaction status update from the producer before proactively aborting the ongoing transaction. |
| HA_ROLLOVER_TIMEOUT | Kafka10 | Time in milliseconds. Defaults to 5000. You must set and in order to use this option. When the pump is configured to run in High Availability mode, and the pump is running as a "Follower", it waits for this amount of time for a lack of commits from the "Leader". After the timeout, the "Follower" attempts to take over as the "Leader". There may be multiple follower instances running in a cluster. Only one of these followers succeeds in being designated as the new "Leader". All other pump instances using the same continue "following". If the earlier "Leader" was still alive when one of the followers took over as the new "Leader", the earlier leader demotes itself to "Follower" if possible. |
| COMMIT_METADATA_COLUMN_NAME | Kafka10 | Using this option, you can commit a stringified value of the specified column along with its ROWTIME in a CSV format, along with the offset of the last published message for each partition in a transaction. The format of the metadata string is as follows: <CommitRowtimeInMillisFromEpoch>,<metadata_column_value>. For more details, refer to Building and Using an Index Topic in the Reading from Kafka topic of the s-Server Integration Guide. |
| TRANSACTION_ROWTIME_LIMIT | Kafka10 | Range in milliseconds. Allows all rows received from the input query that have ROWTIME values within the specified range to be committed in a single transaction to the Kafka broker. Transactions are used only if the Kafka broker supports transaction semantics. If this option is set to 1000 (1 second), then all rows with ROWTIME between 10:00:00.000 and 10:00:00.999 get committed in the same transaction atomically. See Using Atomic Commitments. |
| HEADERS_COLUMNS | Kafka10 | Deprecated. Please use HEADERS_COLUMN_LIST. |
| HEADERS_COLUMN_LIST | Kafka10 | Comma-separated list of foreign stream columns that will be mapped as outgoing headers, rather than to the record value itself. See Writing Headers to Kafka based on Column Data. |
| OPTIONS_QUERY | All | Lets you query a table to update adapter options at runtime. You can use this, for example, to set the "bootstrap.servers" option from a table, as in select lastOffset as "bootstrap.servers" from TEST.kafka_write_options; |
| POLL_TIMEOUT | Legacy | This option specifies the timeout value in milliseconds to be passed as a parameter to the KafkaConsumer.poll() API call. The default is 100ms. |
| PORT | Legacy | Deprecated option. From s-Server 6.0.0, port numbers are specified in the SEED_BROKERS option. |
| "producer.type" | Legacy | This parameter specifies whether the messages are sent asynchronously in a background thread. Valid values are 'async' (asynchronous) and 'sync' (synchronous). Default 'sync'. |
| "compression.codec" | Legacy | The compression codec for all data generated by this producer. Valid values are "none", "gzip" and "snappy". Default 'none'. |
| "compressed.topics" | Legacy | If the compression codec is anything other than 'none', enable compression only for specified topics if any. If the list of compressed topics is empty, then enable the specified compression codec for all topics. |
| "message.send.max.retries" | Legacy | This property will cause the producer to automatically retry a failed send request. This property specifies the number of retries when such failures occur. Default 3. |
| "" | Legacy | The producer generally refreshes the topic metadata from brokers when there is a failure (partition missing, leader not available...). It will also poll regularly (default: every 10 minutes, so 600000 ms). If you set this to a negative value, metadata will only get refreshed on failure. Default 600*1000 (10 minutes). |
| "request.required.acks" | Legacy | How many other brokers must have committed the data to their log and acknowledged this to the leader? 0 means never wait; 1 means wait for the leader to acknowledge; -1 means wait for all replicas to acknowledge. Default 0 (never wait). |
| "" | Legacy | Maximum time to buffer data when using async mode. Default 5000. |
| "queue.buffering.max.messages" | Legacy | The maximum number of unsent messages that can be queued up in the producer when using async mode before either the producer must be blocked or data must be dropped. Default 10000. |
| "" | Legacy | The amount of time to block before dropping messages when running in async mode and the buffer has reached queue.buffering.max.messages. If set to 0, events will be enqueued immediately or dropped if the queue is full (the producer send call will never block). If set to -1, the producer will block indefinitely and never willingly drop a send. Default -1. |
| "batch.num.messages" | Legacy | The number of messages to send in one batch when using async mode. The producer will wait until either this number of messages are ready to send or is reached. Default 200. |
| "send.buffer.bytes" | Legacy | Socket write buffer size. Default 100x1024. |
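
The TRANSACTION_ROWTIME_LIMIT and COMMIT_METADATA_COLUMN_NAME descriptions can be made concrete with a little arithmetic. The Python sketch below assumes that transaction windows are aligned to whole multiples of the limit, which matches the 10:00:00.000 to 10:00:00.999 example but is not confirmed as the adapter's exact behavior. Both function names and the sample values are hypothetical.

```python
def transaction_window(rowtime_ms, limit_ms):
    """Return the (first, last) ROWTIME in milliseconds of the window holding rowtime_ms.

    Assumes windows are aligned to whole multiples of limit_ms.
    """
    start = (rowtime_ms // limit_ms) * limit_ms
    return start, start + limit_ms - 1


def commit_metadata(commit_rowtime_ms, column_value):
    """Format metadata as <CommitRowtimeInMillisFromEpoch>,<metadata_column_value>."""
    return f"{commit_rowtime_ms},{column_value}"


# Two rows 500 ms apart fall into the same 1000 ms window.
print(transaction_window(1_700_000_000_250, 1000))
print(transaction_window(1_700_000_000_750, 1000))
print(commit_metadata(1_700_000_000_750, "order-42"))
```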

Using MQTT as an External Stream Sink

To read data from or write data to MQTT, you need to configure the connection to MQTT. StreamLab uses this information to implement an MQTT client that reads data into the foreign stream. Minimum options required are TOPIC and CONNECTION_URL.

Foreign Stream Options for Reading from MQTT

| Option | Description |
| --- | --- |
| TOPIC | MQTT topic. UTF-8 string that the MQTT broker uses to filter messages for each connected client. |
| CLIENT_ID | s-Server implements an MQTT client to connect to the MQTT server. This setting provides an MQTT ClientID for this client. The MQTT broker uses the ClientID to identify the client and its current state. As a result, if used, you should use a distinct CLIENT_ID for each foreign stream defined for MQTT. Defaults to randomly generated. |
| QOS | Defines the guarantee of delivery for a specific message. Either at most once (0), at least once (1), or exactly once (2). Defaults to 1. For more information on QOS, see |
| USERNAME | Optional. User name for MQTT connection. Note: s-Server sends user names and passwords as plain text. |
| PASSWORD | Optional. Password for MQTT connection. Note: s-Server sends user names and passwords as plain text. |
| KEEP_ALIVE_INTERVAL | Optional. Interval in seconds sent to MQTT broker when s-Server establishes a connection. Specifies the longest period of time that broker and client persist without sending a message. Defaults to 60. |
| CONNECTION_TIMEOUT | Optional. Connection timeout in seconds. Defines the maximum time interval the client will wait for the network connection to the MQTT server to be established. If you set this value to 0, s-Server disables timeout processing, and the client will wait until the network connection is made successfully or fails. Defaults to 30. |
| RETAINED | Output only. True or False. If set to true, tells broker to store the last retained message and the QOS for this topic. Defaults to false. |
| MAX_IN_FLIGHT | Output only. When QOS is set to 1 or 2, this is the maximum number of outgoing messages that can be in the process of being transmitted at the same time. This number includes messages currently going in handshakes and messages being retried. Defaults to 10. |
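
The required options and defaults listed above can be summarized as a small validation step. The Python sketch below is an illustration built from the documented defaults only; it is not s-Server code, and the function name, the dictionary layout, and the sample connection URL are hypothetical.

```python
# Defaults taken from the option descriptions above.
MQTT_DEFAULTS = {
    "QOS": 1,
    "KEEP_ALIVE_INTERVAL": 60,  # seconds
    "CONNECTION_TIMEOUT": 30,   # seconds; 0 disables timeout processing
    "RETAINED": False,          # output only
    "MAX_IN_FLIGHT": 10,        # output only
}
REQUIRED_OPTIONS = ("TOPIC", "CONNECTION_URL")


def resolve_mqtt_options(options):
    """Check the minimum required options and fill in documented defaults."""
    missing = [name for name in REQUIRED_OPTIONS if name not in options]
    if missing:
        raise ValueError("missing required options: " + ", ".join(missing))
    resolved = {**MQTT_DEFAULTS, **options}
    if resolved["QOS"] not in (0, 1, 2):
        raise ValueError("QOS must be 0, 1, or 2")
    return resolved


# Hypothetical topic and broker address, for illustration only.
print(resolve_mqtt_options({"TOPIC": "sensors/temp",
                            "CONNECTION_URL": "tcp://localhost:1883",
                            "QOS": 2}))
```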