Adding an External Stream Source

Streaming Data sources use s-Server's Extensible Common Data framework. This framework lets you read and write rows of data in a range of forms over a range of input/output formats. The supported source types are listed in step 3 of the procedure below.

For more details on these specific sources, see Configuring External Stream Sources.

Adding a Streaming Data Source

To add a Streaming Data source:

  1. On the Sources page, drag a Streaming Data source from the left column into the center area.
  2. Click the new Streaming Data source.
  3. Select what type of external stream you are reading from: File System, HTTP, Websocket, Socket, AMQP, Kafka, Kinesis, MQTT, or Teradata Listener.
  4. Enter connection information for the input source. For example, to access a File source, you need to enter directory and filename pattern information for the file.
    • By default, StreamLab uses the project schema for the new source.
    • If you wish to use a different schema, click the dropdown menu to the right of Schema.Stream. You can also choose a different name for the stream by clicking the dropdown menu that reads "data_1".
  5. Click the Discover Format button. This feature examines the file to determine its file format. Currently, the Discovery parser can identify CSV, XML, JSON, and Avro files. StreamLab can also work with ProtoBuf files, but you need to add these as their own source. Avro files may require additional configuration to work.

    The Discover Format dialog box opens. You can specify how many bytes the Discover Format feature reads and set a timeout for the feature. See Troubleshooting Discovery below. In most cases, the defaults should suffice.

  6. Click Start. The Discover Format feature runs. The left section of the dialog box should display a format: CSV, JSON, XML, or Binary.

  7. Click Accept. The indicated format should be automatically selected under Format. You can also choose the Line format, which lets you access files line-by-line.

  8. Next, fill in the list of columns and their SQL types. You can use the Clipboard to copy column names and types from another form.

  9. Test the source by clicking the Sample 5 Rows from Source button.

  10. Click the Go Up arrow to exit the Edit Source page.
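
The discovery step (steps 5 through 7) can be pictured with a toy format sniffer. The sketch below is an illustration only and is not StreamLab's Discovery parser: the function name guess_format and its heuristics are assumptions made for this example. It reads a sample of bytes and reports JSON, XML, CSV, BINARY, or UNKNOWN.

```python
import csv
import json
import xml.etree.ElementTree as ET


def guess_format(sample: bytes) -> str:
    """Guess the format of a byte sample (hypothetical helper, toy heuristics)."""
    # Bytes that are not valid UTF-8 text are treated as binary data.
    try:
        text = sample.decode("utf-8")
    except UnicodeDecodeError:
        return "BINARY"

    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return "UNKNOWN"
    stripped = text.lstrip()

    # JSON: assume one JSON object or array per line and try the first line.
    if stripped[0] in "{[":
        try:
            json.loads(lines[0])
            return "JSON"
        except json.JSONDecodeError:
            pass

    # XML: the whole sample must parse as a single well-formed document.
    if stripped[0] == "<":
        try:
            ET.fromstring(stripped)
            return "XML"
        except ET.ParseError:
            pass

    # CSV: every line has the same number of fields, and more than one field.
    widths = {len(row) for row in csv.reader(lines)}
    if len(widths) == 1 and widths.pop() > 1:
        return "CSV"
    return "UNKNOWN"


if __name__ == "__main__":
    print(guess_format(b"id,name\n1,a\n2,b\n"))
    print(guess_format(b'{"id": 1}\n{"id": 2}\n'))
```

A sample that ends partway through a record can defeat heuristics like these, which is one reason the number of bytes sampled matters.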

Troubleshooting Discovery

The Sample Bytes field determines how many bytes Discovery reads before analyzing the input. This number can greatly affect Discovery's performance. If you set it too high and your data arrives slowly, you won't see any response from Discovery until it has read that many bytes. If you set it lower than the size of a single record in your input, Discovery will have difficulty determining your file's format. A good rule of thumb is to set Sample Bytes to about five times the size of a record in your input, so that Discovery sees multiple records and can make a better guess as to the data types of the columns it finds. For example, if each record is 80 bytes, it would make sense to set Sample Bytes to about 400.
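
The rule of thumb above can be worked through as simple arithmetic. The following is a minimal sketch; both function names are hypothetical and are not part of StreamLab. The first applies the five-records guideline, and the second estimates how long Discovery would wait for a given Sample Bytes value at a given input rate.

```python
def recommended_sample_bytes(record_size: int, records_to_sample: int = 5) -> int:
    """Return a Sample Bytes value that covers about `records_to_sample` records."""
    if record_size <= 0 or records_to_sample <= 0:
        raise ValueError("record_size and records_to_sample must be positive")
    return record_size * records_to_sample


def seconds_until_response(sample_bytes: int, bytes_per_second: float) -> float:
    """Estimate how long it takes for `sample_bytes` to arrive at the given rate."""
    if bytes_per_second <= 0:
        raise ValueError("bytes_per_second must be positive")
    return sample_bytes / bytes_per_second


if __name__ == "__main__":
    size = recommended_sample_bytes(200)          # 200-byte records -> 1000
    print(size, seconds_until_response(size, 100.0))  # 1000 bytes at 100 B/s -> 10.0 s
```

If the estimated wait is longer than the timeout you set in the Discover Format dialog box, consider lowering Sample Bytes or raising the timeout.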

Configuring Avro

If you know or suspect that your streaming data source is Apache Avro, you may need to take additional steps to configure this source.

Before running Discover Format, select Binary for the Format option.

Under Binary Format, select AVRO.

If you know your Avro payload has a schema, check the This Payload Has a Schema String as a Prefix box and enter the location of the schema for the AVRO Schema Location option. This option can be either an HTTP URL from which to fetch the schema or a path to a file on the server host machine or VM. Note that this option was previously named AVRO_SCHEMA_FILE and is now named AVRO_SCHEMA_LOCATION.
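
To check a schema location before entering it, you can try loading it yourself. The sketch below assumes the schema is stored as JSON text, as Avro schemas normally are. The function load_avro_schema is a hypothetical helper and is not part of s-Server; it accepts either an HTTP(S) URL or a local file path, mirroring the two kinds of value described above.

```python
import json
from urllib.parse import urlparse
from urllib.request import urlopen


def load_avro_schema(location: str):
    """Load and parse an Avro schema from an HTTP(S) URL or a local file path."""
    scheme = urlparse(location).scheme.lower()
    if scheme in ("http", "https"):
        # Fetch the schema over HTTP; the 10-second timeout is an arbitrary choice.
        with urlopen(location, timeout=10) as response:
            raw = response.read().decode("utf-8")
    else:
        # Anything else is treated as a path on the local machine.
        with open(location, encoding="utf-8") as handle:
            raw = handle.read()
    # json.loads raises an error if the text is not valid JSON.
    return json.loads(raw)
```

If the call raises an exception, the location is unreachable or does not contain valid JSON, and it is worth correcting before you run Discover Format.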

Note: If you do not select Binary as the format, Discovery may either recommend "UNKNOWN" or return a CSV with a single column.