Working with Projects

A StreamLab project is a set of StreamLab sources, external connections, sinks, and pipeline guides. You can create projects for different use cases, such as monitoring login activity on a web site, tracking vehicle speeds across a bus system, or measuring HTTP requests on a server.

Projects Overview

Projects consist of the following:

  • Sources are log files, Kafka topics, JSON files, web feeds, external database tables, s-Server streams, tables, and so on. They capture data from web feeds, sensors, message buses, network feeds, applications, databases, and other sources. You can parse local log files -- files reachable from s-Server -- through StreamLab.
  • External Connections are databases or other data sources external to SQLstream s-Server. Once you set up an external connection, you can read and write to such data sources from StreamLab using a sink.
  • Sinks are destinations for rows of data, usually an external file system, message bus, or database. In s-Server, a sink consists of a stream and a pump to fill it (a pump moves data from one location to another). Internally, StreamLab uses sinks to connect pipeline guides with each other.
  • Pipeline Guides are collections of commands, suggestions, and scripts that you use to generate SQL views on your data sources. You can view and export the SQL generated by pipeline guides. See the topic StreamLab Pipeline Guides Overview in this guide for more details.

Projects can be named, saved, and reopened. They have unique URLs, which you can share with others. The project name and user name are appended to the StreamLab URL, as in the following:

http://myserver.com:5590/?proj=MyProject&user=user
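The URL is just the server address plus two query parameters. As a small sketch of how such a link could be assembled (`project_url` is a hypothetical helper, not part of StreamLab):

```python
from urllib.parse import urlencode

def project_url(base, project, user):
    # Append proj and user as URL-encoded query parameters.
    return f"{base}/?{urlencode({'proj': project, 'user': user})}"

print(project_url("http://myserver.com:5590", "MyProject", "user"))
# http://myserver.com:5590/?proj=MyProject&user=user
```

Because the parameters are URL-encoded, project names containing spaces or special characters remain shareable.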

Generally, you will want to have multiple StreamLab projects to manage different aspects of your data. You may want to start with a single project and save a copy of it when you are satisfied with it, building up a set of StreamLab applications to examine different configurations of data.

StreamLab projects are listed on the StreamLab projects home page.

Using Project Settings

StreamLab projects have settings that let you change the project's name and schema, and manage how the project handles streaming data.

You can change the project's name and schema name, and adjust other settings, through the Project Settings dialog box. To open it, click the Settings icon in the top left corner of the StreamLab page.

The Project Settings dialog lets you change the project's name and schema name, manage how StreamLab handles queries on streams that are in use, control throttling, and so on.

A schema is where project elements are "stored" in s-Server. By default, all the objects you create--pipeline guides, sinks, external connections, sources--are stored in the Project schema. The schema name is particularly important for developers who are accessing content that you create in StreamLab through s-Server.

Managing How StreamLab Handles Currently-Queried Streams

When a stream is being queried, it is not possible to change the stream with a SQL script (that is, StreamLab cannot submit a CREATE OR REPLACE STREAM statement). By default, StreamLab asks whether it is okay to terminate these queries, but you can also choose to terminate them without asking, or to never terminate them.

  • Stop Queries Without Asking. StreamLab automatically terminates queries when you submit SQL for a currently-queried stream.
  • Ignore, Allowing SQL Scripts to Fail. StreamLab can automatically continue running queries and allow the submitted SQL to fail.
  • Ask for Permission to Stop Queries. This is the default behavior. With this setting enabled, when you submit SQL for a currently-queried stream, StreamLab lists the currently-queried streams and asks your permission to terminate the query or queries.

When StreamLab terminates these queries, users viewing a dashboard that uses the query will see incoming data stop flowing. These users should just save their changes to the dashboard and refresh the page.

Manage the SQL Run by StreamLab

By default, StreamLab runs a project's entire SQL script when you open the project. You can deselect the Run the Complete SQL Script When Opened option to avoid running the entire script.

To use sources, StreamLab renders them in SQL. If a source is not being used, you can choose to leave it unrendered, which may improve performance in some cases. The Unattached Sources option lets you make this choice.

Managing Throttling

Sometimes, you may want to slow a data feed for testing purposes. In these cases, you can throttle your source--slow it to a specified number of rows per second. The default throttled rate is one row per second, but you can adjust this default rate by entering a different number in Project Standard Throttle Rate. You can also disable throttling for the project. You would most likely want to do so once you are ready to deploy a stream app. See throttling sources for more details.
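Throttling simply caps the rate at which rows are delivered. The following is an illustrative sketch of that idea, not StreamLab's implementation; the default of one row per second matches the project's default throttled rate:

```python
import time

def throttle(rows, rate_per_sec=1):
    """Yield rows no faster than rate_per_sec. Illustrative sketch only;
    the default of one row per second mirrors StreamLab's default
    throttled rate."""
    interval = 1.0 / rate_per_sec
    for row in rows:
        start = time.monotonic()
        yield row
        # Sleep off whatever remains of this row's time slot.
        elapsed = time.monotonic() - start
        if elapsed < interval:
            time.sleep(interval - elapsed)

# At 100 rows per second, 10 rows take roughly a tenth of a second.
for row in throttle(range(10), rate_per_sec=100):
    pass
```

Disabling throttling corresponds to dropping the sleep entirely, which is why you would do so only once an app is ready for deployment.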

Managing How the Scrutinizer Identifies Partition Keys

The Project Settings dialog box also lets you adjust settings related to the Scrutinizer. See the topic Managing the Scrutinizer for more details.

You can adjust what columns are identified as partition keys by changing Partition Key Unique Limit and Partition Key Length Limit.

  • Partition Key Unique Limit determines how many distinct values a column can have and still be identified as a partition key. If the Scrutinizer sees more than Partition Key Unique Limit distinct values, it assumes the column probably isn't a partition key.
  • Partition Key Length Limit determines how long a partition key value can be. If the Scrutinizer sees a string value longer than Partition Key Length Limit, it assumes the column probably isn't a partition key. You can adjust these two values to shape which columns end up being marked as partition keys.
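The two limits combine into a simple heuristic: few distinct values, none too long. A sketch of that logic (the function name and default limits are invented for illustration, not the Scrutinizer's actual values):

```python
def is_partition_key_candidate(values, unique_limit=100, length_limit=64):
    """Illustrative check only; the limits here are invented defaults,
    not StreamLab's. A column qualifies as a partition-key candidate
    when it has few distinct values and none of them is too long."""
    distinct = set(values)
    if len(distinct) > unique_limit:
        return False  # too many distinct values to be a partition key
    if any(len(str(v)) > length_limit for v in distinct):
        return False  # a value is too long to be a partition key
    return True

# A column repeating a handful of short bus IDs qualifies; a column of
# mostly-unique values does not.
is_partition_key_candidate(["bus_1", "bus_2"] * 500)   # True
is_partition_key_candidate(list(range(1000)))          # False
```

Raising Partition Key Unique Limit admits higher-cardinality columns; lowering Partition Key Length Limit excludes columns with long free-text values.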

Using the StreamLab Projects Home Page

You manage projects through the Projects Home Page. This page lists projects that you have created, as well as prebuilt projects in the StreamLab Gallery below. You can use this page to open projects, start and stop their streams, delete projects, and import/export projects.

Note: If you are running StreamLab on a port other than 5590, substitute that port for 5590 in the URLs shown in this topic.

Pausing and Starting Data Flowing in Schemas

All sources, pipeline guides, sinks, and external connections exist in schemas. You can start or pause data flowing in these schemas from the Projects home page.

Saving Unsaved Projects from the Projects Home Page

Projects that have not been saved will feature a "current project, not saved!" alert message, and an icon that lets you save the project.

If you try to open another project without saving the current project, StreamLab will alert you and ask if you want to save the current project before proceeding.

StreamLab contains several built-in StreamApps that you can use as the basis for projects. The StreamApp Gallery is a collection of StreamApps based on real-world data. These provide demonstrations of StreamLab's functionality and can also be used as starting templates for your projects.

Before working with any of them, you will need to start the associated data stream. If you are running Guavus SQLstream on Amazon Marketplace, on Microsoft Azure, as a Virtual Machine Appliance, as a Docker container, or as a Virtual Hard Disk, you can start these streams from the Guavus SQLstream cover page. If you have installed on Linux, you will need to start them manually.

Sydney Buses

This app processes real-time, file-based telemetry data (latitude and longitude, driver id, bearing, speed, and so on) from buses in the Sydney metropolitan area to create dashboards that show traffic patterns in terms of bus locations, speeds, and so on. This application uses the file system as a source.

Starting the Buses Sample Streaming Data Source

For appliances and Docker installations, on the cover page, scroll down to Sydney Buses is Running and click the On/Off switch to start streaming data. Data streams to /tmp/buses.log.

If you are running s-Server on a local machine, you can start the script by opening a terminal and entering the following:

$SQLSTREAM_HOME/demo/data/buses/start.sh

To stop the script, enter

$SQLSTREAM_HOME/demo/data/buses/stop.sh

Data streams to /tmp/buses.log.

This file features data in the following categories:

Column      | Type          | Definition
id          | DOUBLE        | Identification number for the bus.
reported_at | TIMESTAMP     | Time the location was reported.
shift_no    | DOUBLE        | Shift number for the bus's driver.
driver_no   | DOUBLE        | Driver identification number.
prescribed  | VARCHAR(4096) | The direction on the motorway (into Sydney or out of Sydney).
highway     | DOUBLE        | Highway number, if available.
gps         | VARCHAR       | GPS information with latitude, longitude, and bearing in JSON format.
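Because the gps column is itself a JSON string inside each record, a consumer typically parses it separately. A sketch with an invented sample record shaped like the table above (the field values are illustrative, not real data from buses.log):

```python
import json

# Hypothetical record shaped like the table above; the values are
# invented for illustration. The gps column carries latitude,
# longitude, and bearing as a JSON string.
record = {
    "id": 1201.0,
    "reported_at": "2020-01-01 10:15:00",
    "gps": '{"lat": -33.8688, "lon": 151.2093, "bearing": 90}',
}

gps = json.loads(record["gps"])
print(gps["lat"], gps["lon"], gps["bearing"])
```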

CDN QoS

This app uses Kafka as a source and processes telemetry from video players around the world. It visualizes this data on a map. (Note: To use this demo on Linux, you need to have both Kafka and kafkacat installed.)


Running the CDN Demonstration Script

This demonstration script streams simulated telemetry data from video players around the world into a Kafka topic named cdn. If you have installed SQLstream in a Docker container or appliance, you can start this demonstration script on the Guavus SQLstream cover page.

To run the cdn demonstration data on a Linux install, you need to have installed:

  1. A Kafka broker. s-Server assumes that the broker is on the local server (at localhost:9092).
  2. The Kafkacat utility.

To install a Kafka broker, download it from https://kafka.apache.org/downloads and follow the quickstart instructions at https://kafka.apache.org/quickstart to start ZooKeeper and Kafka.

The cdn demonstration script assumes that Kafka is installed at /opt/Kafka.

To install the kafkacat utility, run one of the following commands:

For Ubuntu:

sudo apt-get install -y kafkacat

For CentOS:

sudo yum install kafkacat

Once you have Kafka and Kafkacat installed, you can run the cdn demonstration data as follows:

Command                            | Result
$SQLSTREAM_HOME/demo/cdn/start.sh  | Streams data to a Kafka topic named cdn.
$SQLSTREAM_HOME/demo/cdn/stop.sh   | Stops streaming data to the cdn topic.
$SQLSTREAM_HOME/demo/cdn/status.sh | Checks the status of the script.

To start the demo, run

$SQLSTREAM_HOME/demo/cdn/start.sh

The script will return something along the following lines:

started
drew@drew-VirtualBox:/$

Keep this terminal window open for as long as you want data to stream.

To confirm that data is streaming, open a new terminal and run the following (assuming that you have installed Kafka at /opt/Kafka):

/opt/Kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic cdn --from-beginning

IoT Weather

This app processes real-time environmental sensor data from around the world to create a visualization of the weather. It uses Kafka messages as a source. (Note: To use this demo on Linux, you need to have both Kafka and kafkacat installed.)


Running the IoT Demonstration Script

This demonstration script streams simulated environmental sensor data from around the world into a Kafka topic named IoT. If you have installed SQLstream in a Docker container or appliance, you can start this demonstration script from the Guavus SQLstream cover page.

To run the IoT demonstration data on a Linux install, you need to have installed:

  1. A Kafka broker. s-Server assumes that the broker is on the local server (at localhost:9092).
  2. The Kafkacat utility.

To install a Kafka broker, download it from https://kafka.apache.org/downloads and follow the quickstart instructions at https://kafka.apache.org/quickstart to start ZooKeeper and Kafka.

The IoT demonstration script assumes that Kafka is installed at /opt/Kafka.

To install the kafkacat utility, run one of the following commands:

For Ubuntu:

sudo apt-get install -y kafkacat

For CentOS:

sudo yum install kafkacat

Once you have Kafka and Kafkacat installed, you can run the IoT demonstration data as follows:

Command                            | Result
$SQLSTREAM_HOME/demo/IoT/start.sh  | Streams data to a Kafka topic named IoT.
$SQLSTREAM_HOME/demo/IoT/stop.sh   | Stops streaming data to the IoT topic.
$SQLSTREAM_HOME/demo/IoT/status.sh | Checks the status of the script.

To start the demo, run

$SQLSTREAM_HOME/demo/IoT/start.sh

The script will return something along the following lines:

started
drew@drew-VirtualBox:/$

Keep this terminal window open for as long as you want data to stream.

To confirm that data is streaming, open a new terminal and run the following (assuming that you have installed Kafka at /opt/Kafka):

/opt/Kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic IoT --from-beginning

Creating a Project

To create a new project, click Projects in the top left corner of the StreamLab page. In the page that opens, click the Create New Project button.

Each installation of StreamLab also features StreamApps that you can use as the basis for new projects, such as the StreamLab_Bus_Demo. You can use the Copy Project button for a StreamApp or for a project that you previously created. The Copy Project button opens the Copy Project dialog box, where you enter a name for the new project. The name must not already be in use in StreamLab.

In either case, a dialog box opens that lets you name the new project.

Saving Projects

To save a project, click its title in the top middle of the StreamLab application. Once you make any changes to the project, such as adding an item to the Script, this title turns blue. Click the title to save the project; it returns to white after you click it.

Importing and Exporting Projects from the Projects Home Page

Exporting Projects

You can export projects from the Projects Home Page using the Export Project as Package button. We recommend exporting all projects any time you upgrade StreamLab.

To do so:

  1. Click the Export Project as Package button.
  2. In the window that opens, click the link to download the .slab file.
    The file downloads into the default download location for your current browser.

This file contains all the information you need to import the project into another StreamLab installation.

Importing Projects

You can import these saved files into StreamLab.

To do so:

  1. Click the Import Project from a Package File Button on the Projects Home Page.
    A window opens that allows you to drag a .slab file into it.
  2. Drag a .slab file into this window. Once you do, a dialog box opens that lets you configure the new project.
  3. If desired, edit the following:

    • Project Name. This name appears on the Projects Home Page.
    • Project Title. This is the title that appears at the top of the StreamLab page.
    • Schema Name. StreamLab projects are located in schemas, which are groups of objects in s-Server. See Project Schemas below for more details.
    • You can also choose to write-protect any dashboards that are part of the imported project. (You can unprotect these later.)
  4. Click Import to import the project.

The project appears on the Projects Home Page.

Project Schemas

Schemas are groups of objects in s-Server, including streams, tables, and so on. All StreamLab objects--pipeline guides, external connections, sinks, sources--represent one or more of these s-Server objects. A sink, for example, consists of a pump and a stream.

When you create a new source, external connection, sink, or pipeline guide, StreamLab uses the project schema by default. Using the project schema makes it easier for other s-Server developers to access the objects you create in StreamLab.

You can start or pause all streams in a schema by using the Start Streaming button on the Projects Home Page. See Using the StreamLab Projects Home Page for more details.