Tutorial 1: Introduction to StreamLab

This tutorial walks you through using Guavus SQLstream's SQL-generating, browser-based graphical tool, StreamLab.

StreamLab is a web application development environment that automatically generates streaming SQL. StreamLab lets you set up projects where you can parse incoming streaming data, generate real time analytics for it, visualize it, so that you can take action on it. You can gather streaming data from log files, sensors, message buses, network feeds, applications, databases, and other sources. StreamLab does not require any knowledge of SQL programming to use it. But if you do know SQL, you can use StreamLab to generate blocks of SQL, and then modify these in another application.

The tutorial uses a set of data from buses in Sydney, Australia. Each bus sends out a row of data to a server, which continually writes this data to a text file known as a log file. Log files are used by most servers to output activity information for the server.

In this tutorial, you do the following:

  1. Use StreamLab to access a log file and integrate its data.
  2. Format this data for processing.
  3. Perform a simple analytic on this data.
  4. Create a visual representation of this data on a map using a dashboard.
  5. Share the dashboard with other users.

Let's begin by connecting to the log file and testing access to it.

Connect to a Log File

To begin working with data in StreamLab, the first thing you need to do is to connect to a streaming data source. The kind of data source will depend on your system's needs, but might include log files, network feeds, Kafka topics, and so on.

Stream Processing Task

In the following exercise, we will capture data from a streaming log file and use this data to identify buses that exceed the speed limit in the Sydney area. Because this data is streaming, speeders will be identified in real time. We will derive this data by connecting to a simulated streaming log file located on your local machine. In this case, the log file records messages sent out by buses in Sydney, Australia.

To begin, you need to start StreamLab and webAgent. If you are running Guavus SQLstream in a Docker container or other virtual machine, you can do so from the Guavus SQLstream Cover Page. If you are running Guavus SQLstream on Linux on a regular install, you can start both StreamLab and the streaming data source manually.

  1. Go to http://localhost and click the On/Off switch to the right of webAgent is not Running to start webAgent.
  2. Next, click the On/Off switch to the right of StreamLab is not Running to start webAgent.
  3. To start the sample streaming data source. On the cover page, scroll down to Sydney Buses is Running and click the On/Off switch to start streaming data. Data streams to /tmp/buses.log.

This file features data in the following categories:

Column Type Definition
id DOUBLE Identification number for the bus.
reported_at TIMESTAMP Time location was reported.
shift_no DOUBLE Shift number for the bus's driver.
driver_no DOUBLE Driver identification for number.
prescribed VARCHAR(4096) The direction on the motorway (into Sydney or out of Sydney).
highway DOUBLE Highway number
gps VARCHAR GPS information with latitude, longitude, and bearing in JSON format.

You will use StreamLab to capture information from this file.

Launch StreamLab by entering localhost:5590 in your browser. We recommend Chrome on Windows, Safari on Mac, and Chromium on Linux.

If you're running in a virtual machine, use the browser in your host OS for the best performance.

Creating a New Project in StreamLab

  1. After StreamLab loads, click Projects in the top menu. The StreamLab menu lets you switch between StreamLab projects and dashboards. It is available in the upper right corner of StreamLab.
  2. Enter "buses" for the name of the project and click Create to create the new project.
  3. Click Start. A help page opens. Click Show this help page so that this page won't be shown again.
  4. Dismiss the help page.
  5. Click Sources. StreamLab sources are external streams (files, Kafka topics, network sockets, AMQP messsage buses, and so on), external tables (RDBMS sources), s-Server streams, and public data. They capture data from sensors, message buses, network feeds, applications, databases, and other sources. You can parse local log files--files reachable from s-Server--through StreamLab. See StreamLab sources for more details.
  6. With your mouse, click and drag External Stream to the right to drop it in the dotted box.
  7. Click the source you just added to configure it.
  8. Enter:

    • Directory: /tmp
    • Filename Pattern: buses.log. The filename pattern is a regular expression that can match multiple files. Here you want "buses.log" exactly. In Java regular expressions, a period--"."--means "all characters." As a result, you need to preface the period with a backslash, which tells StreamLab that you mean a period.
  9. Click Discover Format and click Start in the window that opens. Discover Format should return XML as format.

  10. Click Accept Format. StreamLab automatically configures the source:

  11. Test this configuration using the Sample 5 Rows from Source button. StreamLab reads 5 rows from the source and stops. If no data arrives, check that you are still running the Stream Bus Data script and that you have entered the parameters correctly and try Sample 5 again.

  12. Click the Up Arrow to exit the source, and up again to exit the Sources page.

Format Data in StreamLab

With most real world data, you will need to do some formatting before you can work with it. Usually, this comes down to two things: changing the names of columns so that they are meaningful, and changing the data types for the columns.

Data Types

Like databases, s-Server stores columns with a type. The data type lets s-Server understand how SQL will interact with the column's data. For example, you cannot perform numerical calculations on columns with a text data type.

Working with StreamLab Pipeline Guides

You can use StreamLab Pipeline Guides to format the data.

Pipeline Guides are collections of commands, suggestions, and scripts that let you CREATE VIEWs on data sources. These views are composed of SQL. You use the Guide interface to generate this SQL. Begin by adding a new guide to StreamLab.

  1. Click Pipeline Guides.
  2. With your mouse, click and drag Pipeline, then drop it in the dotted box.
  3. Click the new Pipeline to open it.
  4. Under Select One of the Sources as Input, click StreamLab_Output_buses.data_1. (This is the source that you created in the previous section.)

Views

Guides are stored as collections of views in s-Server. VIEWs are reusable queries on a stream or table--in this case, on the stream called StreamLab_Output.line1.

If you want, you can type in a more meaningful prefix to be used in naming these views. You have now added the log file as a streaming source for the guide.

Using the Guide Interface

As you make changes, changes appear in the Script. As you add items to the Script, the streaming database in s-Server is modified.* You can always undo script changes.

StreamLab suggests changes that you might make to the data. As data comes into StreamLab, a piece of software called the Scrutinizer examines the data, looking for possible commands to apply to the data. For example, it might notice that the file contains regular commas, and suggest splitting the file into columns at every comma.

The table in the middle, the output view, shows the stream resulting from the steps in the script, the output of this Guide.

Guide Overview

Promoting Last Reported to Rowtime

StreamLab, and s-Server, depend on a special column called rowtime to deliver accurately timed data. Often, as a developer, you will want to take a column from the existing data with a timestamp and "promote" it to rowtime.

Here, you want to get the timestamp from the _reportedat field.

To do so:

  1. Click the reported_at column to select it.
  2. Under Suggestions, you should see Convert reported_at to a timestamp using yyy-MM-dd'T':mm:ss.SSS and make it the new ROWTIME column. Click the + to the right of this Suggestion to add it to the script.

Note that the reported_at column has disappeared and that the ROWTIME column now lists timestamps beginning with 2014.

You are now working with the times at which the bus data was originally recorded, instead of the time it was written to the /tmp/buses.log file.

Parsing JSON in the GPS Column

The gps column is currently in unparsed JSON. You can parse the JSON using the Parse JSON command.

To do so:

  1. Click the gps column to select it.
  2. Under Suggestions, you should see Parse the JSON in gps extracting the properties lat,lon,bearing into columns with the prefix gps.
  3. Click gps. In the window that opens, delete the gps text and click the check mark.
  4. Click the + to the right of this Suggestion to add it to the script.

The Output View should now display lat, lon, and bearing as distinct columns.

Casting Lat and Lon as Doubles

In the next step, you will change the data type of the Speed column, so that StreamLab can analyze it as a number. (Remember, you cannot perform numerical calculations on columns with a text data type.)

Note that the lat and lon columns are outlined in green. This means that StreamLab thinks they are numbers.

In this case we only want to cast one column, so we'll do it by hand.

  1. Click Cast.
  2. Select the lat and lon columns.
  3. Select the type DOUBLE.
  4. Click Add.

Viewing Generated SQL

As you have been applying commands, StreamLab has been generating SQL.

SQL stands for Structured Query Language and has been a standard since 1986. SQLstream uses a slight variant of SQL called streaming SQL.

For all projects, you can view the SQL generated by the project, as well as the SQL for individual pages.

To do so, click the Goals and Scripts button in the upper right corner of StreamLab.

. Note that StreamLab has created a series of views that make up the Guide.
At this point, you have formatted your imported data, and you are ready to perform a calculation on it.

Perform A Basic Calculation on Data

Now that you have formatted the imported data, you are ready to perform a calculation on the data. A calculation can be as simple as multiplying the value of a column by two, or pretty much as complex as you need.

Using StreamLab Analytics

Click on the Analytics tab at the top of ScreenLab. Note the group of commands listed:

Analytics let you apply a number of different analytics to the Output view in StreamLab. Some of these produce new columns, and others change selected columns themselves.

Analytic Description
Aggregate This command lets you aggregate a column, including COUNT, SUM, and AVERAGE.
Bollinger Bands This command lets you create a new column that is a running average of another column over a given window of time. You can also add Bollinger bands to this running average. Bollinger Bands are a volatility indicator, typically used for stock prices.
Calculate This command lets you perform an arithmetical calculation on a column, such as adding 2000 to the column or dividing the column in 2. Selected columns must have numerical values, such as DOUBLE.
Categorize This command lets you take a continuous value, such as a bus speed, and replace it with strings such as "fast", "medium", or "slow" by applying conditions to the column.
Distinct The Distinct operation lets you eliminate rows with duplicate columns over a specified timeframe.
Geo-Locate IP This command lets you determine the latitude and longitude associated with IP addresses worldwide.
GroupRank This command lets you group rows by the values in one column (group by) and then rank the rows within that group according to another column (rank by) across a window defined by you.
Join Join lets you combine data in the current pipeline guide with data from another stream or table. They allow you to enrich data in the current view with data from another source, as long as that source shares a column with the source for the current guide.
Time Sort This command uses a sliding time-based window of incoming rows to reorder those rows by the selected column or by ROWTIME in order to address time synchronization problems with data.
New Column The New Column analytic lets you add a new column based on a custom SQL expression.

Here, we're going to apply a simple calculation to the speed column that identifies buses going faster than 45 kilometers an hour.

To do so, you use an analytic called Categorize.

  1. Click the Categorize tab.
  2. Now, scroll right and select the speed column
  3. Enter normal for the default category and speeding for the next category. With the speeding category, select is greater than or equal to, then enter 45. This tells StreamLab "divide the column speeders into two categories: normal for everything not greater than 45, and speeding for everything greater than 45."
  4. After entering information for the category, click the + button in the Suggestions window to add the category to the Script.
    When it refreshes, scroll right to see that the speed column now lists "speeding" and "normal" instead of numerical speeds. You are now ready to display the formatted and analyzed data in a dashboard.

Launch a Dashboard

Now that you've formatted and applied a calculation to your data, the next step is to create a dashboard to display your data.

A dashboard is a graphical display of data. You create these by assigning them parameters, launching them from StreamLab, and then changing preferences for the dashboard.

Using the speed column that you categorized in the previous section, you can create a dashboard with a map that shows speeding buses as red and non-speeding buses as green.

To do so:

  1. Click Custom Dashboards
  2. Click Map and Table This dashboard requires three columns: the latitude and longitude for each event and a value to be shown at that location.

  3. You can leave map type at its default. Select the lat column for latitude and the lon column for longitude.

  4. After entering information for the dashboard, click the + button in the Suggestions window to add the category to the Script.

  5. Click the View Dashboard icon in the Script (it looks like an eye) to launch the dashboard. (You can always launch dashboards from the Script for any particular step in the Script.)

Customizing the Dashboard

A dashboard appears, with two panels created. The same stream is open in both panels, the top panel shows it as a table while the bottom shows StreamLab's World Events map.

  1. Click the SQLstream logo in the upper right corner of the map panel.
  2. In the menu that opens, choose Select Panel.
  3. Select Geographical and 3-D Streams > Pan & Zoom Map and click Select.
    Now, you can change preferences for the panel itself.
  4. Click the SQLstream logo in the upper right corner of the map panel and choose Input Preferences.
  5. Enter speed for Key Column.
  6. Click the SQLstream logo in the upper right corner of the map panel and choose Panel Preferences.
  7. Enter 20 for Diameter Formula and unselect Cluster Events.
  8. Set the Event Label to: <%= id %>
  9. Use the plus button to add two more entries in the Key Color Map.
  10. Enter speeding and set the circle color to red.
  11. Leave the second entry blank and set the color circle to green.
  12. Click Update.
  13. To view markers, zoom into the east coast of Australia, and then the Sydney area.
  14. Click the SQLstream logo in the upper right corner of the map panel and choose Save Dashboard to save the dashboard. The dashboard can now be opened even if StreamLab is not running.
  15. Click the Copy Dashboard to Clipboard button. In the window that opens, copy the link.

Launch the Dashboard in a Browser

Now that you have accessed the log file through StreamLab, formatted the data's columns, and created a dashboard based on this data, you can share the dashboard with other users with the URL you copied into the clipboard.

You can test the dashboard by opening a new browser and pasting the dashboard address into the address bar.

Starting StreamLab and the Buses Demo without the Guavus SQLstream Cover Page

You first need to start the StreamLab server. If you've installed StreamLab with a regular installation, there should be a folder on your desktop called StreamLab. Open this folder and click the Start StreamLab Server icon:

Otherwise, enter

/opt/sqlstream/<VERSION>/StreamLab/streamlab.sh

to launch StreamLab.

Now that the StreamLab server is running, open a terminal and enter the following:

/opt/sqlstream/<VERSION>/s-Server/demo/data/buses/StreamXmlBusData.sh

Once you hit enter, an information window opens.

Keep this window open for the remainder of the exercise. As long as this window is open, sample data from buses in the Sydney area is flowing into a file located at /tmp/buses.log at 50 rows per second. This simulates a log file being continually updated. We're going to convert this log file into a stream.

Return to Use StreamLab to access a log file and integrate its data

Move to Tutorial 2