Tutorial 1: Introduction to StreamLab

This tutorial walks you through using Guavus SQLstream's SQL-generating, browser-based graphical tool, StreamLab.

StreamLab is a web application development environment that automatically generates streaming SQL. StreamLab lets you set up projects where you can parse incoming streaming data, generate real time analytics for it, visualize it, so that you can take action on it. You can gather streaming data from log files, sensors, message buses, network feeds, applications, databases, and other sources. StreamLab does not require any knowledge of SQL programming to use it. But if you do know SQL, you can use StreamLab to generate blocks of SQL, and then modify these in another application.

The tutorial uses a set of data from buses in Sydney, Australia. Each bus sends out a row of data to a server, which continually writes this data to a text file known as a log file. Log files are used by most servers to output activity information for the server.

In this tutorial, you do the following:

Let's begin by connecting to the log file and testing access to it.

Connect to a Log File

To begin working with data in StreamLab, the first thing you need to do is to connect to a streaming data source. The kind of data source will depend on your system's needs, but might include log files, network feeds, Kafka topics, and so on.

Stream Processing Task

In the following exercise, we will capture data from a streaming log file and use this data to identify buses that exceed the speed limit in the Sydney area. Because this data is streaming, speeders will be identified in real time. We will derive this data by connecting to a simulated streaming log file located on your local machine. In this case, the log file records messages sent out by buses in Sydney, Australia.

Starting StreamLab and the Buses Demo

You first need to start the StreamLab server. If you've installed StreamLab, there should be a folder on your desktop called StreamLab. Open this folder and click the Start StreamLab Server* icon:

Now that the StreamLab server is running, open the SQLstream desktop folder and double-click the Stream Bus Data icon.

Once you click the application's icon, an information window opens.

Keep this window open for the remainder of the exercise. As long as this window is open, sample data from buses in the Sydney area is flowing into a file located at /tmp/buses.log at 50 rows per second. This simulates a log file being continually updated. We're going to convert this log file into a stream.

This file features data in the following categories:

Column Type Definition
id DOUBLE Identification number for the bus.
reported_at TIMESTAMP Time location was reported.
shift_no DOUBLE Shift number for the bus's driver.
trip_no VARCHAR(4096) Trip number for the bus.
route_variant_id VARCHAR(4096) ID number for bus route.
waypoint_id VARCHAR(4096) ID number for bus waypoint.
last_known_location_state VARCHAR(4096) Location state.
lat VARCHAR(4096) Latitude of location.
lon VARCHAR(4096) Longitude of location.
speed DOUBLE Reported speed of bus.
bearing VARCHAR(4096) Navigational bearing for bus.
driver_no DOUBLE Driver identification for number.
prescribed VARCHAR(4096) The direction on the motorway (into Sydney or out of Sydney).
highway DOUBLE Highway number
created_at TIMESTAMP Timestamp for when this bus started reporting locations.
updated_at TIMESTAMP Timestamp for when this bus.

You will use StreamLab to capture information from this file. Launch StreamLab by entering localhost:5590 in your browser. We recommend Chrome on Windows, Safari on Mac, and Chromium on Linux.

If you're running in a virtual machine, use the browser in your host OS for the best performance.

Creating a New Project in StreamLab

  1. After StreamLab loads, click Projects in the top menu. The StreamLab menu lets you switch between StreamLab projects and dashboards. It is available in the upper right corner of StreamLab.
    Each installation of StreamLab features a starter project. The Copy Project button opens the Copy Project dialog box. Here, you enter a name for the new project. The name needs to have been previously unused in StreamLab.
  2. Enter "buses" for the name of the project and click Copy to create the new project.
  3. Click Start. A help page opens. Click Show this help page* so that this page won't be shown again.
  4. Dismiss the help page.
  5. Click Sources. StreamLab sources are files (line, CSV, JSON, XML) streams, tables, log files, web feeds, and views. They capture data from sensors, message buses, network feeds, applications, databases, and other sources. You can parse local log files--files reachable from s-Server--through StreamLab.
  6. With your mouse, click and drag Line, then drop it in the dotted box.
  7. Click StreamLabOutput.line1 This source is used to read whole lines from a source, in this case a set of files in a directory.
  8. Enter:

    • Directory: /tmp
    • Filename Pattern: buses.log. The filename patter is a regular expression that can match multiple files. Here you want "buses.log" exactly. In Java regular expressions, a period--"."--means "all characters." As a result, you need to preface the period with a backslash, which tells StreamLab that you mean a period.
  9. Test this configuration using the Sample 5 Rows from Source button. StreamLab reads 5 rows from the source and stops. If no data arrives, check that you are still running the Stream Bus Data script and that you have entered the parameters correctly and try Sample 5 again.

  10. Click Up to exit the source, and up again to exit the Sources page.

Format Data in StreamLab

With most real world data, you will need to do some formatting before you can work with it. Usually, this comes down to two things: changing the names of columns so that they are meaningful, and changing the data types for the columns.

Data Types

Like databases, s-Server stores columns with a type. The data type lets s-Server understand how SQL will interact with the column's data. For example, you cannot perform numerical calculations on columns with a text data type.

Working with StreamLab Guides

You can use StreamLab guides to format the data.

Guides are collections of commands, suggestions, and scripts that let you CREATE VIEWs on data sources. These views are composed of SQL. You use the Guide interface to generate this SQL. Begin by adding a new guide to StreamLab.

  1. Click Guides.
  2. With your mouse, click and drag Guide, then drop it in the dotted box.
  3. Click the new Guide
  4. Click StreamLab_Output.line_1. (This is the source that you created in the previous section.)

Views

Guides are stored as collections of views in s-Server. VIEWs are reusable queries on a stream or table--in this case, on the stream called StreamLab_Output.line1.

If you want, you can type in a more meaningful prefix to be used in naming these views. You have now added the log file as a streaming source for the guide.

Using the Guide Interface

As you make changes, changes appear in theScript.* As you add items to the Script, the streaming database in s-Server is modified.* You can always undo script changes.

StreamLab suggests changes that you might make to the data. As data comes into StreamLab, a piece of software called the Scrutinizer examines the data, looking for possible commands to apply to the data. For example, it might notice that the file contains regular commas, and suggest splitting the file into columns at every comma.

The table in the middle, the output view, shows the stream resulting from the steps in the script, the output of this Guide.


In this case, StreamLab has noticed several ways it made be split into columns. The first suggestion is to split on commas. Click Add to add it to the script.

Notice that the new columns are all namedline_1*,line_2*,line_3*, and so on. You want to make these columns more readable by assigning them meaningful names. Remember the information window from the buses demo? Let's use it to rename all the columns at once. Before beginning, go to that window and copy the list of columns.

  1. Click Rename List
  2. Click list of columns. This parameter is now controlled by which columns are selected in the output view.
  3. Click line_1, scroll to the right end and click _line16.
  4. Shift-Click line_16 to select the range of columns
  5. Right-click on the to list parameter and paste in the list of columns we grabbed.
  6. Click here to finish editing this field.
  7. Click Add.
    Once you have finished here, the data has been added to StreamLab, and you're ready to start making calculations on the data. In the next steps, you will format the data in order to perform analytics on it.

Renaming Last Reported to Rowtime

Rowtime StreamLab, and s-Server, depend on a special column called rowtime to deliver accurately timed data. Often, as a developer, you will want to take a column from the existing data with a timestamp and "promote" it to rowtime.

Here, you want to get the timestamp from the _reportedat field. But as is often the case with real-world data, you will need to modify this column before it is recognizable as a timestamp: the extra zeros at the end are preventing it from being recognized as a valid time. To do so, you can extract the timestamp from the column.

  1. Click on any cell to select that row. This also stops the output view from scrolling. You can use the button in the upper left corner of the table to clear the selection and resume scrolling.
  2. Double-click one of the cells in the _reportedat column.
  3. Select all of the text up to right before the period. Note that the column, start, and length parameters to the extract operation have been filled in the Suggestions window.
  4. Add the suggestion Extract text from reported... to the script.
    The substring column appears in the table. Note that the new substring column is outlined in yellow, indicating that it contains valid time values. By default, though, all columns are set to a text type. You need to tell StreamLab that this column is, in fact, a timestamp. Two suggestions to convert it appear. One also will rename it to ROWTIME. Click Add on that one.
    You are now working with the times at which the bus data was originally recorded, instead of the time it was written to our /tmp/buses.log file.

Casting Speed as Double

In the next step, you will change the data type of the Speed column, so that StreamLab can analyze it as a number. (Remember, you cannot perform numerical calculations on columns with a text data type.)

Note that the speed column is outlined in yellow. This means that StreamLab thinks it is a number.

In this case we only want to cast one column, so we'll do it by hand.

  1. Click Cast.
  2. Scroll to the speed column and select it.
  3. Select the type DOUBLE
  4. Click Add.

    Viewing Generated SQL

    As you have been applying commands, StreamLab has been generating SQL.

SQL SQL stands for Structured Query Language and has been a standard since 1986. SQLstream uses a slight variant of SQL called streaming SQL.

For all projects, you can view the SQL generated by the project, as well as the SQL for individual pages.

To do so, click the Goals and Scripts button in the upper right corner of StreamLab.

.

Note that StreamLab has created a series of views that make up the Guide.

At this point, you have formatted your imported data, and you are ready to perform a calculation on it.

Perform basic calculation on data

Now that you have formatted the imported data, you are ready to perform a calculation on the data. A calculation can be as simple as multiplying the value of a column by two, or pretty much as complex as you need.

Using StreamLab Analytics

Click on the Analytics tab at the top of ScreenLab. Note the group of commands listed:


Analytics let you apply a number of different analytics to the Output view in StreamLab. Some of these produce new columns, and others change selected columns themselves.

Command Description
Bucketize This command lets you create quick "buckets" of data for a particular column, including COUNT, SUM, and AVERAGE.
Calculate This command lets you perform an arithmetical calculation on a column, such as adding 2000 to the column or dividing the column in 2. Selected columns must have numerical values, such as DOUBLE.
Categorize This command lets you take a continuous value, such as a bus speed, and replace it with strings such as "fast", "medium", or "slow" by applying conditions to the column.
Running Average This command lets you create a new column that is a running average of another column over a given window of time.
Time Sort This command uses a sliding time-based window of incoming rows to reorder those rows by the selected column or by ROWTIME in order to address time synchronization problems with data.
Window This command lets you perform a windowed aggregation on a selected column. The command produces a new column with the result of the operation.
Partition Window This command lets you partition rows into groups using the columns in a particular column.
GroupRank This command lets you group rows by the values in one column (group by) and then rank the rows within that group according to another column (rank by) across a window defined by you.

Here, we're going to apply a simple calculation to the speedcolumn that identifies buses going faster than 45 kilometers an hour.

To do so, you use an analytic called Categories.

  1. Click the Categorize tab.
  2. Now, scroll right and select the speed column
  3. Enter normal for the default category and speeding for the next category. With the speeding category, select is greater than or equal to, then enter 45. This tells StreamLab "divide the column speeders into two categories: normal for everything not greater than 45, and speeding for everything greater than 45."
  4. After entering information for the category, click the + button in the Suggestions window to add the category to the Script.
    When it refreshes, scroll right to see that the speed column now lists "speeding" and "normal" instead of numerical speeds. You are now ready to display the formatted and analyzed data in a dashboard.

Launch a Dashboard

Now that you've formatted and applied a calculation to your data, the next step is to create a dashboard to display your data.

Dashboards A dashboard is a graphical display of data. You create these by assigning them parameters, launching them from StreamLab, and then changing preferences for the dashboard.

Using the speed column that you categorized in the previous section, you can create a dashboard with a map that shows speeding buses as red and non-speeding buses as green.

To do so:

  1. Click Custom Dashboards
  2. Click Map and Table This dashboard requires five columns: the latitude and longitude for each event, a value to be shown at that location, a key used to categorize that value, and a unique ID identify the objects being tracked.
  3. You can leavevalue* at its default. Select the lat* column for latitude and the loncolumn for longitude. For the key, select speed*, and forid*, select id*.*
  4. After entering information for the dashboard, click the + button in the Suggestions window to add the category to the Script.
  5. Click the
    icon in the Script to launch the dashboard. (You can always launch dashboards from the Script for any particular step in the Script.)

Customizing the Dashboard

A dashboard appears, with two panels created. The same stream is open in both panels, the top panel shows it as a table while the bottom shows StreamLab's World Events map.

  1. Click the SQLstream logo in the upper right corner of the map panel.
  2. In the menu that opens, choose Select Panel.
  3. Select Pan & Zoom Map and click Select.
    Now, you can change preferences for the panel itself.
  4. In the panel that opens, open the panel menu again and choose Panel Preferences*.
  5. Enter 2000 for Maximum Events and unselect Cluster events
  6. Set the Event Label to: <%= id %>
  7. Use the plus button to add two more entries in the Key Color Map
  8. Enter speeding and set the circle color to red.
  9. Leave the second entry blank and set the color circle to green.
  10. Click Update.
  11. Save the dashboard. The dashboard can now be opened even if StreamLab is not running.
  12. Click the Copy Dashboard to Clipboard button.

Launch the Dashboard in a Browser

Now that you have accessed the log file through StreamLab, formatted the data's columns, and created a dashboard based on this data, you can share the dashboard with other users with the URL you copied into the clipboard.

You can test the dashboard by opening a new browser and pasting the dashboard address into the address bar.


Move to Tutorial 2