This tutorial walks you through using Guavus SQLstream's SQL-generating, browser-based graphical tool, StreamLab.
StreamLab is a web application development environment that automatically generates streaming SQL. StreamLab lets you set up projects in which you parse incoming streaming data, generate real-time analytics on it, and visualize it so that you can act on it. You can gather streaming data from log files, sensors, message buses, network feeds, applications, databases, and other sources. StreamLab requires no knowledge of SQL programming. But if you do know SQL, you can use StreamLab to generate blocks of SQL and then modify them in another application.
The tutorial uses a set of data from buses in Sydney, Australia. Each bus sends a row of data to a server, which continually writes this data to a text file known as a log file. Most servers use log files to record their activity.
In this tutorial, you do the following:
Let's begin by connecting to the log file and testing access to it.
To begin working with data in StreamLab, the first thing you need to do is to connect to a streaming data source. The kind of data source will depend on your system's needs, but might include log files, network feeds, Kafka topics, and so on.
In the following exercise, we will capture data from a streaming log file and use this data to identify buses that exceed the speed limit in the Sydney area. Because this data is streaming, speeders will be identified in real time. We will derive this data by connecting to a simulated streaming log file located on your local machine. In this case, the log file records messages sent out by buses in Sydney, Australia.
You first need to start the StreamLab server. If you've installed StreamLab, there should be a folder on your desktop called StreamLab. Open this folder and click the **Start StreamLab Server** icon:
Now that the StreamLab server is running, open the SQLstream desktop folder and double-click the **Stream Bus Data** icon. An information window opens.
Keep this window open for the remainder of the exercise. As long as this window is open, sample data from buses in the Sydney area is flowing into a file located at /tmp/buses.log at 50 rows per second. This simulates a log file being continually updated. We're going to convert this log file into a stream.
This file features data in the following categories:
| Column | Type | Description |
| --- | --- | --- |
| id | DOUBLE | Identification number for the bus. |
| reported_at | TIMESTAMP | Time location was reported. |
| shift_no | DOUBLE | Shift number for the bus's driver. |
| trip_no | VARCHAR(4096) | Trip number for the bus. |
| route_variant_id | VARCHAR(4096) | ID number for bus route. |
| waypoint_id | VARCHAR(4096) | ID number for bus waypoint. |
| lat | VARCHAR(4096) | Latitude of location. |
| lon | VARCHAR(4096) | Longitude of location. |
| speed | DOUBLE | Reported speed of bus. |
| bearing | VARCHAR(4096) | Navigational bearing for bus. |
| driver_no | DOUBLE | Driver identification number. |
| prescribed | VARCHAR(4096) | The direction on the motorway (into Sydney or out of Sydney). |
| created_at | TIMESTAMP | Timestamp for when this bus started reporting locations. |
| updated_at | TIMESTAMP | Timestamp for when this bus's data was last updated. |
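For readers who know SQL: a stream with these columns could be declared in s-Server's streaming SQL roughly as follows. This is a hedged sketch; the stream name is illustrative and is not necessarily what StreamLab generates.

```sql
-- Illustrative sketch only: a stream matching the columns above.
-- The name "buses_raw" is hypothetical, not StreamLab's generated name.
CREATE OR REPLACE STREAM "buses_raw" (
    "id" DOUBLE,
    "reported_at" TIMESTAMP,
    "shift_no" DOUBLE,
    "trip_no" VARCHAR(4096),
    "route_variant_id" VARCHAR(4096),
    "waypoint_id" VARCHAR(4096),
    "lat" VARCHAR(4096),
    "lon" VARCHAR(4096),
    "speed" DOUBLE,
    "bearing" VARCHAR(4096),
    "driver_no" DOUBLE,
    "prescribed" VARCHAR(4096),
    "created_at" TIMESTAMP,
    "updated_at" TIMESTAMP
);
```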
You will use StreamLab to capture information from this file. Launch StreamLab by entering localhost:5590 in your browser. We recommend Chrome on Windows, Safari on Mac, and Chromium on Linux.
If you're running in a virtual machine, use the browser in your host OS for the best performance.
Test this configuration using the Sample 5 Rows from Source button. StreamLab reads 5 rows from the source and stops. If no data arrives, check that the Stream Bus Data script is still running and that you entered the parameters correctly, then try sampling again.
Click Up to exit the source, and up again to exit the Sources page.
With most real world data, you will need to do some formatting before you can work with it. Usually, this comes down to two things: changing the names of columns so that they are meaningful, and changing the data types for the columns.
Like databases, s-Server stores columns with a type. The data type lets s-Server understand how SQL will interact with the column's data. For example, you cannot perform numerical calculations on columns with a text data type.
You can use StreamLab guides to format the data.
Guides are collections of commands, suggestions, and scripts that create SQL views on data sources. These views are composed of SQL, and you use the Guide interface to generate this SQL. Begin by adding a new guide to StreamLab.
Guides are stored as collections of views in s-Server. A view is a reusable query on a stream or table; in this case, on the stream called StreamLab_Output.line1.
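In SQL terms, a view over a stream is an ordinary CREATE VIEW whose body is a streaming SELECT. A minimal hand-written sketch, assuming the source stream named above (the view name here is illustrative):

```sql
-- Illustrative sketch: a reusable view over the raw source stream.
-- "buses_parsed" is a hypothetical name, not one StreamLab generates.
CREATE OR REPLACE VIEW "buses_parsed" AS
SELECT STREAM *
FROM "StreamLab_Output"."line1";
```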
If you want, you can type in a more meaningful prefix to be used in naming these views. You have now added the log file as a streaming source for the guide.
As you make changes, they appear in the **Script**. As you add items to the Script, the streaming database in s-Server is modified. You can always undo script changes.
StreamLab suggests changes that you might make to the data. As data comes into StreamLab, a piece of software called the Scrutinizer examines the data, looking for possible commands to apply to the data. For example, it might notice that the file contains regular commas, and suggest splitting the file into columns at every comma.
The table in the middle, the output view, shows the stream resulting from the steps in the script, the output of this Guide.
In this case, StreamLab has noticed several ways the data might be split into columns. The first suggestion is to split on commas. Click Add to add it to the script.
Notice that the new columns are all named *line_1*, *line_2*, *line_3*, and so on. You want to make these columns more readable by assigning them meaningful names. Remember the information window from the buses demo? Let's use it to rename all the columns at once. Before beginning, go to that window and copy the list of columns.
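In SQL terms, renaming the generated columns amounts to a view that aliases each one. A sketch, with only a few columns shown and hypothetical view names:

```sql
-- Illustrative sketch: give the split columns meaningful names.
-- "buses_named" and "buses_split" are hypothetical names.
CREATE OR REPLACE VIEW "buses_named" AS
SELECT STREAM
    "line_1" AS "id",
    "line_2" AS "reported_at",
    "line_9" AS "speed"      -- ...and so on for the remaining columns
FROM "buses_split";
```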
**Rowtime.** StreamLab, and s-Server, depend on a special column called rowtime to deliver accurately timed data. Often, as a developer, you will want to take a column from the existing data with a timestamp and "promote" it to rowtime.
Here, you want to get the timestamp from the reported_at field. But as is often the case with real-world data, you will need to modify this column before it can be recognized as a timestamp: the extra zeros at the end prevent it from being parsed as a valid time. To fix this, you extract the timestamp from the column.
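Written by hand, the fix might look like trimming the extra characters and casting the result to TIMESTAMP. This is a sketch only: the exact expression StreamLab generates may differ, and the trim length and view names here are assumptions.

```sql
-- Illustrative sketch: strip the trailing characters and cast to TIMESTAMP.
-- The SUBSTRING bounds are assumptions; adjust to the actual field format.
SELECT STREAM
    CAST(SUBSTRING("reported_at" FROM 1 FOR 19) AS TIMESTAMP)
        AS "reported_ts"
FROM "buses_named";
```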
In the next step, you will change the data type of the Speed column, so that StreamLab can analyze it as a number. (Remember, you cannot perform numerical calculations on columns with a text data type.)
Note that the speed column is outlined in yellow. This means that StreamLab thinks it is a number.
In this case we only want to cast one column, so we'll do it by hand.
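Casting a single column by hand is a one-line SQL expression. A sketch, with a hypothetical source view name:

```sql
-- Illustrative sketch: convert the text speed column to a number
-- so it can be used in numerical calculations.
SELECT STREAM
    CAST("speed" AS DOUBLE) AS "speed"
FROM "buses_named";
```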
As you have been applying commands, StreamLab has been generating SQL.
**SQL.** SQL stands for Structured Query Language and has been a standard since 1986. SQLstream uses a slight variant of SQL called streaming SQL.
For all projects, you can view the SQL generated by the project, as well as the SQL for individual pages.
To do so, click the Goals and Scripts button in the upper right corner of StreamLab.
Note that StreamLab has created a series of views that make up the Guide.
Now that you have formatted the imported data, you are ready to perform a calculation on it. A calculation can be as simple as multiplying the value of a column by two, or as complex as you need.
Click the Analytics tab at the top of StreamLab. Note the group of commands listed:
The Analytics tab lets you apply a number of different analytics to the Output view in StreamLab. Some of these produce new columns; others change the selected columns themselves.
| Command | Description |
| --- | --- |
| Bucketize | Lets you create quick "buckets" of data for a particular column, including COUNT, SUM, and AVERAGE. |
| Calculate | Lets you perform an arithmetical calculation on a column, such as adding 2000 to it or dividing it by 2. Selected columns must have numerical types, such as DOUBLE. |
| Categorize | Lets you take a continuous value, such as a bus speed, and replace it with strings such as "fast", "medium", or "slow" by applying conditions to the column. |
| Running Average | Lets you create a new column that is a running average of another column over a given window of time. |
| Time Sort | Uses a sliding time-based window of incoming rows to reorder those rows by the selected column or by ROWTIME, in order to address time synchronization problems with data. |
| Window | Lets you perform a windowed aggregation on a selected column. The command produces a new column with the result of the operation. |
| Partition Window | Lets you partition rows into groups using the values in a particular column. |
| GroupRank | Lets you group rows by the values in one column (group by) and then rank the rows within that group according to another column (rank by), across a window defined by you. |
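To give a flavor of what these commands generate, a Running Average over a one-minute window could be expressed in streaming SQL roughly as below. This is a sketch using standard SQL windowed aggregation; the view names are hypothetical, and the SQL StreamLab actually emits may differ.

```sql
-- Illustrative sketch: running average of speed per bus
-- over the preceding one minute of rows.
SELECT STREAM
    "id",
    "speed",
    AVG("speed") OVER (
        PARTITION BY "id"
        RANGE INTERVAL '1' MINUTE PRECEDING
    ) AS "avg_speed_1min"
FROM "buses_typed";
```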
Here, we're going to apply a simple calculation to the speed column that identifies buses going faster than 45 kilometers per hour.
To do so, you use an analytic called Categorize.
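Under the hood, categorizing a continuous value reduces to a CASE expression. Identifying buses over 45 km/h might generate SQL roughly like the sketch below; the category labels, column names, and source view are assumptions, not the exact output of StreamLab.

```sql
-- Illustrative sketch: label each row by whether the bus exceeds 45 km/h.
SELECT STREAM
    "id",
    "speed",
    CASE WHEN "speed" > 45 THEN 'speeding'
         ELSE 'normal'
    END AS "speed_category"
FROM "buses_typed";
```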
Now that you've formatted and applied a calculation to your data, the next step is to create a dashboard to display your data.
**Dashboards.** A dashboard is a graphical display of data. You create a dashboard by assigning it parameters, launching it from StreamLab, and then changing its preferences.
Using the speed column that you categorized in the previous section, you can create a dashboard with a map that shows speeding buses as red and non-speeding buses as green.
To do so:
A dashboard appears with two panels. The same stream is open in both: the top panel shows it as a table, while the bottom shows StreamLab's World Events map.
Now that you have accessed the log file through StreamLab, formatted the data's columns, and created a dashboard based on this data, you can share the dashboard with other users by giving them the URL you copied to the clipboard.
You can test the dashboard by opening a new browser window and pasting the dashboard address into the address bar.