Tutorial 2: Introduction to Coding with Streaming SQL

In the last tutorial, we used StreamLab to connect with a data source, perform a simple analytic on the data, and visualize the data. As it worked, StreamLab was generating SQL. SQL is a query language that has been used for decades to query databases. SQLstream uses a version of SQL extended for streaming data. The Streaming SQL Reference guide provides a full definition of this SQL.

Common tasks in SQL include creating streams and tables, inputting data into these tables, querying these streams and tables, moving data from between objects, and performing analysis on this data. This tutorial instructs you how to complete these tasks in SQL, first by using SQLstream’s Integrated Development Environment, s-Studio, and second by writing SQL by hand and executing it using sqlline, a console utility for executing SQL.

Tutorial 2A: Creating a Stream and View in SQLstream s-Studio

Before you can begin working with data in s-Studio, you first need to connect to s-Server.

If you have installed s-Server and s-Studio as part of the same package, a connection to s-Server has been already set up.

To connect to this instance, right-click on First SQLstream Server and choose Connect.

Create a Foreign Stream to Connect with a Log File

A data source is a server object. Server objects are objects in s-Server that contain all the information needed to connect to a data source, such as a log file on a server. s-Server uses server objects to configure connections to data sources, such as external database tables, log files, and Kafka topics.

To create the data source:

  1. If it is not expanded already, click First SQLstream Server to expand it.
  2. Right-click Data Sources & Sinks and select New.
  3. In the dialog box that opens, enter Buses_Server for the data source’s name.
    A tab opens called Buses_Server (Data Source).
  4. To the right of the Options dialog box, click the Add button. One by one, enter the following properties for the server, without quotation marks. These need to be entered exactly in order to work.
    • format_type: CSV
    • character_encoding: UTF-8
    • separator: \,
    • skip_header: false
    • directory: /tmp
    • filename_pattern: buses.log

Once you define a server object, you can use it for multiple streams. This is one of the advantages of storing connection information in a separate SQL object.

Note that the Buses_Server (Data Source) tab has an asterisk next to it. This indicates that the data source has not been saved.

  1. Choose File > Save (or Ctrl-S) to save the server. A message, “Changes to this data server will not affect already-running streams…” appears.
  2. Dismiss this message by clicking OK. Observe that Buses_Server now appears under Data Sources & Sinks in the Development tree. The data source is now available to be associated with a stream.

Create a Foreign Stream

Next, you create a foreign stream that will store the data from the log file. All streams are created in schemas. To create a new schema:

  1. Right-click Schemas and select New.
  2. In the dialog box that opens, enter Buses_Schema* for the data source’s name.
  3. Click on Buses_Schema to expand it.
  4. Right-click Foreign Streams and select New.
  5. In the dialog box that opens, enter Buses_Foreign_Stream* for the data source’s name. A tab opens called Buses_Foreign_Stream (Foreign Stream).
  6. To the right of Server, select Buses_Server* from the dropdown menu. Note that the Buses_Foreign_Stream (Foreign Stream) tab has an asterisk next to it. This indicates that the data source has not been saved.
  7. Choose File > Save (or Ctrl-S) to save the foreign stream.

Creating a VIEW on the Foreign Stream

Now that we’ve created a foreign stream, and promoted a column to ROWTIME, we want to perform a basic calculation on this stream. We’ll use a VIEW to do this as well.

We’ll incorporate a WHERE clause into the new view.

The WHERE clause extracts records that meet a specified condition. The condition can be a numeric or string comparison, as in the following examples

SELECT STREAM FROM Sales WHERE Employee='Bob';
SELECT STREAM FROM Sales WHERE Customer_Type=1;
SELECT STREAM FROM Sales WHERE Customer_Type<1;

You can use the following operators in a WHERE clause. For more information on operators and the WHERE clause, see SQLstream Streaming SQL Reference Guide.

Operator Description
= Equality
!= Inequality
<> Inequality
> Greater than
>= Greater than or equal to (not less than)
< Less than
<= Less than or equal to (not more than)
NOT Logical negation
AND Conjunction
OR Disjunction
IS Logical assertion
IS NOT UNKNOWN Negated unknown comparison:
IS NOT UNKNOWN
IS NULL Null comparison:
IS NULL
IS NOT NULL Negated null comparison:
IS NOT NULL
= Equality
!= Inequality
<> Inequality
> Greater than
>= Greater than or equal to (not less than)
< Less than
<= Less than or equal to (not more than)
BETWEEN Range comparison: col1 BETWEEN expr1 AND expr2
IS DISTINCT FROM Distinction
IS NOT DISTINCT FROM Negated distinction

Here, we’ll apply the WHERE clause to the “speeders” column, in order to limit results to those buses going faster than 75 km/hour.

Enter the following into SQLline:

CREATE or REPLACE VIEW "buses"."speeders" AS SELECT STREAM FROM "buses"."buses_with_rowtime" WHERE "speed" > 75;

This creates a VIEW with all the columns of “buses_with_rowtime” in which the column “speed” has a value greater than 75.

Next, we’ll create a more specific VIEW tailored to the needs of a map dashboard. This VIEW applies the same WHERE statement, but selects only “lat” “lon” and “speed”. We’ll use this VIEW in the next step of the tutorial to place the location of speeding buses on a map.

CREATE or REPLACE VIEW "buses"."speeders_map" AS SELECT STREAM "lat", "lon", "speed" FROM "buses"."buses_with_rowtime" WHERE "speed" > 75;

Write a SELECT on the VIEW and show data flowing

You can query any objects that contain data directly from s-Studio. Here, we will run a query on the view we just created.

  1. Right-click the speeders view and choose New Query.
  2. In the window that opens, enter the following text: sql SELECT STREAM * from "speeders"
  3. Click the green arrow at the top of the tab to execute the query. Observe that data starts flowing in the bottom of the tab. If you scroll over to the speeders* column, you will see that only rows with speeds above 75 appear.

Using s-Dashboard to show streaming data in dashboard.

Now that you’ve created a view with speeding buses and their latitude/longitude, you can create a dashboard to display these buses on a map.

Dashboards are web pages that contain multiple panels, each of which can connect to a different stream, view, or table. Each panel contains a visualization. These are flexible modes of viewing your data, including simple tables, points on a map, line plots, bar graphs, area maps, and so on.

These all use column-row combinations to plot data. Panels can be changed in terms of both layout and data input. Dashboards will be most useful for streaming data, as you will be able to see data changing in real time.

The dashboard below, for example, shows dials that display various levels of failed login events for a bank.

By using adjustable panels, s-Dashboard lets you view multiple such objects at once. Each dashboard can be laid out with multiple panels, in combinations that you can change by adjusting panel layout.

Creating the Pan and Zoom Map Dashboard

In the following exercise, we’ll create a map dashboard. This type of dashboard takes values for latitude (lat) and longitude (lon) as input, and uses these values to display locations on a map.

To do so:

  1. Open a dashboard and navigate to http://localhost:5595/dashboards
  2. Click the New Dashboard button.
  3. Enter a name for the new dashboard and click Create.
  4. Click the start button to open the new dashboard.
  5. Click Select Input.
  6. In the window that opens, you should see “buses” available as a schema.
  7. Click “buses” to expand it. You should see foreign streams and views listed.
  8. Click Views to expand it.
  9. Select _speedersmap.
  10. Click Select.
  11. Click Select Panel in the upper right hand corner.
  12. In the window that opens, click Geo to expand it.
  13. Select Pan & Zoom Map.
    A map opens. You should see activity in Australia.
  14. Zoom in on Australia by either clicking the + icon in the upper left hand corner of the map or using your mouse wheel.
  15. Click the gear icon to open panel preferences.
  16. Make the following changes: Change diameter formula to 25.
  17. Change Event Label to Speed:<%=speed>. This refers to the “speed” column in the speeders_map view.
  18. Click Update. You should see red circles on the map indicating speeding buses. You can click on these to see their speed.

Tutorial 2B: Coding Streaming SQL in SQLstream sqlLine

SQLstream s-Server lets you query streaming data by using SQL (Structured Query Language), the same language that developers have been using to query traditional databases for decades.

This tutorial describes the steps for setting up a log file source and analyzing it, by coding blocks of SQL. You can execute SQL directly through a command-line program called sqlLine*. sqlLine executes SQL code against SQLstream s-Server. A version of sqlLine ships with s-Server, and comes pre-configured to connect with s-Server.

The tutorial includes the following steps:

Overview of Streaming SQL

The type of SQL that runs in s-Server is called streaming SQL. This SQL is based on the SQL standard, with some modifications. s-Server’ streaming SQL is described in the SQLstream Streaming SQL Reference Guide. SQLstream’s main enhancement to the SQL standard concerns the STREAM object. The process of creating streams in streaming SQL is similar to the process of creating tables in a database system like PostgreSQL or Oracle. Like database tables, streams have columns with column types. Once you create a stream, you can query it with a SELECT statement, or insert into it using an INSERT statement. A block of SQL that creates a stream looks something like the following.

CREATE OR REPLACE SCHEMA "buses_schema";
SET SCHEMA '"buses_schema"';

CREATE OR REPLACE FOREIGN STREAM "buses_stream"
(   "id" DOUBLE,
   "reported_at" VARCHAR(4096),
   "shift_no" DOUBLE,
   "trip_no" DOUBLE,
   "route_variant_id" DOUBLE,
   "waypoint_id" DOUBLE,
   "last_known_location_state" VARCHAR(4096),,
   "lat" DOUBLE,
   "lon" DOUBLE,
   "speed" DOUBLE,
   "bearing" VARCHAR(4096),
   "driver_no" VARCHAR(4096),
   "prescribed" VARCHAR(4096),
   "highway" VARCHAR(4096),
   "created_at" VARCHAR(4096),
   "updated_at" VARCHAR(4096),
)
   SERVER "Buses_Server"
;


Note that you declare types for each column. Data types determine what kind of data a column can contain. For example, a VARCHAR() column contains a character string of variable length.

## Connecting to a Data Source The first step in nearly all s-Server data analysis is to get data flowing into s-Server. As in previous tutorials, we’re going to use a simulated log file located in the /tmp* directory of the Linux machine running s-Server. This is the file written to by the Buses demo.

Start the Buses Demo

Open the SQLstream desktop folder and double-click theStream Bus Data icon.

Once you click the application’s icon, an information window opens.
Keep this window open for the remainder of the tutorial. As long as this window is open, sample data from buses in the Sydney area flows into a file located at/tmp/buses.log* at 50 rows per second. This simulates a log file being continually updated. This file features data in the following categories:

Logical column in file Category Definition
2 reported_at Time location was reported.
4 trip_no Trip number for the bus.
6 waypoint_id ID number for bus waypoint.
8 lat Latitude of location.
10 speed Reported speed of bus.
12 driver_no Driver identification for number.
14 highway Highway number, if available.
16 updated_at Timestamp for when this bus last reported a location.

In the following exercise, you’ll use SQL to set up a data source and analyze it.

Opening sqlLine

First, you’ll need to open sqlLine. To start sqlLine:

  • Open the desktop folder that installs with s-Server.
  • Double-click the sqlLine icon.

A terminal window opens. You can enter SQL at the command prompt.

There are two main ways to run SQL in sqlLine:

  • By creating a SQL file and using the!run* command. This command lets you access SQL scripts on your computer.
  • By directly typing SQL into the sqlLine console.

In the following tutorial, you will type SQL directly into the sqlLine console.

Create a SERVER Object

In the following steps, we’re going to create a SERVER object. A server object is stored in the s-Server repository, meaning that once you create it, it will be available for reuse. Server objects contain all the information necessary to connect to a data source, which could be an external database, a Kakfa or AMQP topic, a log file, a network feed, or any source for column-row pairs.

All SERVER objects use a data wrapper. Data wrappers provide access from within SQLstream s-Server to an external system. In this case, we’ll use a built-in data wrapper called ECDA. This data wrapper uses SQLstream’s Extensible Common Data Adapter. For more information on this adapter, see the topic Reading Data in the Integrating with Other Systems guide.

In the OPTIONS clause for the server, you define specific information on how s-Server interacts with the data source. Options appear in the table below.

Option Description
FORMAT_TYPE File format to be processed by adapter. Possible values are CSV, XML, JSON, Custom
ENCODING Any Java Supported Encoding. Use encoding implemented by java.nio, see:
http://download.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html?toc=0
DIRECTORY Directory in which file resides or to which you are writing.

In this case, the SERVER object options contain all the information necessary to access a log file. Some of the important options include the following: format_type ‘CSV’ tells s-Server to parse the log file as a comma-separated values file. separator ‘,’ tells s-Server to look for commas when separating the file into columns. directory ‘/tmp’ tells s-Server to look for the file in a directory called “/tmp” on the current machine. filename_pattern ‘buses.log’ tells s-Server to look for a file that matches the regular expression “buses.log’“. Here you want “buses.log” exactly. In Java regular expressions, a period has a special meaning:“all characters.” As a result, you need to preface the period with a backslash, which tells s-Server that you mean a period, and not all characters.

To create a SERVER object:

  1. Enter the following code into the SQLline prompt:
CREATE OR REPLACE SERVER "BusFileReaderServer"
FOREIGN DATA WRAPPER ECDA
OPTIONS (classname 'com.sqlstream.aspen.namespace.common.FileSetColumnSet',
parser 'CSV',
character_encoding 'UTF-8',
separator ',',
skip_header 'false',
directory '/tmp',
filename_pattern 'buses\.log'
);
  1. Press Enter.
  2. You should receive a message along the following lines:
No rows affected (0.553 seconds)

The SERVER object is now available for use in s-Server.

Creating a Foreign Stream to Connect with a Log File

The most important concept for Streaming SQL is the stream. A stream is a continually updating data object. A stream is like a table with no end, but that begins when the stream was established. As s-Server intakes information, new rows are continually added to the stream.

Streams can be written to by multiple writers and read from by multiple readers.

When you run a SELECT query on a conventional database table, the query iterates through the result set until there are no more rows to return. But when you run a SELECT query on an s-Server stream, there’s no end of rows. Instead, the “get next row” call continues to run in s-Server until the statement is closed by the client application. In complex systems, this open-ended SELECT needs to be managed in order to maximize performance. Here, though, we can just let it run.

The following code block creates a stream that references the server object “BusFileReaderServer”. Remember, in the server object, we defined how to connect with the data source, which in this case is a log file on a local machine. The stream is a virtual object with the columns listed below. When you query the stream, it splits the queried log file into columns based on the separator.

Enter the following code into the SQLline prompt:

CREATE OR REPLACE SCHEMA "buses";
SET SCHEMA '"buses"';
CREATE OR REPLACE FOREIGN STREAM
"buses_stream"("id" DOUBLE, --Identification number for the bus.
"reported_at" TIMESTAMP, --Time location was reported.
"shift_no" DOUBLE, --Shift number for the bus's driver.
"trip_no" VARCHAR(4096), --Trip number for the bus.
"route_variant_id" VARCHAR(4096), --ID number for bus route.
"waypoint_id" VARCHAR(4096), --ID number for bus waypoint.
"last_known_location_state" VARCHAR(4096), --Location state.
"lat" VARCHAR(4096), --Latitude of location.
"lon" VARCHAR(4096), --Longitude of location.
"speed" DOUBLE, --Reported speed of bus.
"bearing" VARCHAR(4096), --Navigational bearing for bus.
"driver_no" DOUBLE, --Driver identification for number.
"prescribed" VARCHAR(4096), --The direction on the motorway,                           --(into Sydney or out of Sydney).
"highway" DOUBLE, --Highway number, if available.
"created_at" TIMESTAMP, --Timestamp for when this bus                       --started reporting locations.
"updated_at" TIMESTAMP --Timestamp for when this bus                     --last reported a location.
)
SERVER "BusFileReaderServer"--Server created in the previous step.--Provides connection information for the log file.;

All streams must be created within schemas. A schema is lets you logically group s-Server objects, such as streams, tables, and views.

Once the stream is created, you can query it as you would a table, using code along the following lines:

SELECT STREAM * from "buses_stream";

Note that the query uses the STREAM keyword. You need to use this keyword whenever you query a stream. We will use the STREAM keyword throughout the rest of this tutorial.

Promoting a Column to Rowtime

In working with streaming data, you need to be aware of what time data arrives. Because streams continually update, and may update from multiple sources, time is an important concept in Streaming SQL. Time in streams ismonotonic*, meaning it always goes forward.

Working with ROWTIME

This monotonically increasing time is tracked as a column value called ROWTIME. By default, ROWTIME is the time a row enters the stream, though you can also configure the system to assign this value to a time generated by the data source.

This will help you produce meaningful analysis about this data. Every streaming row carries a time value called a rowtime, implemented as a column in every row. The rowtime for newly arriving rows cannot be less than the rowtime for previously received rows (though it can be equal to the rowtime of the current row). Rowtimes can be implicit or explicit. Implicit rowtimes are established by the “arrival time” of the row: the time that s-Server receives the row. Even though there is no explicit mention of ROWTIME, it is nevertheless part of that row:

INSERT INTO logStream VALUES('test', 'message1');

Explicit rowtimes are provided by the source application withINSERT*, as in the following examples. In each case, a timestamp from the source application is explicitly assigned to ROWTIME.

INSERT INTO logStream (ROWTIME, source, message) <br/>
VALUES(TIMESTAMP '2014-09-30 19:13:00', 'test', 'message1');

INSERT INTO logStream (ROWTIME, source, message) <br/>
VALUES(LOCALTIMESTAMP, 'test', 'message1');

Note: When setting an explicit ROWTIME, TIMESTAMP must be monotonically increasing from the previous TIMESTAMP. In the example above, ‘2014-09-30 19:13:00’ needs to be later than the previous TIMESTAMP.

In either case, the ROWTIME of the arriving row establishes the current time of the stream, known as the stream clock.

Creating a VIEW on the Foreign Stream

Now that we’ve created a foreign stream, and promoted a column to ROWTIME, we want to perform a basic calculation on this stream. We’ll use a VIEW to do this as well.

We’ll incorporate a WHERE clause into the new view.

The WHERE clause extracts records that meet a specified condition. The condition can be a numeric or string comparison, as in the following examples

SELECT STREAM FROM Sales WHERE Employee='Bob';
SELECT STREAM FROM Sales WHERE Customer_Type=1;
SELECT STREAM FROM Sales WHERE Customer_Type<1;

You can use the following operators in a WHERE clause. For more information on operators and the WHERE clause, see SQLstream Streaming SQL Reference Guide.

Here, we’ll apply the WHERE clause to the “speeders” column, in order to limit results to those buses going faster than 75 km/hour. Enter the following into SQLline:

CREATE or REPLACE VIEW "buses"."speeders" AS SELECT STREAM FROM "buses"."buses_with_rowtime" WHERE "speed" > 75;

This creates a VIEW with all the columns of “buses_with_rowtime” in which the column “speed” has a value greater than 75.

Next, we’ll create a more specific VIEW tailored to the needs of a map dashboard. This VIEW applies the same WHERE statement, but selects only “lat” “lon” and “speed”. We’ll use this VIEW in the next step of the tutorial to place the location of speeding buses on a map.

CREATE or REPLACE VIEW "buses"."speeders_map" AS SELECT STREAM "lat", "lon", "speed" FROM "buses"."buses_with_rowtime" WHERE "speed" > 75;

Using s-Dashboard to Show Streaming Data in a Dashboard

Now that you’ve created a view with speeding buses and their latitude/longitude, you can create a dashboard to display these buses on a map.

Dashboards are web pages that contain multiple panels, each of which can connect to a different stream, view, or table. Each panel contains a visualization. These are flexible modes of viewing your data, including simple tables, points on a map, line plots, bar graphs, area maps, and so on. These all use column-row combinations to plot data. Panels can be changed in terms of both layout and data input. Dashboards will be most useful for streaming data, as you will be able to see data changing in real time.

The dashboard below, for example, shows dials that display various levels of failed login events for a bank.

By using adjustable panels, s-Dashboard lets you view multiple such objects at once. Each dashboard can be laid out with multiple panels, in combinations that you can change by adjusting panel layout.

Creating the Pan and Zoom Map Dashboard

In the following exercise, we’ll create a map dashboard. This type of dashboard takes values for latitude (lat) and longitude (lon) as input, and uses these values to display locations on a map.

To do so:

  1. Open a dashboard and navigate to http://localhost:5595/dashboards
  2. Click the New Dashboard button.
  3. Enter a name for the new dashboard and click Create.
  4. Click the start button to open the new dashboard.
  5. Click Select Input.
  6. In the window that opens, you should see “buses” available as a schema.
  7. Click “buses” to expand it. You should see foreign streams and views listed.
  8. Click Views to expand it.
  9. Select speeders_map.
  10. Click Select.
  11. Click Select Panel in the upper right hand corner.
  12. In the window that opens, click Geoto expand it.
  13. Select Pan & Zoom Map*.
    A map opens. You should see activity in Australia.
  14. Zoom in on Australia by either clicking the + icon in the upper left hand corner of the map or using your mouse wheel.
  15. Click the gear icon to open panel preferences.
  16. Make the following changes: Change diameter formula to 25.
  17. Change Event Label to Speed:<%=speed>. This refers to the “speed” column in the _speedersmap view.
  18. Click Update.

You should see red circles on the map indicating speeding buses. You can click on these to see their speed.