General Concepts

This topic outlines important concepts to understand in working with Guavus SQLstream.

It includes the following subtopics:

General Concepts

  • Catalog. A SQLstream catalog is the highest level repository object; it contains metadata and state information for the catalog objects (including streams, pumps and adapters) that comprise a SQLstream Repository. See SQLstream Objects below for the full set of object types.
  • Repository. A persistent store for the SQLstream Catalog and native table data. By default, this is implemented with an embedded database, but it can be implemented with any relational database.
  • JDBC Driver. The SQLstream JDBC driver, sometimes called the "client driver," enables external applications to connect to a SQLstream Control Node to establish a session for executing streaming SQL against a RAMP. Client applications can INSERT into streams or SELECT from streams or views. The driver uses SDP to communicate with the Control Node and RAMP.
  • Streaming Data Protocol. Streaming Data Protocol (SDP) is a protocol for efficiently transmitting rows between nodes in an instance of a SQLstream s-Server. SDP is also used for transmission of data between client and server, for example as part of INSERT EXPEDITED.
  • s-Server. The SQLstream runtime component that manages and executes SQLstream applications.
  • s-Studio. SQLstream's user interface provides an Integrated Development Environment (IDE) for defining SQLstream objects and developing and testing streaming applications and queries (the Development tab) and an integrated console for monitoring the active sessions and statements within a running SQLstream s-Server (the Management tab).
  • s-Dashboard. s-Dashboard is a browser-based interface for viewing s-Server schema objects: streams, views, or tables graphically. Dashboards are web pages that contain multiple panels, each of which can connect to a different stream, view, or table. Each panel contains a visualization.
  • StreamLab. StreamLab is a web application development environment that automatically generates streaming SQL, which are known as "StreamApps." Using StreamLab's graphical interface, you can set up input streams for data, apply built-in analytics to this data, and create sinks for data for output to external locations.

SQLstream Objects

The following sections briefly describe each of the object types supported by the SQLstream catalog.

Schema. A named catalog object containing the definitions of schema objects, such as streams, tables, views, procedures and user defined functions.

Schema Objects

The following objects must be defined within schemas:

  • Stream. A relation that does not store data, as a finite relation does. Instead, a stream implements a publish-subscribe protocol. It can be written to by multiple writers and read from by multiple readers.
  • Foreign Stream. A stream associated with a server, which itself is an instance of a foreign data wrapper to provide access within SQLstream to a flow of data either from or to an external system. Foreign streams can be sources--which read from external sources--or sinks--which write to external sources.
  • Foreign Table. A schema object that records the metadata necessary for SQLstream s-Server to be able to access a table (or similar data structure) in a remote database. A foreign table can be used in SQL (both queries and updates) just like a native table.
  • View. A relation that provides a reusable definition of a query. SQLstream supports views based on both streaming and non-streaming (finite) SELECT statements.
  • Pumps. A SQLstream repository object that provides a continuously runningINSERT INTO stream SELECT ... FROM queryfunctionality, thereby enabling the results of a query to enter a named stream.

Repository Objects

The following objects are defined within s-Server and do not need to be in a schema:

  • Foreign Data Wrappers. These "wrap" external data sources or sinks to make them look like part of s-Server. One common use of these in s-Server is "adapters," which are fundamentally pieces of Java code that plug in to the server and adapt outside data to make it look as if it were inside.
  • Server Object. A named catalog object that defines a particular named use of a foreign or local data wrapper, including as options any information needed to connect to the data source.* This could be an external database, the location of a network feed, or the location of a log file. Note: The SQL standard refers to this as a foreign server when using a foreign data wrapper.
  • User-Defined Routine. This is any user-defined program object, the generic term for a user-defined function, user-defined procedure or user-defined transform.

The following table describes the differences between user-defined functions, procedures, and transforms:

User-Defined Function (UDF)

SQL: CREATE FUNCTION

Accepts ... Zero or more parameters

Returns ... Scalar value

  • A UDF is used in a scalar expression; it takes (0 .. n) scalar arguments and returns a scalar value.
  • When used in a query, a scalar expression is evaluated for each row.
  • SQLstream allows the creation of Java or SQL functions.

User-Defined Procedure (UDP)

SQL: CREATE PROCEDURE

Accepts ... Zero or more parameters

Returns ... No values returned A UDP is just a UDF that returns no value; it is evaluated for a side effect.

User-Defined Transform (UDX)

SQL: CREATE FUNCTION

Accepts ... Zero or more Relations

Returns ... A Streaming Relation

  • A UDX is different: its output is a stream of rows, and its inputs can be scalars or be streams. In SQL,
  • a streaming argument to a UDX is represented as CURSOR(SELECT STREAM ...)
  • and a scalar is represented by any scalar expression.

The invocation of a UDX is quite often preceded by the keyword TABLE. For example:

SELECT STREAM *
 FROM TABLE(
  filterSignal( CURSOR( SELECT STREAM * FROM "RawSignals")));

Conceptual Analogies

You can grasp the conceptual framework of s-Server by comparing it to the following information management systems:

Relational Databases

There are many similarities between the SQLstream s-Server and a relational database (RDBMS). Most importantly, both use the industry-standard Structured Query Language (SQL).

The main differences can be summarized in this table:

Concept RDBMS SQLstream
Data Stored persistently Flowing
Query StaticRe-executed frequently Live QueriesActive, continuous execution
ExecutionControl Application calls RDBMS. SQLstream calls Application.

Publish/Subscribe Message-Oriented Middleware

Publish/subscribe ("pub/sub") systems --- such as Tibco RendezvousTM or the pub/sub domain of most JMS message-oriented middleware (MOM) --- are typically organized into a hierarchy of topics. The topics in that hierarchy offer a moderate amount of flexibility for content-based subscription. That is, the fields and metadata within each message can be used to determine which messages a subscriber will receive, based on a simple SQL, XPath or regular expression using that message content. Content-based subscriptions for different MOMs are implemented differently, rather than using a standard adhered to by all.

These pub/sub systems are not as flexible as the predicate-based filtering and routing offered by a relational SQL-based system. SQLstream's streaming queries provide a more powerful subscription capability, as follows:

By starting with a top-level message stream, that is, the first in a cascade, a cascading network of SQL views in SQLstream can create the equivalent of a dynamic publish/subscribe topic tree:

  • Publishers INSERT INTO ... the message stream through JDBC, or even through a JMS driver.
  • Subscribers *SELECT STREAM * FROM ...* the view representing the logical "topic," which can filter by tags in the message itself.
  • Unlike message-oriented middleware, SQLstream s-Server can join message streams.
  • This join capability allows SQLstream to filter content based on lookup information that is not directly contained in the message. Complex SQL can thus use multiple message attributes to create "calculated topics," beyond any original topic list, or even create time-based "aggregated topics," offering hourly or daily roll-ups of the published messages.

Since there is no automatic provision for storing messages if the subscriber is not connected, standard SQLstream subscriptions are non-durable in JMS terms.

Enterprise Service Bus

An "enterprise service bus" (ESB) is typically designed around routing and delivery features. Messages sent on an ESB are usually received in the format in which they were sent, not unlike sending a package with Federal Express. You expect the package to arrive unchanged at its destination. In some cases, ESBs can perform a one-time transformation of the message content either as it is sent, or as it is delivered.

In contrast, a SQLstream application can transform messages as well as routing and delivering them. It accomplishes this through the pipeline of streams and views. The additional power of SQL views enables SQLstream to transform messages "on the wire" to match the needs of each receiver. In other words, the same message published by system A can be received by systems B and C in two different formats. At the same time, a compliance view can be "watching" the same message traffic and logging audit data or sending alerts. It's as if the single package you send with FedEx can be received in distinct variations by multiple recipients using different carriers, and you get a log of all the deliveries.