Guavus SQLstream s-Server is a SQL-based system for processing high-volume data with low latency. It lets you quickly connect to a variety of sources and sinks, including log files, Kafka topics, AMQP messages, Kinesis streams, and relational databases. You use these sources to set up analytic pipelines built in SQL. Complex, time-sensitive transformations and analytics are simple to configure, and they execute continuously across multiple input data sources and write to multiple destinations.
As in RDBMS systems, you use SQL to manipulate data in s-Server. However, in contrast to traditional RDBMSs, which process static, stored data with repeated single-shot queries, in s-Server, data is flowing and queries are open-ended. While there are a number of key differences, if you have worked with SQL in an RDBMS context before, you will find that many procedures are similar in s-Server.
This page contains the following subtopics:
Streaming data are processed as a continuous flow. You can set up s-Server to monitor a continuously changing log file, or an ongoing network feed, or Kafka messages, and other systems that produce data in columns. s-Server parses such data in CSV, Avro, JSON, XML, key pair, and Google protocol buffers.
Streaming data are time-sensitive data. Regardless of where it originates, data will be represented as sequences of time-stamped messages. In s-Server, timestamps are called rowtimes. (By default, such timestamps are established when data enters s-Server, but you can also configure s-Server so that such timestamps reflect when data was collected).
A streaming query is a continuous, standing query that executes over streaming data. SQLstream s-Server processes data streams using familiar SQL relational operators, such as the SELECT keyword, augmented to handle time sensitive data. Streaming queries can be event-driven and can aggregate over rolling or periodic time windows.
Streams operate in a continual present. The arrival of a row establishes the current time of the stream, informally called the “stream clock”.
Since streaming queries are typically time-sensitive, these rowtimes often determine when processing can proceed based on thresholds or aggregation criteria. Queries that process multiple streams of input can encounter greatly varying data arrival rates on those input streams.
SQLstream enables producers to publish rowtime bounds that increase efficiency in processing multiple streams that may produce data at varying rates. Publishing a rowtime bound promises that no subsequent row from this producer will have a rowtime earlier than the bound. This certainty frees queries and other processes to proceed with actions that might otherwise have waited to include such a row, that is, a row with a rowtime earlier than the now-known bound.
Because s-Server works with streaming data, you will generally process data across windows of time. Time windows vary depending on the application, from milliseconds to many hours, or even days. An application or user can use SQL to create a relational view over various message streams, transforming the data by applying relational operations such as aggregation, correlation, and filtering.
You will generally set up s-Server applications as pipelines, with data flowing in at one end (from a “source”), analyzed in the middle, and outputted at the end (to a “sink”). It’s also the case, though, that s-Server lets you create multiple views throughout the pipeline. You can set up queries so that different applications and users can each get their own customized view of the streaming data.
You can grasp the conceptual framework of s-Server by comparing it to the following information management systems:
s-Server complements an RDBMS. Both share a common data model centered on processing relational rows, queries, and views. They share common data manipulation and definition languages standardized as SQL. They share a common security model and APIs, such as JDBC, and a common representation of metadata. s-Server uses predetermined queries over arriving data, processing continuously and easy to maintain even during execution. An RDBMS is used for ad hoc queries over historical data, processing each query until it terminates.
The two work well together. s-Server can use static predetermined queries to preprocess data for an RDBMS and also respond to incoming messages by triggering dynamic queries on the data stored in an RDBMS. Queries in s-Server typically are scoped over explicit time windows based on business rules. The business rules typically specify time windows measured in minutes or hours, but time windows over any duration from milliseconds to months are possible. Both s-Server and an RDBMS can be used for transaction processing.
There are many similarities between the SQLstream s-Server and a relational database (RDBMS). Most importantly, both use the industry-standard Structured Query Language (SQL).
The main differences are summarized in this table:
|Queries||Re-executed frequently||Live Queries; Active, continuous execution|
|Control||Application calls RDBMS.||SQLstream calls Application.|
Publish/subscribe (“pub/sub”) systems, such as Tibco RendezvousTM or the pub/sub domain of most JMS message-oriented middleware (MOM), are typically organized into a hierarchy of topics. The topics in that hierarchy offer a moderate amount of flexibility for content-based subscription. That is, the fields and metadata within each message can be used to determine which messages a subscriber will receive, based on a simple SQL, XPath or regular expression using that message content. Content-based subscriptions for different MOMs are implemented differently, rather than using a standard adhered to by all.
These pub/sub systems are not as flexible as the predicate-based filtering and routing offered by a relational SQL-based system. SQLstream’s streaming queries provide a more powerful subscription capability, as follows:
By starting with a top-level message stream, that is, the first in a cascade, a cascading network of SQL views in SQLstream can create the equivalent of a dynamic publish/subscribe topic tree:
Complex SQL can thus use multiple message attributes to create “calculated topics”, beyond any original topic list, or even create time-based “aggregated topics”, offering hourly or daily roll-ups of the published messages.
Since there is no automatic provision for storing messages if the subscriber is not connected, standard SQLstream subscriptions are non-durable in JMS terms.
An “enterprise service bus” (ESB) is typically designed around routing and delivery features. Messages sent on an ESB are usually received in the format in which they were sent, not unlike sending a package with Federal Express. You expect the package to arrive unchanged at its destination. In some cases, ESBs can perform a one-time transformation of the message content either as it is sent, or as it is delivered.
In contrast, a SQLstream application can transform messages as well as routing and delivering them. It accomplishes this through the pipeline of streams and views. The additional power of SQL views enables SQLstream to transform messages “on the wire” to match the needs of each receiver.
In other words, the same message published by system A can be received by systems B and C in two different formats. At the same time, a compliance view can be “watching” the same message traffic and logging audit data or sending alerts. It’s as if the single package you send with FedEx can be received in distinct variations by multiple recipients using different carriers, and you get a log of all the deliveries.
The diagram below shows the key elements in SQLstream s-Server’s system architecture, providing an overview of a SQLstream s-Server-based system in the context of external applications, data sources and sinks, databases, and other systems.
s-Server is the core component of Guavus SQLstream. It manages and executes SQLstream s-Server applications, parsing data from sources, running analytics, and formatting data as output. s-Server lets you read and write structured and unstructured data; create, update, query, merge, and delete streams; and apply analytics built in SQL and Java. s-Server is a Relational Data Stream Management System. This means that s-Server acts like a relational database, and uses SQL as an interface, but manages data streams alongside tables. Like database tables, streams store data in columns, but streams are effectively infinite. You work with these by declaring windows and running analytics on these.
StreamLab is a web application development environment that automatically generates streaming SQL, which are known as “StreamApps”. Using StreamLab’s graphical interface, you can set up input streams for data, apply built-in analytics to this data, and create sinks for data for output to external locations.
s-Dashboard is a browser based solution to enable business users and analysts to build, edit, and deploy real-time dashboards over streams, views, or tables. These streams, views, and tables may have been created in StreamLab, or otherwise generated in s-Server. Once deployed, these dashboards will update continuously with s-Server data.
The s-Server WebAgent functions as a simple, miniature Web server. It accepts certain HTTP requests which represent SQL queries, and send these queries to s-Server. The agent replies with the query results as JSON objects.
Adapters let you connect to foreign data sources and sinks. These fall into two categories:
Agents perform the same functions as adapters, but operate remotely. These are well-suited to pull in data from sensors, Internet of Things gateways, and other location-specific objects.