StreamLab’s Pivot analytics let you unfold and fold streams of data to make them simpler to process. (See http://en.wikipedia.org/wiki/Pivot_table for more information on pivot tables.) Because streams continually update, in working with streaming data, you always need to choose a window over which to query.
In other words it will turn this:
ROWTIME=12:00:00,type=Temperature,value=33 ROWTIME=12:00:00,type=Wind Speed,value=12
pivot operations are currently experimental. At the moment they will only work if there’s an input row containing each of the key values (the columns in the unfolded result) that arrives during the time window. The input should contain all of the keys each time window.
Unfold reshapes a table into columns of key-value sets. Selected rows map to keys, and selected columns map to values.
The unfold operation proceeds as follows.
Given a stream of the form:
| Rowtime | Browser | Count | |1.– |1.– |1.– | | 1:01 | IE | 110| | 1:03 | Chrome | 290| | 2:01 | IE | 90| | 2:03 | Chrome | 300 |
An unfold operation generates columns like the following:
| rowtime window | IE | Firefox | Chrome | Safari | |1.– |1.– |1.– |1.– | | <hour 1> | 110 | 230 | 290 | 100 | | <hour 2> | 90 | 220 | 300 | 90 |
In the example above, “user-agent” serves as the key column, and “count” serves as the value column. For each value in the key column, StreamLab creates a new column and fills in this column with values from the value column. Because you may have multiple values for each key, you need to use an operation to aggregate those values to produce a single value for the unfolded row. For example, if incoming values are counts (as in the example above), you’d want to SUM over the time window. If they’re measurements of a continuous value, like a current temperature reading from a sensor, you would likely want to use AVG.
As rows come in, the StreamLab scrutinizer watches the input stream and lists unique values in the key column (in this example, the user-agent column). Once you select the Unfold operation and pick user-agent as the key column, it will make a suggestion that fills in the ‘key list’ parameter with the list of unique values observed in the stream. You can then set the rest of the parameters either using the widgets up top or by selecting the fields in the suggestion. You can also enter the key list manually.
A window on rowtime is a common use case.
To implement an unfold operation:
A sliding window is a time-based or row-based analytic window. With a sliding window, you apply an analytic to a given amount of rows that changes incrementally as rows stream in. For example, you might take the sum of a column called Orders for the past hour. At 12:01, the analytic would apply to all rows with timestamps between 11:01 and 12:01. Sliding window queries do not contain a flooring function, which means that rows are not batched into intervals. The query’s results slide forward one row at a time. In other words, for every row that the query finds, it emits one row. This is an example of a sliding window in a query:
SELECT STREAM , SUM(amt) OVER w FROM sales WINDOW w AS (PARTITION BY store_id ORDER BY s.ROWTIME RANGE BETWEEN INTERVAL '1' HOUR PRECEDING AND CURRENT ROW);
You can specify sliding windows in two ways:
This analytic window uses the last N rows, such as the last 10 rows. When the query starts up, the window fills up to N rows. From then on, as a new row enters the window, an old row is excluded from the analytic. For all cases where N is a number, the number of rows in the analytic window will always be N+1. You can also specify ROWS UNBOUNDED PRECEDING, which includes all available previous rows in its analytic window.
This analytic window uses rows from a specified interval of time. In this case, all rows whose rowtimes precede the current rowtime by one hour will be excluded from the analytic. In other words, rows that precede the previous hour are “dumped” or “discarded” from the analytic window.
A “tumbling window” is a a collection of rows that are aggregated to produce a fewer number of output rows, such as “the sum of the last twenty rows” or “the sum of the rows in the last hour”. One row is returned for every group of rows.
As a result, the total number of output rows are limited: many rows in, one row out. Tumbling windows are actually not “windows” in the SQL sense, but rows that are aggregated with SELECT STREAM … GROUP BY.
Fold lets you convert multiple columns with like values into a single row. For fold, select multiple columns with like values (for example, a series of columns that contain counts for different categories, such as product types or employees). Fold creates two new columns: a key column with values drawn from the column names of selected columns, and a value column with values drawn from selected column values.
To implement a fold: