Analytic Functions

Analytic functions return a result calculated from data in (or about) a finite set of rows identified by a window specification, rather than from data in or about a single row.

SQLstream s-Server supports the following analytic functions:

You use analytic functions similarly to aggregate functions. The differences are that

  • Analytic functions must specify a window. Since there are a few restrictions on window specifications, and a few differences between specifying windows for windowed aggregation and windowed join, please see - Allowed and Disallowed Window Specifications* for explanations.
  • Analytic functions may only appear in the <selection list> portion of a SELECT clause or in the ORDER BY clause.

Performing queries using analytic functions is commonly referred to as windowed aggregation (discussed below), as distinct from streaming aggregation.

Both are discussed in the topic Application Design Considerations in the s-Server Concepts Guide. Stream-to-stream joins are discussed in the topic Joins in this guide.

Time-based and row-based windows are discussed in the Glossary.

Because of the presence of the window specification, queries that use analytic functions produce results in a different manner than do aggregate queries. For each row in the input set, the window specification identifies a different set of rows on which the analytic function operates. If the window specification also includes a PARTITION BY clause, then the only rows in the window that will be considered in producing a result will be those that share the same partition as the input row.

If an input row contains a null in a column used as an input to an analytic function, the analytic function ignores the row, except for COUNT, which does count rows with null values. In cases where the window (or in the case of a PARTITION BY, a partition within the window) contains no rows, an analytic function will return null.

The exception to this is COUNT, which returns zero.

Differences between aggregate and analytic functions

Function Type Outputs Rows or Windows Used Notes
Aggregate Functions One output row per group of input rows All output columns are calculated over the same window or same group of rows. COUNT DISTINCT is not allowed instreaming aggregation; i.e., statements of the following type are not allowed:SELECT COUNT(DISTINCT x) … FROM … GROUP BY …
Analytic Functions One output row for each input row. Each output column may be calculated using a different window or partition. COUNT DISTINCT cannot be used as an analytic function and in windowed aggregation.

Analytic Functions Syntax Charts

The following syntax diagrams illustrate the sequence of commands and clauses constituting the usage of analytic functions

Links to related charts:

  • The list of - analytic functions that SQLstream supports* appears below.
  • The SELECT topic explains the order-by clause, showing the - order-by chart*, as well as the windowing clause (and - window-specification chart*).
  • To see where an order-by clause is used in Select statements, see the - Select chart* in the SELECT topic of this guide.

Windowed aggregation on streams and rowtime bounds

For time-based windows on streams, analytic functions are normally only able to identify the complete set of rows (and hence calculate a result) once a row arrives that falls outside the later bound of the current window. For example, in the case of a window that is RANGE INTERVAL ‘1’ MINUTE FOLLOWING, the arrival of a row with a rowtime more than one minute later than that of the row for which the window is being evaluated would indicate that there will be no more rows in the window, and the query can return a result. Alternately, a rowtime bound can be used to indicate that no more rows will arrive within a given window, enabling the query to return a result. For row-based windows on streams, such as ROWS 3 PRECEDING, rowtime bounds have no effect on windows (and hence, on the evaluation of analytic functions).

See below for examples of rowtime bounds in windowed aggregation, and the Concepts Guide for more information on rowtime bounds in general.

Examples: Windowed aggregation on tables

Examples: Windowed aggregation on tables

Assume the following data in a table of stock quotes:

ID TICKER PRICE
1 AAA 20
2 BBB 40
3 CCC 60
4 AAA 30
5 BBB 32
6 CCC 70
7 AAA 40
8 BBB 24
9 CCC 60
10 AAA 50
11 BBB 20
12 CCC 50

Let’s say we wish to compute the rolling average of the last 3 quotes on a given ticker. This is accomplished by using the AVG analytic function over a partitioned window:

SELECT
TICKER,
AVG(PRICE) OVER W1 AS ROLLING_AVG
FROM QUOTES
WINDOW W1 AS( PARTITION BY TICKER ROWS 2 PRECEDING );

Result:

| TICKER | ROLLING_AVG |
| ------ | ----------- |
| AAA    | 20          |
| BBB    | 40          |
| CCC    | 60          |
| AAA    | 25          |
| BBB    | 36          |
| CCC    | 65          |
| AAA    | 30          |
| BBB    | 32          |
| CCC    | 63          |
| AAA    | 40          |
| BBB    | 25          |
| CCC    | 60          |

Note that the first two result rows for each ticker symbol are based on fewer than three preceding rows, while the last two for each ticker operate on exactly three rows.

Windowed Aggregation on Streams

Assume the following information flowing through the stream WEATHERSTREAM:

ROWTIME CITY TEMP
2020-11-01 01:00:00.0 Denver 29
2020-11-01 01:00:00.0 Anchorage 2
2020-11-01 06:00:00.0 Miami 65
2020-11-01 07:00:00.0 Denver 32
2020-11-01 09:00:00.0 Anchorage 9
2020-11-01 13:00:00.0 Denver 50
2020-11-01 17:00:00.0 Anchorage 10
2020-11-01 18:00:00.0 Miami 71
2020-11-01 19:00:00.0 Denver 43
2020-11-02 01:00:00.0 Anchorage 4
2020-11-02 01:00:00.0 Denver 39
2020-11-02 07:00:00.0 Denver 46
2020-11-02 09:00:00.0 Anchorage 3
2020-11-02 13:00:00.0 Denver 56
2020-11-02 17:00:00.0 Anchorage 2
2020-11-02 19:00:00.0 Denver 50
2020-11-03 01:00:00.0 Denver 36
2020-11-03 01:00:00.0 Anchorage 1

Let’s say we want to find the minimum and maximum temperature recorded in the 24-hour period prior to any given reading, globally, regardless of city. To do this, we define a window of RANGE INTERVAL ‘1’ DAY PRECEDING, and use it in the OVER clause for the MIN and MAX analytic functions:


SELECT STREAM ROWTIME, MIN(TEMP) OVER W1 AS WMIN_TEMP, MAX(TEMP) OVER W1 AS WMAX_TEMP
FROM WEATHERSTREAM
WINDOW W1 AS (
   RANGE INTERVAL '1' DAY PRECEDING
);

Results:

| ROWTIME               | WMIN_TEMP | WMAX_TEMP |
| --------------------- | --------- | --------- |
| 2020-11-01 01:00:00.0 | 29        | 29        |
| 2020-11-01 01:00:00.0 | 2         | 29        |
| 2020-11-01 06:00:00.0 | 2         | 65        |
| 2020-11-01 07:00:00.0 | 2         | 65        |
| 2020-11-01 09:00:00.0 | 2         | 65        |
| 2020-11-01 13:00:00.0 | 2         | 65        |
| 2020-11-01 17:00:00.0 | 2         | 65        |
| 2020-11-01 18:00:00.0 | 2         | 71        |
| 2020-11-01 19:00:00.0 | 2         | 71        |
| 2020-11-02 01:00:00.0 | 2         | 71        |
| 2020-11-02 01:00:00.0 | 2         | 71        |
| 2020-11-02 07:00:00.0 | 4         | 71        |
| 2020-11-02 09:00:00.0 | 3         | 71        |
| 2020-11-02 13:00:00.0 | 3         | 71        |
| 2020-11-02 17:00:00.0 | 2         | 71        |
| 2020-11-02 19:00:00.0 | 2         | 56        |
| 2020-11-03 01:00:00.0 | 2         | 56        |
| 2020-11-03 01:00:00.0 | 1         | 56        |

Now, let’s assume we want to find the minimum, maximum, and average temperature recorded in the 24 hour period prior to any given reading, broken down by city. To do this, we add a PARTITION BY clause on CITY to the window specification, and add the AVG analytic function over the same window to the selection list:

SELECT STREAM
ROWTIME,
CITY,
MIN(TEMP) over W1 AS WMIN_TEMP,
MAX(TEMP) over W1 AS WMAX_TEMP,
AVG(TEMP) over W1 AS WAVG_TEMP
FROM AGGTEST.WEATHERSTREAM
WINDOW W1 AS (
PARTITION BY CITY
RANGE INTERVAL '1' DAY PRECEDING
);

Results:

| ROWTIME               | WMIN_TEMP | WMAX_TEMP |
| --------------------- | --------- | --------- |
| 2020-11-01 01:00:00.0 | 29        | 29        |
| 2020-11-01 01:00:00.0 | 2         | 29        |
| 2020-11-01 06:00:00.0 | 2         | 65        |
| 2020-11-01 07:00:00.0 | 2         | 65        |
| 2020-11-01 09:00:00.0 | 2         | 65        |
| 2020-11-01 13:00:00.0 | 2         | 65        |
| 2020-11-01 17:00:00.0 | 2         | 65        |
| 2020-11-01 18:00:00.0 | 2         | 71        |
| 2020-11-01 19:00:00.0 | 2         | 71        |
| 2020-11-02 01:00:00.0 | 2         | 71        |
| 2020-11-02 01:00:00.0 | 2         | 71        |
| 2020-11-02 07:00:00.0 | 4         | 71        |
| 2020-11-02 09:00:00.0 | 3         | 71        |
| 2020-11-02 13:00:00.0 | 3         | 71        |
| 2020-11-02 17:00:00.0 | 2         | 71        |
| 2020-11-02 19:00:00.0 | 2         | 56        |
| 2020-11-03 01:00:00.0 | 2         | 56        |
| 2020-11-03 01:00:00.0 | 1         | 56        |

Examples: Rowtime Bounds and Windowed Aggregation

This is an example of a windowed aggregate query:

SELECT STREAM ROWTIME, ticker, amount, SUM(amount)
OVER (
PARTITION BY ticker
RANGE INTERVAL '1' HOUR PRECEDING)
AS hourlyVolume
FROM Trades

Because this is a query on a stream, rows pop out of this query as soon as they go in. For example, given the inputs:

Trades: IBM 10 10 10:00:00
Trades: ORCL 20 10:10:00
Trades.bound: 10:15:00
Trades: ORCL 15 10:25:00
Trades: IBM 30 11:05:00
Trades.bound: 11:10:00

the output will be:

Trades: IBM 10 10 10:00:00
Trades: ORCL 20 20 10:10:00
Trades.bound: 10:15:00
Trades: ORCL 15 35 10:25:00
Trades: IBM 30 30 11:05:00
Trades.bound: 11:10:00

The rows still hang around behind the scenes for an hour, and thus the second ORCL row output has a total of 35; but the original IBM trade falls outside the “hour preceding” window, and so is excluded from the IBM sum.

Windowed Aggregation Specifications

Definitions and examples appear in the Streaming Aggregation and Windowed Aggregation topic in this guide. The chart for windowed aggregation is next.

Syntax Chart for Windowed Aggregation

(To see where windowed aggregation fits into a SELECT statement, see the topic SELECT in this guide.)

Interval Clause

Example:

Some business problems seem to need totals over the whole history of a stream, but this is usually not practical to compute. However, such business problems are often solvable by looking at the last day, the last hour, or the last N records. Sets of such records are called windowed aggregates.

They are easy to compute in a stream database, and can be expressed in standard SQL as follows:

SELECT STREAM ticker,
avg(price OVER lastHour AS avgPrice,
max(price) OVER lastHour AS maxPrice
FROM Bids
WINDOW lastHour AS (
PARTITION BY ticker
RANGE INTERVAL '1' HOUR PRECEDING)

Note: The Interval_clause must be of an appropriate type:

  • An integer literal with ROWS
  • A numeric value for RANGE over a numeric column
  • An INTERVAL for a RANGE over a date/time/timestamp