ORDER BY clause

The ORDER BY clause allows you to specify the columns, or expressions, by which a relational (non-streaming) query is sorted. If you do not specify an ORDER BY clause, the order of the rows is unspecified.

ORDER BY is also supported for streaming queries (see Streaming Order By below) and, by using a SQLstream keyword, can be used to time-sort rows in a time-based window. See T-sorting Stream Input below for more details.

This topic contains information on teh following:

Syntax Chart for the ORDER BY Clause

To see where this clause fits, see the SELECT chart in the topic SELECT statement in this guide.

Note: For the use and effect of the t_sort_clause, see T-sorting Stream Input below.

The ORDER BY clause is evaluated after FROM, WHERE, GROUP BY, HAVING, and SELECT clauses. As a result, expressions that are filtered away by a GROUP BY or SELECT DISTINCT operation cannot be sorted on.

For example, it is illegal to write

SELECT DISTINCT deptno
FROM emp
ORDER BY gender

because for a given value of deptno there is no single well-defined value of gender.

If an expression has been given a column alias in the SELECT clause, you can refer to it by that alias.

If the expression is an integer literal n, it means the nth entry in the SELECT clause.

For example, the following three queries are equivalent:

SELECT orderId, unitPrice * quantity AS total
FROM Orders
ORDER BY 2;
SELECT orderId, unitPrice * quantity AS total
FROM Orders
ORDER BY total;
SELECT orderId, unitPrice * quantity AS total
FROM Orders
ORDER BY unitPrice * quantity;

Streaming Order By

A streaming query can use ORDER BY if its leading expression is time-based and monotonic. For example, a streaming query whose leading expression is based on the ROWTIME column can use ORDER BY to do the following operations:

  • Sort the results of a streaming GROUP BY.
  • Sort a batch of rows arriving within a fixed time window of a stream.
  • Perform streaming ORDER BY on windowed-joins.

The “time-based and monotonic” requirement on the leading expression means that the query

SELECT STREAM DISTINCT ticker FROM trades ORDER BY ticker;

will fail, but the query

SELECT STREAM DISTINCT rowtime, ticker FROM trades ORDER BY ROWTIME, ticker;

will succeed.

Streaming ORDER BY sorts rows using SQL standard compliant syntax for the ORDER BY clause. It can be combined with a UNION ALL statement, and can sort on expressions, such as:

SELECT STREAM x, y FROM t1
UNION ALL
SELECT STREAM a, b FROM t2 ORDER BY ROWTIME, MOD(x, 5)

The ORDER BY clause can specify ascending or descending sort order, and can use column ordinals, as well as ordinals specifying (referring to) the position of items in the select list.

Streaming ORDER BY SQL Declarations

The streaming ORDER BY clause includes the following functional attributes:

  • Gathers rows until the monotonically increasing expression in streaming ORDER BY clause does not change.
  • Does not require streaming GROUP BY clause in the same statement.
  • Can use any column with a basic SQL data type of TIMESTAMP, DATE, DECIMAL, INTEGER, FLOAT, CHAR, VARCHAR.
  • Does not require that columns/expressions in the ORDER BY clause be present in the SELECT list of the statement.
  • Applies all the standard SQL validation rules for ORDER BY clause.

Streaming ORDER BY can include the pseudo-column ROWTIME. As a pseudo-column, ROWTIME behaves as a stream column, but is not stored in the stream.

Note: Unadorned ROWTIME (not qualified with a stream name) can only be specified in the SELECT list or the ORDER BY clause.

The following query is an example of streaming ORDER BY:

SELECT STREAM state, city, SUM(amount)
FROM orders
GROUP BY FLOOR(ROWTIME TO HOUR), state, city
ORDER BY FLOOR(ROWTIME TO HOUR), state, SUM(amount);

T-sorting Stream Input

SQLstream s-Server’s real-time analytics use the fact that arriving data is ordered by ROWTIME. However, sometimes data arriving from multiple sources may not be time-synchronized.

While SQLstream s-Server can sort data from individual data sources that have been independently inserted into s-Server’s native stream, in some cases such data may have already combined from multiple sources (such as for efficient consumption at an earlier stage in processing). At other times, high volume data sources could make direct insertion impossible.

In addition, an unreliable data source could block progress by forcing s-Server to wait indefinitely, unable to proceed until all connected data sources deliver. In this case, data from this source could be unsynchronized.

SQLstream s-Server resolves these issues using an execution object (XO), the t-Sort XO. It uses a sliding time-based window of incoming rows to reorder those rows by ROWTIME.

Syntax

You specify the time-based parameter for sorting and the time-based window in which the streaming rows are to be time-sorted, using the following syntax:

 ORDER BY <timestamp_expr> WITHIN <interval_literal>

Examples

The SQL statements below pre-sort rows from different streams using windows of two minutes and two seconds.

CREATE OR REPLACE STREAM "vehicle_positions" (
"PositionTime" TIMESTAMP,
"VID" INTEGER,
"vSpeed" SMALLINT,
"vBearing" REAL)
DESCRIPTION 'TSort Example';

SELECT STREAM "PositionTime" AS ROWTIME, VID, "vSpeed", "vBearing"
FROM "vehicle_positions"
ORDER BY "PositionTime" WITHIN INTERVAL '2' MINUTE, VID;
SET SCHEMA '"SALES"';

CREATE OR REPLACE STREAM "ASKS" (
"TransactionTime" TIMESTAMP,
"time" BIGINT,
"ticker" VARCHAR(5),
"shares" INTEGER,
"price" REAL,
"expiryMs" BIGINT,
"comment" VARCHAR(1024))
DESCRIPTION 'updating ASKS for this example';

SELECT STREAM "TransactionTime" AS ROWTIME, "ticker", "price"
FROM ASKS
ORDER BY "TransactionTime" WITHIN INTERVAL '2' SECOND;

Restrictions

The T-sort XO has the following restrictions:

  • The datatype of the ORDER BY expression must be timestamp.
  • The partially-ordered expression <timestamp_expr> must be present in the select list of the query with the alias ROWTIME.
  • The leading expression of the ORDER BY clause must not contain the ROWTIME function and must not use the DESC keyword.
  • The ROWTIME column needs to be fully qualified with a stream name. For example:
    • ORDER BY FLOOR(ROWTIME TO MINUTE),… fails.
    • ORDER BY FLOOR(s.ROWTIME TO MINUTE),… works.

If any of these requirements are not met, the statement will fail with errors. Additional notes:

  • You cannot use incoming rowtimebounds. These are ignored by the system in an ORDER BY context.
  • If <timestamp_expr> evaluates to NULL, the corresponding row is discarded.

Combining T-Sort with GROUP BY

It is also possible to combine a T-Sort with a GROUP BY, either explicitly, or implicitly as a result of a planner optimization.

Explicit T-Sort as part of GROUP BY

(This SQL feature has been available since version 7.2.4).

The T-Sort + GROUP BY can be explicitly stated in the GROUP BY clause - see Grouping by non-monotonic timestamp column

Implicit T-Sort combined with following GROUP BY

(This optimization has been in effect since version 7.0.4).

The two operations can be implicitly combined as a result of the SQLstream optimizer opportunistically combining a T-Sort operation with an immediately following GROUP BY operation.

If the planner recognizes that a SELECT STREAM … ORDER BY .. WITHIN view ia immediately followed by a SELECT STREAM … GROUP BY … it generates a plan that combines these.

This will only happen if the two XOs are in the same streaming graph. Normally that means they are both executed as part of the same pump.

Benefits of combining T-Sort and GROUP BY

These benefits apply equally to both explicit and implicit combination of T-Sort and GROUP BY.

Rather than a T-Sort XO saving every row into a queue, and then sending it to a second Streaming Aggregation XO performing the aggregation, the Streaming Aggregation XO handles both:

  • it reads the row
  • decides whether it is late (and if so discards it as before)
  • keeps track of the incoming stream’s high watermark and the WITHIN low watermark
  • adds the row into the appropriate aggregation bucket (rather than into a queue)
  • emits the aggregated rows for each GROUP BY time period as soon as the timestamp high watermark passes the end of the GROUP BY period plus the WITHIN interval (the same trigger we would have used for detail records).

If we compare the memory required:

  • A regular T-Sort uses memory based on the number of rows received during a “within period” of the t-sort.
  • The combined T-Sort + GROUP BY uses the memory for the number of aggregate rows required (normally a small fraction of the input rows) multiplied by the number of different periods that could be open for aggregation.
    • Because the WITHIN period slides, there will be times when for each partition key we are maintaining aggregation state for 2 or more time periods.
    • If the WITHIN period is longer than the GROUP BY PERIOD, there could be 3 or even more periods open for aggregation at once.
    • Even so, the memory required for aggregates is usually a small fraction of the memory required for detail rows

There is also a processing benefit:

  • there is no need to marshall/unmarshall detail rows into the buffer structure between the T-Sort and StreamAgg XOs.