Group Rank UDX

This user-defined transfrom (UDX) applies a RANK() function to logical groups of rows and optionally delivers the group in sorted order

Applications of Group_Rank include the following:

  • To sort results of a streaming GROUP BY
  • To determine a relationship within the results of a group

Group Rank UDX can do the following actions:

  • Apply Rank to a specified input column.
  • Supply either sorted or non-sorted output.
  • Enable the user to specify a period of inactivity for data flush.

This page includes information on the following subtopics:

SQL Declarations

The functional attributes and DDL are described in the sections that follow. Note that you need to use the STREAM keyword when you invoke the function if the function returns a streaming result.

Functional attributes for Group_Rank

This Group_Rank transform acts as follows:

  • Gathers rows until either a rowtime change is detected or a specified idle-time limit is exceeded.
  • Accepts any streaming rowset.
  • Uses any column with a basic SQL data type of INTEGER, CHAR, VARCHAR as the column by which to do the ranking
  • Orders the output rows either in the order received or in ascending or descending order of values in the selected column.

DDL for Group_Rank

drop schema test cascade;

create schema test;
set schema 'TEST';
set path 'TEST';
create stream AccessStream
(
 pageId varchar(1000),
 sourceURL varchar(1000)
);

CREATE FUNCTION group_rank(
 c cursor,
 rankByColumnName VARCHAR(128),
 rankOutColumnName VARCHAR(128),
 sortOrder VARCHAR(10),
 outputOrder VARCHAR(10),
 maxIdle INTEGER,
 outputMax INTEGER)
 returns table(c.*, groupRank INTEGER)
 language java

 parameter style system defined java

 no sql

 external name 'class com.sqlstream.plugin.grouprank.GroupRank.group_rank';

Example of Group_Rank Use

CREATE VIEW pageCounts1Min AS
SELECT STREAM
pageId, COUNT(*) AS hitCount
FROM AccessStream AS S
GROUP BY
FLOOR(S.ROWTIME TO MINUTE), pageId;

SELECT STREAM pageId, hitCount, groupRank
FROM STREAM
--STREAM keyword is necessary because UDX
--returns an infinite (streaming) result
(
 group_rank
 (
   CURSOR( SELECT STREAM pageId, hitCount FROM pageCounts1Min ),
   'HITCOUNT', 'GROUPRANK', 'desc', 'asc', 10, 5
 )
);

Note: When a table function returns an infinite result, it needs to be wrapped with a STREAM(…) marker when invoked. s-Server will return an error otherwise.

Sample Input:

pageId hitCount
condo  51
auto  25
books  200
CDs  202
DVDs  1000
Games  500

Sample output:

pageId hitCount groupRank
DVDs  1000  1
Games  500  2
CDs  202  3
books  200  4
Condo  51  5

Operational Overview

Rows are buffered from the input cursor for each group, i.e., rows with the same rowtimes. Ranking of the rows is done either after the arrival of a row with a different rowtime (or when the idle timeout occurs). Rows continue to be read while ranking is performed on the group of rows with the same rowtime.

The outputMax parameter specifies the maximum number of rows to be returned for each group after ranks are assigned.

By default, group_rank supports column pass through, as the example illustrates by using c.* as the standard shortcut directing pass through of all input columns in the order presented. You can, instead, name a subset using the notation “*c.columName*”, allowing you to reorder the columns. However, using specific column names ties the UDX to a very specific input set while using the *c.** notation allows the UDX to handle any input set

The rankOutColumnName parameter specifies the output column used to return ranks. This column name must match the column name specified in the RETURNS clause of the CREATE FUNCTION statement.

UDX Parameters

The parameters to the UDX are:

c CURSOR to streaming result set
rankByColumnName String naming the column to use for ranking the group
rankOutColumnName String naming the column to use for returning the rank.This string must match the name of the groupRank column in RETURNS clause of the CREATE FUNCTION statement.
sortOrder Controls ordering of rows for rank assignment.Valid values are: - ‘asc’ - Ascending based on rank. - ‘desc’ - Descending based on the rank.
outputOrder Controls ordering of output. Valid values are: - ‘asc’ - Ascending based on rank. - ‘desc’ - Descending based on the rank.
maxIdle Time limit in milliseconds for holding a group for ranking.When maxIdle expires the current group is released to the stream. A value of zero indicates no idle time out.
outputMax Maximum number of rows the UDX will output in a given group.A value of 0 indicates no limit.