Group Rank

This user-defined transfrom (UDX) applies a RANK() function to logical groups of rows and optionally delivers the group in sorted order Applications of Group_Rank include the following:

  • To sort results of a streaming GROUP BY
  • To determine a relationship within the results of a group

Group Rank UDX can do the following actions:

  • Apply Rank to a specified input column.
  • Supply either sorted or non-sorted output.
  • Enable the user to specify a period of inactivity for data flush.

SQL Declarations

The functional attributes and DDL are described in the topics that follow.

Functional attributes for Group_Rank

This Group_Rank transform acts as follows:

  • Gathers rows until either a rowtime change is detected or a specified idle-time limit is exceeded.
  • Accepts any streaming rowset.
  • Uses any column with a basic SQL data type of INTEGER, CHAR, VARCHAR as the column by which to do the ranking
  • Orders the output rows either in the order received or in ascending or descending order of values in the selected column.

DDL for Group_Rank

CREATE FUNCTION group_rank(c cursor, rankByColumnName VARCHAR(128),
   rankOutColumnName VARCHAR(128), sortOrder VARCHAR(10), outputOrder VARCHAR(10),
   maxIdle INTEGER, outputMax INTEGER)
 returns table(c.*, "groupRank" INTEGER)
 language java
 parameter style system defined java
 no sql
 external name 'class com.sqlstream.plugin.grouprank.GroupRank.group_rank';

The parameters to the UDX are:

c CURSOR to streaming result set
rankByColumnName String naming the column to use for ranking the group
rankOutColumnName String naming the column to use for returning the rank. This string must match the name of the groupRank column in RETURNS clause of the CREATE FUNCTION statement.
sortOrder Controls ordering of rows for rank assignment.

Valid values are:
- ‘asc’ Ascending based on rank.
- ‘desc’ Descending based on the rank.
outputOrder Controls ordering of output.

Valid values are:
- ‘asc’ Ascending based on rank.
- ‘desc’ Descending based on the rank.
maxIdle Time limit in milliseconds for holding a group for ranking. When maxIdle expires the current group is released to the stream. A value of zero indicates no idle time out.
outputMax Maximum number of rows the UDX will output in a given group. A value of 0 indicates no limit.

Example of Group_Rank Use:

```sql   CREATE VIEW pageCounts1Min AS    SELECT STREAM       pageId, COUNT(*) AS hitCount    FROM AccessStream AS S    GROUP BY       FLOOR(S.ROWTIME TO MINUTE), pageId;    SELECT STREAM “pageId”, “hitCount”, “groupRank”    FROM (TABLE(group_rank(CURSOR(SELECT STREAM “pageId”, “hitCount”       FROM pageCounts1Min),         ‘hitCount’, ‘groupRank’, ‘desc’, ‘asc’, 10, 5)));

### Sample Input:

pageId hitCount
condo  51
auto  25
books  200
CDs  202
DVDs  1000
Games  500

Sample output:

pageId hitCount groupRank
DVDs  1000  1
Games  500  2
CDs  202  3
books  200  4
Condo  51  5

Operational Overview

Rows are buffered from the input cursor for each group, i.e., rows with the same rowtimes. Ranking of the rows is done either after the arrival of a row with a different rowtime (or when the idle timeout occurs). Rows continue to be read while ranking is performed on the group of rows with the same rowtime.

The outputMax parameter specifies the maximum number of rows to be returned for each group after ranks are assigned. By default, _grouprank supports column pass through, as the example illustrates by using c.* as the standard shortcut directing pass through of all input columns in the order presented. You can, instead, name a subset using the notation “c.columName”, allowing you to reorder the columns. However, using specific column names ties the UDX to a very specific input set while using the c.* notation allows the UDX to handle any input set

The rankOutColumnName parameter specifies the output column used to return ranks. This column name must match the column name specified in the RETURNS clause of the CREATE FUNCTION statement.