This user-defined transfrom (UDX) applies a RANK() function to logical groups of rows and optionally delivers the group in sorted order
Applications of Group_Rank include the following:
Group Rank UDX can do the following actions:
This page includes information on the following subtopics:
The functional attributes and DDL are described in the sections that follow. Note that you need to use the STREAM keyword when you invoke the function if the function returns a streaming result.
This Group_Rank transform acts as follows:
drop schema test cascade;
create schema test;
set schema 'TEST';
set path 'TEST';
create stream AccessStream
(
pageId varchar(1000),
sourceURL varchar(1000)
);
CREATE FUNCTION group_rank(
c cursor,
rankByColumnName VARCHAR(128),
rankOutColumnName VARCHAR(128),
sortOrder VARCHAR(10),
outputOrder VARCHAR(10),
maxIdle INTEGER,
outputMax INTEGER)
returns table(c.*, groupRank INTEGER)
language java
parameter style system defined java
no sql
external name 'class com.sqlstream.plugin.grouprank.GroupRank.group_rank';
CREATE VIEW pageCounts1Min AS
SELECT STREAM
pageId, COUNT(*) AS hitCount
FROM AccessStream AS S
GROUP BY
FLOOR(S.ROWTIME TO MINUTE), pageId;
SELECT STREAM pageId, hitCount, groupRank
FROM STREAM
--STREAM keyword is necessary because UDX
--returns an infinite (streaming) result
(
group_rank
(
CURSOR( SELECT STREAM pageId, hitCount FROM pageCounts1Min ),
'HITCOUNT', 'GROUPRANK', 'desc', 'asc', 10, 5
)
);
Note: When a table function returns an infinite result, it needs to be wrapped with a STREAM(…) marker when invoked. s-Server will return an error otherwise.
pageId hitCount
condo 51
auto 25
books 200
CDs 202
DVDs 1000
Games 500
pageId hitCount groupRank
DVDs 1000 1
Games 500 2
CDs 202 3
books 200 4
Condo 51 5
Rows are buffered from the input cursor for each group, i.e., rows with the same rowtimes. Ranking of the rows is done either after the arrival of a row with a different rowtime (or when the idle timeout occurs). Rows continue to be read while ranking is performed on the group of rows with the same rowtime.
The outputMax parameter specifies the maximum number of rows to be returned for each group after ranks are assigned.
By default, group_rank supports column pass through, as the example illustrates by using c.* as the standard shortcut directing pass through of all input columns in the order presented. You can, instead, name a subset using the notation “*c.columName*”, allowing you to reorder the columns. However, using specific column names ties the UDX to a very specific input set while using the *c.** notation allows the UDX to handle any input set
The rankOutColumnName parameter specifies the output column used to return ranks. This column name must match the column name specified in the RETURNS clause of the CREATE FUNCTION statement.
The parameters to the UDX are:
c | CURSOR to streaming result set |
---|---|
rankByColumnName | String naming the column to use for ranking the group |
rankOutColumnName | String naming the column to use for returning the rank.This string must match the name of the groupRank column in RETURNS clause of the CREATE FUNCTION statement. |
sortOrder | Controls ordering of rows for rank assignment.Valid values are: - ‘asc’ - Ascending based on rank. - ‘desc’ - Descending based on the rank. |
outputOrder | Controls ordering of output. Valid values are: - ‘asc’ - Ascending based on rank. - ‘desc’ - Descending based on the rank. |
maxIdle | Time limit in milliseconds for holding a group for ranking.When maxIdle expires the current group is released to the stream. A value of zero indicates no idle time out. |
outputMax | Maximum number of rows the UDX will output in a given group.A value of 0 indicates no limit. |