DATA_RATE_RPS

New in SQLstream s-Server 7.3.0.

DATA_RATE_RPS is a stream function (UDX) can be used to measure the number of rows and rowtime bounds being emitted by any source foreign stream, native stream or streaming view in each second.

As well as counting rows and rowtime bounds, the function also reports the lowest and highest rowtime bounds received during that second, and the cumulative total of rows received since the query started.

Like all functions defined in SYS_BOOT.MGMT, this function is always in your PATH so you do not need to include the catalog and schema names when calling it.

This topic includes the following sections:

Examples Using DATA_RATE_RPS

These examples depend on the Sydney Buses StreamLab Gallery App. See Streaming data from Sydney Buses for more detail.

Reading from a FOREIGN STREAM

The first example reads data as it is ingested from test data file. Rowtimes reported are implicit (based on system time):

select stream * 
from stream(
    data_rate_rps
      (cursor (select stream rowtime,"id" from "StreamLab_Output_buses"."source_1_fs")
    ));
'ROWCOUNT','BOUNDS','TIMEOUTS','CUMULATIVE_ROWS','ROWTIME_LOW','ROWTIME_HIGH'
'0','1','0','0','2021-05-21 17:25:51.944','2021-05-21 17:25:51.944'
'63','2','0','63','2021-05-21 17:25:51.964','2021-05-21 17:25:51.993'
'30167','20','0','30230','2021-05-21 17:25:51.993','2021-05-21 17:25:53.063'
'10','28','0','30240','2021-05-21 17:25:53.116','2021-05-21 17:25:54.066'
'10','25','0','30250','2021-05-21 17:25:54.123','2021-05-21 17:25:55.066'
'10','24','0','30260','2021-05-21 17:25:55.118','2021-05-21 17:25:56.09'
'9','20','0','30269','2021-05-21 17:25:56.14','2021-05-21 17:25:57.003'
'2','22','0','30271','2021-05-21 17:25:57.003','2021-05-21 17:25:58.044'
'9','26','0','30280','2021-05-21 17:25:58.044','2021-05-21 17:25:59.056'
'10','25','0','30290','2021-05-21 17:25:59.106','2021-05-21 17:26:00.082'
'10','23','0','30300','2021-05-21 17:26:00.132','2021-05-21 17:26:01.086'
'10','23','0','30310','2021-05-21 17:26:01.142','2021-05-21 17:26:02.055'

This second example reads data from the sink at the end of the ingestion pipeline. Rowtimes reported are explicit - promoted from the “reported at” column - so are based in 2014. Because there has been a rowtime promotion, there are no rowtime bounds in this stream.

select stream * 
from stream(
    data_rate_rps
      (cursor (select stream rowtime,"id" from "StreamLab_Output_buses"."sink_1")
    ));
'ROWCOUNT','BOUNDS','TIMEOUTS','CUMULATIVE_ROWS','ROWTIME_LOW','ROWTIME_HIGH'
'0','1','0','0','2014-07-23 20:59:31.557','2014-07-23 20:59:31.557'
'4','0','0','4','2014-07-23 20:59:31.56','2014-07-23 20:59:31.69'
'3','0','0','7','2014-07-23 20:59:31.753','2014-07-23 20:59:31.82'
'6','0','0','13','2014-07-23 20:59:31.847','2014-07-23 20:59:31.967'
'6','0','0','19','2014-07-23 20:59:31.97','2014-07-23 20:59:32.063'
'6','0','0','25','2014-07-23 20:59:32.067','2014-07-23 20:59:32.123'
'5','0','0','30','2014-07-23 20:59:32.147','2014-07-23 20:59:32.267'
'3','0','0','33','2014-07-23 20:59:32.287','2014-07-23 20:59:32.467'
'5','0','0','38','2014-07-23 20:59:32.473','2014-07-23 20:59:32.587'
'1','0','0','39','2014-07-23 20:59:32.643','2014-07-23 20:59:32.643'

In both the examples above, the pipeline has been throttled so the row rates are deliberately low.

Overheads of DATA_RATE_RPS

In the examples above we show two cases:

  1. Reading from a foreign stream; this creates a new instance of the plugin and reads from the source (a file in this case).

    • If you already have an executing pipeline reading from the foreign stream, this means you will now have two independent readers.
    • Your session’s performance may differ from the performance of the pipeline; and the work your session is doing may make your executing pipeline slower.
  2. Reading from a native (SQLstream) stream; the work done to put the data into the stream is being done anyway; the only extra work is for your sqllineClient to read rows from the stream and aggregate the results.

    • The throughput you measure should be close to the performance of the pipeline running without being monitored.

If you use DATA_RATE_RPS to measure a streaming view, then as with case 1 above the planner creates a new stream graph all the way back to the nearest upstream stream(s) and/or source foreign stream(s). If you are already running a pipeline that reads from the view, you now have two copies of it running, competing with each other.

Looking for Performance Bottlenecks

It is valid and helpful to use DATA_RATE_RPS during development and performance testing to identify which steps in a pipeline are bottlenecks.

Starting with the source of the pipeline (and usually with any downstream pumps disabled):

  • measure the source (stream or foreign stream)
  • measure pipeline views one by one
  • record the throughput at each stage

Notes

  • Wherever you see a big drop in throughput, that step is likely to be worth investigating.
  • Do remember though that one acceptable reason for a drop in throughput is filtering data (either in a WHERE clause, or in an INNER JOIN). The step that filters out 20% of rows, even if the filtering costs nothing, would show a 20% drop in throughput.

Diagnosing Processing Delays By Counting Rowtime Bounds

Some problems can be caused by the absence of punctuation - implemented as rowtime bounds in SQLstream. Even when no rows are flowing, DATA_RATE_RPS shows whether stream rowtime is being updated by the presence of rowtime bounds. This helps in resolving problems relating to low data rates especially with merging streams (UNION ALL or multiple pumps into the same stream), with stream/stream joins, and with t-sorts.

For more information see Rowtime Bounds and Troubleshooting Processing Delays in the Concepts Guide.