Analytics

Analytics commands let you apply a number of different analytics to the Output view in StreamLab. Some of these produce new columns, and others change selected columns themselves.

Analytic Description
Aggregate This command lets you aggregate a column, including COUNT, SUM, and AVERAGE. You apply these over sliding or tumbling windows. Windows express an amount of rows defined by time or number of rows, over which the aggregation is applied.
Bollinger Bands This command lets you create a new column that is a running average of another column over a given window of time. You can also add Bollinger bands to this running average. Bollinger Bands are a volatility indicator, typically used for stock prices.
Calculate This command lets you perform an arithmetical calculation on a column, such as adding 2000 to the column or dividing the column in 2. Selected columns must have numerical values, such as DOUBLE.
Categorize This command lets you take a continuous value, such as a bus speed, and replace it with strings such as "fast", "medium", or "slow" by applying conditions to the column.
Distinct The Distinct operation lets you eliminate rows with duplicate columns over a specified timeframe.
Geo-Locate IP This command lets you determine the latitude and longitude associated with IP addresses worldwide.
GroupRank This command lets you group rows by the values in one column (group by) and then rank the rows within that group according to another column (rank by) across a window defined by you.
Join Join lets you combine data in the current pipeline guide with data from another stream or table. They allow you to enrich data in the current view with data from another source, as long as that source shares a column with the source for the current guide.
Time Sort This command uses a sliding time-based window of incoming rows to reorder those rows by the selected column or by ROWTIME in order to address time synchronization problems with data.
New Column The New Column analytic lets you add a new column based on a custom SQL expression.

Aggregate

The Aggregate analytic lets you make calculations on a column of numerical data, such as the sum, the average, the maximum value, the minimum value, the first value, or the last value. You can also choose statistical formulas such as population variance, population standard deviation, or sample variance. See Aggregation Operations below.

The Aggregate analytic lets you make calculations on a column of numerical data, such as the sum, the average, the maximum value, the minimum value, the first value, or the last value. You can also choose statistical formulas such as population variance, population standard deviation, or sample variance. See Aggregation Operations below.

Because StreamLab works with streaming data, you first need to designate a window across which you will perform the aggregation. Windows can be sliding or tumbling and designated as either a numeric time frame (from seconds to years) or a number of rows (last 200 rows). With sliding windows, you can apply partitions and sessions

For tumbling windows, you can also group aggregations into categories, such as departments, industries, regions, states, and so on, using the Group By option, including grouping results by multiple categories, such as region, then department (use the + to add additional grouping columns).

You can apply multiple aggregations by using the + button in the right corner of the aggregation section of the Aggregate tab.

Implementing an Aggregate Analytic

To implement a Aggregate analytic.

  1. Select a mode for the aggregation, sliding or tumbling. (See definitions below.)
  2. Select a length and unit for the aggregation window, such as "1 hour" or "15 minutes" or "200 rows."
  3. Select a column to aggregate. This column must contain numerical data.
  4. Select an operation to aggregate values for the new column. This might be an average temperature, or a running total. See Aggregation Operations below. You can add multiple analytics by using the + button.

  5. Enter a column name for the resulting column or columns.

    For sliding windows, you can also choose to group the column's results. To do so, choose a column with a limited set of values, such as region, department, product line, and so on. For time-based sliding windows, you can use time until to designate a time period that ends before the current row. See Using Time Until with Time-based Sliding Windows below. You can also choose to have the window hop forward once the stream reaches a designated time period. This means that, for example, rows will clear out and start accumulating anew once you reach the 5 minute mark for the hopping period. See Using a Hopping Period with Time-based Sliding Windows below. You can partition all sliding windows by column. See Partitioning Sliding Windows below.

    For tumbling windows (which display data at specified intervals), you need to choose what to do with other columns: either drop these (so that only aggregated columns display), use the first value in the window, or use the last value in the window. With tumbling windows, you can also choose to apply a GROUP BY using a column of limited values. See Grouping Tumbling Windows below.

  6. Click the + icon to add the command to the Guide script.

  7. The results of the script appear in the Output View window.

Using Sliding Windows

All streaming rows have a timestamp called ROWTIME. This means that you can tell StreamLab "aggregate all the rows in the past hour" or "aggregate all the rows in the past minute" and so on. Long window frames may affect performance, so experiment accordingly.

Sliding windows display data continuously. Each row contains one or more running aggregations for the specified time period. In the illustration below, avg_order and max_order are calculated using all data in the window frame.

sliding-window

Aggregation Operations

  • Average. Returns the average of all the value expressions evaluated for each row in the aggregation.
  • Count. Returns the number of qualifying rows in the aggregation
  • Count Distinct. Returns the number of qualifying non-duplicate rows in the aggregation
  • First Value. Returns the evaluation of the from the first row that qualifies for the aggregate.
  • First Value Ignore Nulls. Returns first non null value of x in
  • First Value Respect Nulls. Returns first value, including null of x in
  • Last Value. LAST_VALUE returns the evaluation of the from the last row that qualifies for the aggregate. Returns last value, including null of x in
  • Last Value Ignore Nulls. Returns last non null value of x in
  • Last Value Respect Nulls. Returns last value, including null of x in
  • Maximum. Returns the maximum value of all the value expressions evaluated for each row in the aggregation. For string values, MAX is determined by which string is last in the collating sequence.
  • Minimum. Returns the minimum value of all the value expressions evaluated for each row in the aggregation.
  • SUM returns the sum of all the value expressions evaluated for each row in the aggregation.
  • Population Variance. This is a standard statistical analytic method that returns the population variance of a non-null set of numbers (nulls being ignored). Population variance uses the following calculation: (SUM(expr*expr) - SUM(expr)\*SUM(expr) / COUNT(expr)) / COUNT(expr)
  • In other words, for non-null values in the aggregation window, and using S1 as the sum of the values and S2 as the sum of the squares of the values, Population Variance returns the result (S2-S1S1/N)/N.
  • Population Standard Deviation. This is a standard statistical analytic method that returns the square root of the population variance (defined below) of the row's value, evaluated for each row in the aggregated window.
  • Sample Variance. This is a standard statistical analytic method that returns*eturns the sample variance of a non-null set of numbers (nulls being ignored). Sample Variance uses the following calculation: (SUM(expr\*expr) - SUM(expr)\*SUM(expr) / COUNT(expr)) / (COUNT(expr)-1)
  • In other words, for a given set of non-null values, using S1 as the sum of the values and S2 as the sum of the squares of the values, VAR_POP returns the result (S2-S1*S1/N)/(N-1).
  • Sample Standard Deviation. This is a standard statistical analytic method that returns the statistical standard deviation of all values in number-expression, evaluated for each row remaining in the group and defined as the square root of the sample variance (defined above).

Using Time-based Sliding Windows

Time-based sliding windows identify groups of rows based on a specified time period in relation to the current row.

You define time-based windows in terms of 1) a time when the window begins. 2) a time when the window ends. 3) whether or not the window should "hop." Hopping windows accumulate data in a specified time period (1 minute, 5 minutes, 1 hour) and then reset once the time period has elapsed.

Sliding windows emit one row for every row that enters the query (through a simple INSERT, a pump or an external source accessed by a foreign stream). The window frame of a sliding window moves along one row at a time. Rows exit the window as they move out of the defined frame. If you define a window as beginning five minutes ago, rows move out of the window once their timestamp is greater than 5 minutes past the current row. If you define a window as 200 rows, rows move out of the window once the 201st row enters the window.

sl_aggregate-sliding-time

Using Time Until with Time-based Sliding Windows

By default, windows have an upper bound of the current row. For some calculations, you will want to exclude the current row by designating a window that starts before the current row. Otherwise, a given calculation may miss additional rows that come in with the same ROWTIME as the current row, since StreamLab begins emitting output rows as soon as the current row hits its system. Using time untilensures that you have captured all rows with the same ROWTIIME in a calculation.

However, when you define windows this way, youintroduce some latency. In cases where it is not important to calculate rows with the same rowtime, you can use the default upper bound.

Windows that use time until let you exclude the current row from calculations such as AVG or MAX. This makes statistical tools like Bollinger bands or the naive bayes classifier statistically valid.

In the illustration below, values for the row with a rowtime of '2019-03-30 03:05:30.0' include rows fro. '2019-03-30 03:00:30.0' t. '2019-03-30 03:04:30.0', but do not include the current row. If no data is available for the window, as in the first row, values return null.

offset-window

Using a Hopping Period with Time-based Sliding Windows

When you designate a hopping period, the window will slide forward at the top of every the designated hopping periodROWTIMEs in the 3:00 hour will accumulate in the window until a row arrives with a ROWTIME of 4:00 or later. At that point, the window starts over; the first calculation will include only one row, the next two rows, and so on, until a row arrives with a ROWTIME of 5:00 or later. When you use a hopping window with COUNT, for example, counts will go up for the entire hour, then start at zero at the beginning of the new hour. Data from the previous hour is "dropped" from aggregation.

Hopping windows are particularly useful for aggregations like AVG, VARIANCE, STDDEV on large time windows (such as hours, days, or weeks) on high datarate streams.

Hopping windows have significant benefits.

  • They reduce the amount of system resources required to maintain the window, since the window only needs to maintain aggregated history for each time interval (1 hour in the example above).
  • When used with an offset, such as RANGE BETWEEN INTERVAL '60' MINUTE PRECEDING AND INTERVAL '1' MINUTE PRECEDING, the results for each input row are computed on a window that excludes data from the current row. This is highly desirable since any anomaly in streaming data does not adversely impact computation of Bollinger bands or Naive Bayes classifiers. We recommend using hopping windows with an offset. The offset makes both ends of a window frame hop forward in a 1 minute interval. That is, as each row arrives for the current minute interval, it does not contribute to the aggregation results for that row since its not in the window frame for the current row.
  • When used without an offset, results for hopping windows are computed for each input row with zero latency.

The illustration below shows the results of a query run on the view defined above. In the illustration, calculations for the first 9 rows include rows from '2019-03-30 03:02:00.0' to '2019-03-30 03:59:45.0'. Once a row arrives later than '2019-03-30 03:59:59.999', calculations reset to newly arriving rows. In the last 5 rows, calculations include rows from '2019-03-30 04:02:00.0' to '2019-03-30 04:18:00.0'.

hopping-window

Using Row-based Sliding Windows

Row-based sliding windows identify groups of rows based on N number of rows in relation to the current row.

You define row-based windows in terms of

  1. when the window begins, either unbounded (all rows received up until the current row) or a specified number of rows (such as 200 rows).
  2. when the window ends, either the current row or N rows preceding (such as "until 10 rows ago"). Row-based windows are defined as the current row plus X number of rows preceding. A window such ROWS '3' PRECEDING will actually contain 4 rows: 3 preceding rows plus the current row.
sl_aggregate-sliding-row

In the illustration below, for the first row, only that row is included in the window. The 6th row calculates values using data from rows 1, 2, 3, 4, 5, and 6. The 10th row calculates values using data from rows 5, 6, 7, 8, 9, and 10.

rows-based-window

Partitioning Sliding Windows

Both time-based and row-based sliding windows can be partitioned by one or more columns. (You cannot use partitions with tumbling windows.)

When you apply partitioning, calculations incorporate both the window frame and the partitioned column value. This means that for an aggregation such as average, the calculated average will include values for all rows that 1) arrive within the window frame and 2) contain the same value in the partition column.

For example, if the partition column contains stock tickers, and you have chosen average for an aggregation, the average will be calculated over all rows that contain, for example, ORCL, and fall within the window frame.

To partition a window:

  1. Select a column to partition under Partition By. You can add additional partitions by clicking the + to the left of Partition By
  2. Once you have finished configuring the partition, click + in the Suggestions window to add the aggregation to the Script.

In the illustration, all rows with the value 'IBM' for the column ticker that have been received up until this point and fall within the window frame are calculated together.

partitioned-window

Using Sessions with Sliding Windows

(new in StreamLab version 2.3.1)

You can also define sessions for a column or columns. You do not need to choose a column to partition in order to use sessions. (You cannot use sessions with tumbling windows.)

With sessions, you can determine the location and size of a partitioned window using the input data itself. For example, you might have a session applied to a customer's id, which you could use to develop aggregates for the customer's orders over the past day.

Sessions can start and end in one of two ways:

  • You can set Boolean conditions for when the session starts (START WHEN) and ends (END WHEN).
  • You can set a time out for the session's expiration.

You can use these flags together.

Note: Sessions are not currently supported for windows defined with GROUP BY.

To use sessions:

  1. Under SESSION, choose yes.
  2. Indicate a time frame for the session. You can do so using Boolean conditions for start and end, a timeout period, or both.
    • If you wish to set a condition for when the session starts, choose yes under START WHEN. Next, using the leftmost control under START WHEN, select the column on which the condition will apply. Next, define the condition for when the session starts. For a simple flag in the data, for example, you might select is equal to and enter "START" (or whatever flag is contained in your data).
    • If you wish to set a condition for when the session ends, choose yes under END WHEN. Next, using the leftmost control under END WHEN, select the column on which the condition will apply. Next, define the condition for when the session ends. For a simple flag in the data, for example, you might select is equal to and enter "END" (or whatever flag is contained in your data).
    • If you wish to set a time out for the session's expiration, select yes under timeout?. Next, enter a number and select a unit for the timeout. For example, if you wish the session to time out after no rows have arrived in the session for 2 hours, you would enter "2" and select hours. If rows stop arriving for 2 hours, the session ends, and a new session begins.
  3. Once you have finished configuring the session, click + in the Suggestions window to add the aggregation to the Script.

Using Tumbling Windows

This data window also uses ROWTIME, and lets you tell StreamLab. "aggregate data for an hour, then clear data, then give me data for the next hour," at, for example, 2:00 PM, 3:00 PM, 4:00 PM, and so on.

Tumbling windows display data only at specified intervals. That is, if you choose 5 seconds, a new row appears every five seconds with aggregated data. For non-aggregated rows (the rows that are, for example, averaged for the aggregation), you can choose to either drop non-aggregated columns, display the first value for non-aggregated columns, or display the last value.

With tumbling windows, you can also choose to use a group by, drawing on a column with limited values, such as "dept A, dept B, dept C." When you apply a group by, StreamLab emits one row for each group that appears window, that is, every 10 seconds a row for dept A, B, and C, provided that each group has appeared in the window.

Options for Tumbling Windows

Tumbling windows emit one row for every group of rows aggregated. As with sliding windows, a calculation such as sum or average incorporates all the rows in the window, but a row will appear only once during the window frame. So if you designate a window of 200 rows, one row will appear for every 200 rows that enter StreamLab. Similarly, if you designate a window of 5 minutes, one row will appear for all rows that enter StreamLab during a 5 minute interval.

ROWTIME AMOUNT TICKER
2019-03-30 03:02:00.000 20 ORC.
2019-03-30 03:03:00.000 30 IB.
2019-03-30 03:04:00.000 100 ORC.
2019-03-30 03:04:30.000 40 IB.
2019-03-30 03:05:00.000 100 MSF.
2019-03-30 03:05:30.000 50 MSF.
2019-03-30 04:04:00.000 100 GOOG.
2019-03-30 04:06:00.000 100 ORC.
2019-03-30 04:18:00.000 1000 GOGG.
2019-03-30 04:44:00.000 1000 ORC.
sl_aggregate-tumbling

Grouping Tumbling Windows

With tumbling windows, you can apply a group by, drawing on a column with limited values. When you apply a group by, StreamLab emits one row for each group that appears in the windowed time period. This allows you to display aggregated data for each department, or region, or building, and so on.

In the screen shot below, for example, StreamLab emits one aggregated row every ten seconds for trip_no 500s and 600s. During the first window, no rows with 600s appeared, so StreamLab only emits an aggregated row for 500s.

sl_aggregate_group_by_result

Using Group By with Tumbling Windows

With tumbling windows, you can apply a group by, drawing on a column with limited values. When you apply a group by, StreamLab emits one row for each group that appears in the windowed time period. This allows you to display aggregated data for each department, or region, or building, and so on.

In the screen shot below, for example, StreamLab emits one aggregated row every ten seconds for trip_no 500s and 600s. During the first window, no rows with 600s appeared, so StreamLab only emits an aggregated row for 500s.

Calculate

The Calculate command lets you perform an arithmetical calculation on a column. Use @ for the value of the column. Selected columns must have numerical values, such as DOUBLE. Calculations might be something like the following:

Calculation Result
@+2000 Adds 2000 to the column value.
@-800 Subtracts 800 from the column value.
@/2 Divides the column value by 2.
@*4 Multiplies the column value by 4.

To implement a Calculate command:

  • Select column(s) to which to apply categories. Each column must contain numerical data.
  • Enter an arithmetical calculation for the column.
  • Click the + icon  in the Suggestions list to add the command to the Guide script.
  • The results of the script appear in the Output View window.

Categorize

The Categorize command lets you take a continuous value, such as a bus speed, and replace it with strings such as "fast", "medium", or "slow" by applying conditions to the column. You can process more than one column at once. Because this command replaces the column with one containing strings, you may want to make a copy of the column first, using the New Column command.

To implement a Categorize command:

  1. Select column(s) to which to apply categories. Each column must contain numerical data.
  2. Enter a default category for column values that do not meet any conditions, such as "Normal."
  3. Enter a category for columns that meet the first condition, such as "High."
  4. Select a comparison operator for the condition, such as "is equal to."
  5. Enter a value for the condition, such as "110."

To add more conditions, click the + icon in the Category window.

  • Click the + icon  in the Suggestions list to add the command to the Guide script.
  • The results of the script appear in the Output View window.

Distinct

The Distinct operation lets you eliminate rows with duplicate columns over a specified timeframe.

Choosing Promote to ROWTIME? will produce an output view showing only rows with distinct columns. You can also choose to not promote the new timestamp to ROWTIME, which will allow you to see the new timestamp for which rows are evaluated.

Note: If you do not promote to ROWTIME, new timestamps will display as one millisecond behind the row's rowtime.

Implementing Distinct

To implement Distinct.

  1. Select a number of milliseconds over which duplicate rows will be eliminated.
  2. Choose whether or not to promote the new timestamp to ROWTIME.
  3. Enter a name for the column with distinct timestamps.
  4. Click the + icon to add the command to the Guide script.
  5. The results of the script appear in the Output View window.
sl_distinct

For example, if you chose to eliminate duplicate rows over a 1 minute timeframe, and StreamLab received the data below, the last 5 rows would be eliminated because they are duplicates.

2019-03-30 04:18:00.000 GOOGL 100
2019-03-30 04:18:00.000 GOOGL 100
2019-03-30 04:18:00.000 IBM 15
2019-03-30 04:43:00.000 IBM 60
2019-03-30 04:44:00.000 ORCL 1000
2019-03-30 04:46:00.000 ORCL 3000
2019-03-30 05:03:00.000 IBM 30
2019-03-30 05:03:01.000 IBM 30
2019-03-30 05:03:02.000 IBM 30
2019-03-30 05:03:03.000 IBM 30
2019-03-30 05:03:04.000 IBM 30
2019-03-30 05:03:05.000 IBM 30

Bollinger Bands

The Bollinger Bands analytic command lets you create a new column that is a running average of another column over a given window of time. This analytic works especially well for streaming data. The diagram below depicts the creation of a running average on a stock's closing price.

You can also add Bollinger bands to this running average. Bollinger Bands are a volatility indicator, typically used for stock prices.

Bollinger Bands consist of:

  • an moving average (avg) over a given period T.
  • an upper band at K times an T-period standard deviation above the moving average (avg + Kσ)
  • a lower band at K times an T-period standard deviation below the moving average (avg − Kσ)

StreamLab also lets you define inner and outer values for the upper and lower bands.

Time Sort

The Time Sort uses a sliding time-based window of incoming rows to reorder those rows by the selected column or by ROWTIME in order to address time synchronization problems with data. For example, sometimes data arriving from multiple sources may not be time-synchronized. Such problems can produce unreliable results from analytics, which typically depend on arriving data being ordered by ROWTIME. You will likely only need to use this command if you are seeing unreliable results in your data that are time-related.

In the Time Sort command, you can also choose to promote a selected column to rowtime. You do so in cases where you know that another column contains a timestamp more relevant than the time the row entered s-Server. This might be, for example, the time an earthquake actually occurred instead of the time the row enters s-Server. These values can have slight lags, making data calculations less reliable.

To implement a Time Sort command:

  1. Select a column containing a time stamp.
  2. Enter a window length and unit on which to sort rows.
  3. Choose whether or not to promote the row to ROWTIME.
  4. Click the + icon  in the Suggestions list to add the command to the Guide script.
  5. The results of the script appear in the Output View window.

Window

The Window command lets you perform a windowed aggregation on a selected column. The command produces a new column with the result of the operation. To implement a Window command:

  • Enter a column name for the resulting column.
  • Select an operation to aggregate values for the new columns. This might be an average temperature, or a running total. See - Operations Used in Commands* for more details on these operations.
  • Select a column on which to perform the aggregation. This column must contain numerical data.
  • Select a length and unit for the aggregation window, such as "1 hour" or "15 minutes."
  • Click the + icon to add the command to the Guide script.
  • The results of the script appear in the Output View window.

Geo-LocateIP

The Geo-Locate IP command lets you determine the latitude and longitude associated with IP addresses worldwide.

To use the Geo-Locate IP command:

  1. Select a column that contains IP addresses in dotted-decimal format.
  2. Enter a prefix for the new latitude and longitude columns.
  3. A suggestion appears. Click the + icon to add the command to the Guide script.
  4. The results of the script appear in the Output View window.

The Geo-Locate IP command uses the MaxMind GeoIp City database. See http://www.maxmind.com/en/city for more details on the MaxMind GeoIp City database.

Note: s-Server ships with a free version of this database. This topic describes how the plugin finds the MaxMind data file and how one should use the licensed version for production. You can use this free default database for testing, but for production use, you should use at least an up-to-date GeoLiteCity.dat, or probably a licensed data file.

GroupRank

GroupRank lets you group rows by the values in one column (group by) and then rank the rows within that group according to another column (rank by) across a window defined by you. To implement a GroupRank operation:

  • Enter a name for the column that will display length
  • Select a column that StreamLab will use to group rows over the - group window*.
  • Select a column that StreamLab will use to rank rows within the group defined above.
  • Select an operation to aggregate values within the group. See - Operations Used in Commands* for more details on these operations.
  • Select a length and unit for the group window, such as "1 hour" or "15 minutes."
  • Chose to sort group rows in either ascending or descending order.
  • Choose to order output rows in either ascending or descending order.
  • Choose rolling or tumbling window type. See below for more information on these.
  • If desired, define a max idle time for holding a group for ranking. When maxIdle expires the current group is released to the stream. A value of zero indicates no idle time out.
  • If desired, indicate the maximum number of rows the UDX will output in a given group. A value of 0 indicates no limit.
  • Click the + icon to add the command to the Guide script.
  • The results of the script appear in the Output View window.

Join

Join lets you combine data in the current pipeline guide with data from another stream or table. They allow you to enrich data in the current view with data from another source, as long as that source shares a column with the source for the current guide.

For example, if your guide has columns of data on bus positions and speed, and one of the columns contains a driver identification number, you might join the guide with a database table that has information on the driver's names matched to their identification numbers. By doing so, you can enrich your guide by adding columns from the DRIVERS table.

StreamLab supports stream/stream, stream/table, table/stream and table/table joins. These are implemented using the SQL JOIN clause. It also offers a mode called lookup, which provides some control over whether and how records in the lookup table are cached. The lookup mode is implemented using the TableLookup UDX.

StreamLab supports four types of joins:

Join type Explanation
INNER JOIN Returns all pairs of rows from left and right for which the join condition evaluates to TRUE.
LEFT JOIN As INNER JOIN, but rows from the left are kept even if they do not match any rows on the right; NULL values are generated on the right.
RIGHT JOIN As INNER JOIN, but rows from the right are kept even if they do not match any rows on the left; NULL values are generated on the left for these rows.
FULL JOIN As INNER JOIN, but rows from both sides are kept even if they do not match any rows on the other side; NULL values are generated on the other side for these rows.

Implementing a Join

To implement a Join analytic.

  1. Choose whether to join to a stream, a table, or to do a lookup table. Only streams and tables that have been added to StreamLab will be available here. Do this first, because your options will change depending on your choice.

    sl_join1
  2. Next, choose whether to conduct the join over a window. You'll get better results if you do. You can configure windows on both the left (the pipeline guide input) and the right (the stream/table to be joined).

  3. For each window you've chosen:

    • Choose whether to create a time-based or row based window.

    • Enter values for the window's beginning and end, such as "10 minutes to 0 minutes ago" or "1 hour to 10 minutes ago" or "100 rows to 20 rows ago."

​ The window types and lengths for the left window and the right window do not need to match.

  1. Choose a join type: inner, left, right, or full. For lookup, you do not need to choose a join type.

  2. Choose a stream or table to join to. The selection will change depending on what you chose in step 1. For stream you see a list of streams, for table a list of tables, and for lookup a list of external tables.

sl_join2
  1. Finally, choose a column from the current stream and the stream/table to be joined. These must match in order for the join to work. You can match on more than one pair of columns (up to three) by clicking + in the top right of the parameter box.

  2. Click the + icon to add the command to the Guide script.

  3. The results of the script appear in the Output View window.

When you define a window using rows, you are defining a ROWS BETWEEN ROWS AND ROWS PRECEDING. If rows until is 0, the clause will be ROWS PRECEDING and all rows up to the current row will be included.

When you define a window using time, you will see time from and time until parameters; you are defining a RANGE BETWEEN INTERVAL

Each match is treated as an equi-join, whereby pipeline guide column = join to column.

Lookup supports a maximum of three predicates

To learn more about windows in streaming joins, see the topic WINDOW clause in the s-Server Streaming SQL Reference Guide.

JOIN to a Stream

The join to can be an SQLstream Stream, or an External Stream that has been created with a Render Mode of either Standard: FS->pump->stream or Directly as stream (implemented as a FOREIGN STREAM)

SQLstream Streams can be defined from the Sources (either creating them individually or using Import SQLstream Schema)

It is legitimate to join a stream to itself (especially when using windowing).

JOIN to a Table

The join to can be an External Table, or an External Stream that has been created with a Render Mode of Directly as table (which is implemented as a FOREIGN TABLE).

It is legitimate to join a table to itself. For example you might want to join SALES.EMPS to itself to get information about the employee's manager.

Note: the entire table will be read into memory when the guide starts running; this means that:

  1. the table must fit into memory
  2. changes to the data in the table after the guide starts running are not reflected in the resulting query output

New Column

The New Column analytic lets you add a new column based on a custom SQL expression.

To add a new

To implement a New Column command:

  1. Enter a name for the column.
  2. Enter a SQL expression for the column.
  3. Click the + icon to add the command to the Guide script.
  4. The new column appears in the Output View window.

The New Column analytic lets you add a column to the guide using a SQL expression. Using this feature requires both knowledge of SQL and knowledge of s-Server's current contents (in that you will need to work with streams and tables in s-Server). For more information on s-Server's streaming SQL, see SQLstream SQL Reference Guide.

Advanced Analytics

The following analytics are provided in order to provide access to advanced s-Server features. Using them requires advanced knowledge of SQL.

  • Call UDX This command lets you enter the name of a predefined custom user transform. A user-defined transform is a user-defined functionthat returns a table or a [stream]/glossary/stream/). Its input arguments can be scalars or cursors.
  • Custom View This comand lets you enter SQL that will be used to transform the stream you are building in the Guide.
  • Inject SQL This command lets you inject any block of SQL into s-Server through StreamLab.

Call UDX

This command lets you enter the name of a predefined custom user transform. A user-defined transform is a user-defined functionthat returns a table or a stream. Its input arguments can be scalars or cursors. A cursor is an object representing a subquery, which the user-defined transform can use to read the subquery results. The subquery can be relational or streaming.

The UDX does have to be predefined and it cannot take arguments. It can return either a table or a stream. If it is a table you can specify a column that will contain the ROWTIME.

Custom View

The SQL you enter here will be used to transform the stream you are building in the Guide. The actual SQL added to the script will be in the following form:

   CREATE VIEW (output) AS SELECT (stream) <<your SQL here>> FROM (input)

(input), (output) and (stream) are all filled in by StreamLab.

Inject SQL

The Inject SQL analytic lets you inject any block of SQL into s-Server.