Distinct

The Distinct operation lets you eliminate rows whose column values duplicate those of an earlier row within a specified timeframe.

Selecting Promote to ROWTIME? produces an output view that shows only rows with distinct column values. If you choose not to promote the new timestamp to ROWTIME, the new timestamp against which rows are evaluated is shown as a column in the output view.

Note: If you do not promote to ROWTIME, the new timestamps display as one millisecond earlier than the row's ROWTIME.
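The timeframe matters because a repeated row counts as a duplicate only while it falls inside the chosen window. The Python sketch below models that behavior under stated assumptions: the timeframe is measured from the most recent kept row with the same column values, and ROWTIME itself is not compared. This is not StreamLab's implementation; `distinct_within` is a hypothetical name, and the repeating ORCL rows are made up for illustration.

```python
from datetime import datetime, timedelta

def distinct_within(rows, timeframe_ms):
    """Return the rows that survive a Distinct over timeframe_ms milliseconds.

    Hypothetical model, not StreamLab's implementation:
    - rows is a list of (rowtime, values) pairs in ROWTIME order;
    - values is a tuple of the compared columns (ROWTIME is excluded);
    - a row is a duplicate if a row with the same values was kept
      less than timeframe_ms earlier.
    """
    window = timedelta(milliseconds=timeframe_ms)
    last_kept = {}  # values -> ROWTIME of the most recent kept row with those values
    kept = []
    for rowtime, values in rows:
        previous = last_kept.get(values)
        if previous is None or rowtime - previous >= window:
            last_kept[values] = rowtime
            kept.append((rowtime, values))
    return kept

# Usage: the same ORCL row arrives three times with a 1-minute (60,000 ms) timeframe.
start = datetime(2019, 3, 30, 4, 44)
stream = [
    (start, ("ORCL", 1000)),
    (start + timedelta(seconds=30), ("ORCL", 1000)),  # within 1 minute: a duplicate
    (start + timedelta(minutes=2), ("ORCL", 1000)),   # outside 1 minute: kept again
]
for rowtime, values in distinct_within(stream, 60_000):
    print(rowtime, *values)
```

Under this model the row that repeats after 30 seconds is dropped, while the one that repeats after two minutes is kept, because by then the earlier copy has left the timeframe.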

Implementing Distinct

To implement Distinct:

  1. Select a number of milliseconds over which duplicate rows will be eliminated.
  2. Choose whether or not to promote the new timestamp to ROWTIME.
  3. Enter a name for the column with distinct timestamps.
  4. Click the + icon to add the command to the Guide script.
  5. The results of the script appear in the Output View window.

For example, if you chose to eliminate duplicate rows over a 1-minute timeframe and StreamLab received the data below, the second GOOGL row and the last five IBM rows would be eliminated, because each repeats the column values of an earlier row within that minute.

2019-03-30 04:18:00.000 GOOGL 100
2019-03-30 04:18:00.000 GOOGL 100
2019-03-30 04:18:00.000 IBM 15
2019-03-30 04:43:00.000 IBM 60
2019-03-30 04:44:00.000 ORCL 1000
2019-03-30 04:46:00.000 ORCL 3000
2019-03-30 05:03:00.000 IBM 30
2019-03-30 05:03:01.000 IBM 30
2019-03-30 05:03:02.000 IBM 30
2019-03-30 05:03:03.000 IBM 30
2019-03-30 05:03:04.000 IBM 30
2019-03-30 05:03:05.000 IBM 30
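The example can be checked with a short Python sketch. It assumes, hypothetically, that the timeframe is applied as fixed, non-overlapping windows aligned to whole multiples of the timeframe length, and that ROWTIME is excluded from the comparison; StreamLab's actual windowing may differ. `SAMPLE`, `parse`, and `distinct_tumbling` are illustrative names, not part of StreamLab.

```python
from datetime import datetime, timedelta

# The sample stream from the example: ROWTIME, ticker, and a numeric value.
SAMPLE = """\
2019-03-30 04:18:00.000 GOOGL 100
2019-03-30 04:18:00.000 GOOGL 100
2019-03-30 04:18:00.000 IBM 15
2019-03-30 04:43:00.000 IBM 60
2019-03-30 04:44:00.000 ORCL 1000
2019-03-30 04:46:00.000 ORCL 3000
2019-03-30 05:03:00.000 IBM 30
2019-03-30 05:03:01.000 IBM 30
2019-03-30 05:03:02.000 IBM 30
2019-03-30 05:03:03.000 IBM 30
2019-03-30 05:03:04.000 IBM 30
2019-03-30 05:03:05.000 IBM 30
"""

EPOCH = datetime(1970, 1, 1)  # reference point for aligning the windows

def parse(text):
    """Turn each 'date time ticker value' line into (rowtime, (ticker, value))."""
    rows = []
    for line in text.splitlines():
        date, time, ticker, value = line.split()
        rowtime = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
        rows.append((rowtime, (ticker, int(value))))
    return rows

def distinct_tumbling(rows, timeframe_ms):
    """Split rows into kept and eliminated lists.

    Hypothetical model: time is cut into fixed, non-overlapping windows of
    timeframe_ms milliseconds, and only the first row with a given set of
    column values (ROWTIME excluded) is kept in each window.
    """
    seen = set()
    kept, eliminated = [], []
    for rowtime, values in rows:
        # Integer index of the window this row falls into.
        window = (rowtime - EPOCH) // timedelta(milliseconds=timeframe_ms)
        key = (window, values)
        if key in seen:
            eliminated.append((rowtime, values))
        else:
            seen.add(key)
            kept.append((rowtime, values))
    return kept, eliminated

kept, eliminated = distinct_tumbling(parse(SAMPLE), 60_000)
for rowtime, (ticker, value) in eliminated:
    print(rowtime, ticker, value)
```

Under this model, the second GOOGL row is eliminated along with the five repeated IBM rows, since it exactly repeats the first row within the same minute. A sliding timeframe measured from the last kept row gives the same result for this particular data.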