The Distinct operation eliminates rows whose column values duplicate those of an earlier row within a specified timeframe.
If you select Promote to ROWTIME?, the output view shows only rows with distinct column values. If you do not promote the new timestamp to ROWTIME, the output also shows the new timestamp against which each row is evaluated.
Note: If you do not promote to ROWTIME, the new timestamp displays as one millisecond earlier than the row's ROWTIME.
To implement Distinct.
For example, if you choose to eliminate duplicate rows over a 1-minute timeframe and StreamLab receives the data below, the last five rows are eliminated because each one duplicates the IBM | 30 row received at 05:03:00.000, less than a minute earlier.
ROWTIME | Ticker | Value |
---|---|---|
2019-03-30 04:18:00.000 | GOOGL | 100 |
2019-03-30 04:18:00.000 | IBM | 15 |
2019-03-30 04:43:00.000 | IBM | 60 |
2019-03-30 04:44:00.000 | ORCL | 1000 |
2019-03-30 04:46:00.000 | ORCL | 3000 |
2019-03-30 05:03:00.000 | IBM | 30 |
2019-03-30 05:03:01.000 | IBM | 30 |
2019-03-30 05:03:02.000 | IBM | 30 |
2019-03-30 05:03:03.000 | IBM | 30 |
2019-03-30 05:03:04.000 | IBM | 30 |
2019-03-30 05:03:05.000 | IBM | 30 |
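To make the example concrete, the sketch below replays the table through a simple de-duplication filter. It is not StreamLab's implementation: `distinct_tumbling` is a hypothetical helper, the `Ticker`/`Value` column names are placeholders, and it assumes the timeframe behaves like a tumbling window aligned to whole minutes, so a row is dropped only if an identical row already appeared in the same window. StreamLab's actual window semantics may differ, but for this data both a tumbling and a sliding 1-minute window eliminate the same five rows.

```python
from datetime import datetime, timedelta

# Rows from the example table: (ROWTIME, Ticker, Value).
ROWS = [
    ("2019-03-30 04:18:00.000", "GOOGL", 100),
    ("2019-03-30 04:18:00.000", "IBM", 15),
    ("2019-03-30 04:43:00.000", "IBM", 60),
    ("2019-03-30 04:44:00.000", "ORCL", 1000),
    ("2019-03-30 04:46:00.000", "ORCL", 3000),
    ("2019-03-30 05:03:00.000", "IBM", 30),
    ("2019-03-30 05:03:01.000", "IBM", 30),
    ("2019-03-30 05:03:02.000", "IBM", 30),
    ("2019-03-30 05:03:03.000", "IBM", 30),
    ("2019-03-30 05:03:04.000", "IBM", 30),
    ("2019-03-30 05:03:05.000", "IBM", 30),
]

EPOCH = datetime(1970, 1, 1)


def parse(ts: str) -> datetime:
    """Parse a timestamp such as '2019-03-30 04:18:00.000'."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f")


def distinct_tumbling(rows, window: timedelta):
    """Hypothetical helper: keep the first row for each (window, columns) pair.

    Assumption: the timeframe is a tumbling window aligned to the epoch, so a
    row is compared only with rows that fall into the same window.
    """
    seen = set()
    kept = []
    for row in rows:
        ts, *cols = row
        # Integer index of the tumbling window this row falls into.
        bucket = (parse(ts) - EPOCH) // window
        key = (bucket, tuple(cols))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


kept = distinct_tumbling(ROWS, timedelta(minutes=1))
for row in kept:
    print(row)
print(len(ROWS) - len(kept), "rows eliminated")  # prints "5 rows eliminated"
```

Only the first IBM | 30 row at 05:03:00.000 survives; the five rows that follow it within the same minute are dropped, while rows that differ in any column (for example the two ORCL rows) are all kept.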