Join lets you combine data in the current pipeline guide with data from another stream or table. They allow you to enrich data in the current view with data from another source, as long as that source shares a column with the source for the current guide.
For example, if your guide has columns of data on bus positions and speed, and one of the columns contains a driver identification number, you might join the guide with a database table that has information on the driver’s names matched to their identification numbers. By doing so, you can enrich your guide by adding columns from the DRIVERS table.
StreamLab supports stream/stream, stream/table, table/stream and table/table joins. These are implemented using the SQL JOIN clause. It also offers a mode called lookup, which provides some control over whether and how records in the lookup table are cached. The lookup mode is implemented using the TableLookup UDX.
StreamLab supports four types of joins:
|INNER JOIN||Returns all pairs of rows from left and right for which the join condition evaluates to TRUE.|
|LEFT JOIN||As INNER JOIN, but rows from the left are kept even if they do not match any rows on the right; NULL values are generated on the right.|
|RIGHT JOIN||As INNER JOIN, but rows from the right are kept even if they do not match any rows on the left; NULL values are generated on the left for these rows.|
|FULL JOIN||As INNER JOIN, but rows from both sides are kept even if they do not match any rows on the other side; NULL values are generated on the other side for these rows.|
To implement a Join analytic.
Choose whether to join to a stream, a table, or to do a lookup table. Only streams and tables that have been added to StreamLab will be available here. Do this first, because your options will change depending on your choice.
Next, choose whether to conduct the join over a window. You’ll get better results if you do. You can configure windows on both the left (the pipeline guide input) and the right (the stream/table to be joined).
For each window you’ve chosen:
Choose whether to create a time-based or row based window.
Enter values for the window’s beginning and end, such as “10 minutes to 0 minutes ago” or “1 hour to 10 minutes ago” or “100 rows to 20 rows ago.”
The window types and lengths for the left window and the right window do not need to match.
Choose a join type: inner, left, right, or full. For lookup, you do not need to choose a join type.
Choose a stream or table to join to. The selection will change depending on what you chose in step 1. For stream you see a list of streams, for table a list of tables, and for lookup a list of external tables.
Finally, choose a column from the current stream and the stream/table to be joined. These must match in order for the join to work. You can match on more than one pair of columns (up to three) by clicking + in the top right of the parameter box.
Click the + icon to add the command to the Guide script.
The results of the script appear in the Output View window.
When you define a window using rows, you are defining a ROWS BETWEEN
When you define a window using time, you will see time from and time until parameters; you are defining a RANGE BETWEEN INTERVAL
Each match is treated as an equi-join, whereby pipeline guide column = join to column.
Lookup supports a maximum of three predicates
To learn more about windows in streaming joins, see the topic WINDOW clause in the s-Server Streaming SQL Reference Guide.
The join to can be an SQLstream Stream, or an External Stream that has been created with a Render Mode of either Standard: FS->pump->stream or Directly as stream (implemented as a FOREIGN STREAM)
SQLstream Streams can be defined from the Sources (either creating them individually or using Import SQLstream Schema)
It is legitimate to join a stream to itself (especially when using windowing).
The join to can be an External Table, or an External Stream that has been created with a Render Mode of Directly as table (which is implemented as a FOREIGN TABLE).
It is legitimate to join a table to itself. For example you might want to join SALES.EMPS to itself to get information about the employee’s manager.
Note: the entire table will be read into memory when the guide starts running; this means that:
the table must fit into memory
changes to the data in the table after the guide starts running are not reflected in the resulting query output
To use table lookup, you first need to create an external connection. See the topic StreamLab External Connections Overview in this guide for more details. You then need to add one of the tables from the external connection as a source. See the topic Adding an External Table Source in this guide for more details.
Once you have an external table added as a source, it will be available in the table menu of the Table Lookup command.
If you know the number of rows contained in the external table, you can speed processing by entering a cache size and clicking the Prefetch Rows option.
You can use a feature called fuzzy lookup if your column uses numbers, and if these numbers only need to be close. Working with latitudes and longitudes is good use case for this setting, since you will not necessarily get an equals match when joining on latitude or longitude. In this case, with fuzzy set to true, numbers that are close to each other will match.