Pipeline guides are collections of commands, suggestions, and scripts that let you create views (CREATE VIEW) on data sources. These views are composed of SQL, which you generate through the Pipeline Guide interface. You use pipeline guides to prepare sources for dashboards. You can also use pipeline guides to run analytics on streaming sources.
For example, you might have a guide that extracts the year (often the first four characters) from a timestamp column and adds it to a new column called YEAR. Other guides might split a column, rename it, merge two columns, and so on. You might also have a guide that adds a running average column.
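The two transformations mentioned above can be sketched outside StreamLab. The following Python is only an illustration of the logic, not the SQL that a guide generates; the names `extract_year` and `RunningAverage`, and the choice of a fixed-size window, are assumptions made for this example.

```python
from collections import deque


def extract_year(timestamp):
    """Return the first four characters of a timestamp string as the year."""
    return timestamp[:4]


class RunningAverage:
    """Average of the most recent `window` values (hypothetical helper)."""

    def __init__(self, window):
        # A deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def add(self, value):
        self.values.append(value)
        return sum(self.values) / len(self.values)


# Add a YEAR column to a row, then feed closing prices to the average.
row = {"date": "2013/05/28", "close": 881.27}
row["YEAR"] = extract_year(row["date"])

average = RunningAverage(window=2)
for price in (1.0, 3.0, 5.0):
    latest = average.add(price)
```

In a real guide, the same steps would be expressed as columns in the generated view rather than as application code.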
Commands are operations that you can perform on the pipeline guide’s data source. They include, for example, commands to parse the source as a W3C log, to parse a timestamp, to split a column at a given character, to remove a column, and to rename a column. Commands are grouped by functionality, and you can switch between command sets by clicking the Select Command Set button in the top-left corner of the pipeline guide.
Suggestions appear in the Suggestions list in the middle left of the Guide interface. Suggestions change depending on your data source and your selection in the Output view.
As you add suggestions to the script, these suggestions are implemented as SQL, with changes visible in the Output view. You can remove items from the script by clicking the - button. You can also visualize changes by clicking the View Dashboard button.
Guides are collections of scripts that let you manipulate SQL objects in StreamLab.
To add a pipeline guide:
To open a pipeline guide, click it. The Pipeline Guide page opens.
Here, you:
Once you create a pipeline guide and select a source for the pipeline guide, you can begin adding steps to its script. To do so:
Let’s say you start with a simple log file with a stock ticker feed with date, close, volume, open, high, and low:
"2013/05/28","881.2700","2257410.0000","883.5000","892.1400","880.4000"
The log file initially appears in the Output view with two columns, one for rowtime and one column called MESSAGE that contains all the values separated by commas:
Your first step here is to separate out the values.
At this point you can simply click the + to the right of the first suggestion, "Split column MESSAGE using the automatic pattern Comma-Separated Values (CSV)".
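To make the effect of the split concrete, here is a minimal Python sketch that applies a CSV split to the sample row above. It illustrates the result only; it is not how StreamLab implements the command. The column names come from the description of the feed, and the function name `split_message` is hypothetical.

```python
import csv
import io

# Column order as described for the stock ticker feed.
COLUMN_NAMES = ["date", "close", "volume", "open", "high", "low"]


def split_message(message):
    """Split one CSV-formatted MESSAGE value into named columns."""
    reader = csv.reader(io.StringIO(message))
    values = next(reader)  # the quoted fields, with quotes removed
    return dict(zip(COLUMN_NAMES, values))


message = '"2013/05/28","881.2700","2257410.0000","883.5000","892.1400","880.4000"'
columns = split_message(message)
```

After the split, each value sits in its own column as text; converting the values to numbers or timestamps would be a separate step.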
As rows stream into StreamLab, a piece of software called the scrutinizer continually checks these rows for patterns and offers suggestions for commands that you can apply to the source.
To view details and suggestions on a column, mouse over the column’s heading. StreamLab offers notes and suggestions about the column. For example, StreamLab notes that the column below might contain hostnames or IP addresses, or longitudes, or bearings. These suggestions can help you identify a column’s contents.
Columns are coded according to the following color scheme:
For example, in the screen grab below, columns ROWTIME and when are time columns, columns id and title are text columns, and columns magnitude, latitude, and longitude are numerical columns.
You can view the input (original) stream from the source by dragging the blue bar at the top of the Output view window.
While StreamLab does calculations on all rows in a source, it only displays a subset of rows in the Output View. This is because sources can have massive amounts of rows, and these would flow by too fast to view meaningfully. The Output View header bar displays statistics for both actual rows and rows displayed.
The Output View displays the results of the current Script. When you initially open a pipeline guide, the Output View displays the raw information of the source. (Because sources may have a large, fast-moving set of rows, the Output View displays a representative sample of the source.)
Once you implement Script items, the Output View changes to display the results of the Script.
StreamLab displays information on the source in two ways:
The number of rows processed by the StreamLab Scrutinizer appears to the right of the Output View name.
The percentage of rows displayed in Output view appears to the right of the rows per second number.
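The relationship between rows processed and rows displayed can be sketched as simple arithmetic. The Python below is an illustration only: the source does not say how StreamLab chooses which rows to display, so the every-nth-row sampling in `sample_every_nth` is an assumption, and both function names are hypothetical.

```python
def displayed_percentage(rows_processed, rows_displayed):
    """Percentage of processed rows that are shown in the Output View."""
    if rows_processed == 0:
        return 0.0
    return 100.0 * rows_displayed / rows_processed


def sample_every_nth(rows, n):
    """Keep every nth row (assumed sampling strategy, for illustration)."""
    return [row for index, row in enumerate(rows) if index % n == 0]


rows = list(range(1000))            # stand-in for 1000 streamed rows
shown = sample_every_nth(rows, 50)  # 20 rows survive the sampling
percentage = displayed_percentage(len(rows), len(shown))  # 2.0
```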
Suggestions appear in the Suggestions list in the middle left of the Pipeline Guide interface. Each suggestion offers detail on the suggestion’s action. Suggestions change depending on your data source and your selection in the Output view. Every command that you implement appears as a suggestion. To implement the command, you click the + icon to the right of the suggestion.
Suggestions also change depending on the column you have selected in Output view. To get suggestions for a column, select it.
You can make cells active in Output View by double-clicking them. Once cells are active, you can select within the cell. When you select characters in a cell, StreamLab will automatically fill in command fields, and also change suggestions, both based on your selection.
As you add commands to guides, the guides generate SQL. When you execute this SQL, the view is created or modified in your selected schema with the changes shown in the Output View window. To view or export SQL, click the View Log button in the upper right corner of the Pipeline Guides page:
The Log window opens, listing all the SQL that you have generated. You can also run the SQL again by clicking the Execute button.
In the Guide interface, you can switch sources. Doing so runs the pipeline guide script on another source.
To switch sources, click the Source button on the top right of the guide page under the colored icons:
This brings up the source selection page, with all sources and potential sources shown. You can select a different source, but if you have already created steps in the guide, make sure that the new source contains all the column names that the pipeline depends on.
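The column check described above can be sketched as a simple comparison of names. This Python is a minimal illustration, assuming you have the two lists of column names at hand; `missing_columns` is a hypothetical helper, not a StreamLab function.

```python
def missing_columns(required, available):
    """Return the required column names that the new source lacks."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


# Columns that the guide's script depends on (example values).
required = ["date", "close", "volume"]

# Columns offered by the candidate source (example values).
new_source = ["date", "close", "open", "high", "low"]

missing = missing_columns(required, new_source)
if missing:
    print("New source is missing columns:", ", ".join(missing))
```

An empty result means every column the script depends on is present in the new source.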