Managing the Scrutinizer

The StreamLab Scrutinizer watches data as it streams in, making guesses about this data that it uses to populate the Suggestions list. For example, it guesses a column’s type, how the column is divided (such as by commas or tabs), and which columns might contain partition keys.

Sometimes, you may want to turn off the Scrutinizer in order to improve performance. To do so, click Scrutinized in the bar above the Output View.

Adjusting Other Scrutinizer Settings

One of the things that the Scrutinizer tries to determine is whether a column contains a partition key, a limited set of values that can be used to break up the data. This affects which suggestions show up in the pipeline guide.

For example, in a web log, most columns will vary greatly, such as the IP address of an HTTP request. But a column containing the name of the browser will be limited to a handful of values, such as Safari, Chrome, Firefox, Internet Explorer, and Opera. The Scrutinizer will identify this column as a potential partition key.

You can adjust which columns are identified as partition keys by changing two settings in the Project Settings dialog box, Partition Key Unique Limit and Partition Key Length Limit. Partition Key Unique Limit determines how limited a column needs to be (how many different values it can contain) in order to be identified as a partition key. If the Scrutinizer sees more distinct values than Partition Key Unique Limit, it assumes the column probably isn’t a partition key.

Similarly, if the Scrutinizer sees a string value longer than Partition Key Length Limit, it assumes the column probably isn’t a partition key. You can adjust these two values to shape which columns end up being marked.
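
The way the two limits interact can be illustrated with a short Python sketch. This is not the Scrutinizer's actual implementation, only a minimal model of the rule described above. The function name looks_like_partition_key, its parameters, and the limit values 10 and 20 are assumptions made for illustration and are not StreamLab defaults.

```python
from typing import Iterable


def looks_like_partition_key(
    values: Iterable[str],
    unique_limit: int,
    length_limit: int,
) -> bool:
    """Return True if the sampled values stay within both limits.

    Hypothetical helper: unique_limit stands in for Partition Key Unique
    Limit, and length_limit stands in for Partition Key Length Limit.
    """
    seen = set()
    for value in values:
        # A single over-long string rules the column out.
        if len(value) > length_limit:
            return False
        seen.add(value)
        # Too many distinct values also rules the column out.
        if len(seen) > unique_limit:
            return False
    return True


# Sample columns modeled on the web log example (made-up data).
browsers = ["Safari", "Chrome", "Firefox", "Chrome", "Opera", "Safari"]
ip_addresses = [f"10.0.0.{i}" for i in range(50)]

print(looks_like_partition_key(browsers, unique_limit=10, length_limit=20))
print(looks_like_partition_key(ip_addresses, unique_limit=10, length_limit=20))
```

With these illustrative limits, the browser column stays a candidate because it has only four distinct short values, while the IP address column is rejected once it exceeds ten distinct values. Raising the unique limit admits more varied columns, and lowering the length limit excludes columns with long strings.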