The StreamLab Scrutinizer uses a series of Java jars that run on the s-Server WebAgent. You can customize the Scrutinizer by writing your own recognizers and split counters. You write these as Java classes that extend or implement one of two Java interfaces:

- SplitCounter. A split counter tries to split a cell value and returns a count of how many times it can be split.
- Recognizer. A recognizer identifies the kinds of formats, values, and objects found in a particular cell of a row.

The methods for both are spelled out in the Scrutinizer Javadoc, located at /scrutinizer_api/index.html
When WebAgent runs, it checks recognizer.properties and splitcounters.properties, then tries to load classes listed in these files into the instance.
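
The general pattern described above, reading class names from a properties file and loading them by name, can be sketched as follows. This is an illustration only, not WebAgent's actual loader: the class `PropertiesClassLoaderSketch`, the method `loadListedClasses`, and the assumed file layout (each entry's value is a fully qualified class name) are all hypothetical. Check your installed recognizer.properties and splitcounters.properties for the real layout.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical sketch, not WebAgent source code.
class PropertiesClassLoaderSketch {

    // Loads every class named as a value in the given properties text.
    // Assumption: each entry maps some key to a fully qualified class name.
    static List<Class<?>> loadListedClasses(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<Class<?>> loaded = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            String className = props.getProperty(key).trim();
            try {
                loaded.add(Class.forName(className));
            } catch (ClassNotFoundException e) {
                // The class is listed but its jar is not on the classpath.
                System.err.println("Skipping " + className + ": not on classpath");
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        String demo = "list=java.util.ArrayList\nmissing=com.example.NotThere\n";
        List<Class<?>> classes = loadListedClasses(new StringReader(demo));
        System.out.println(classes.size() + " class(es) loaded");
    }
}
```

The sketch shows why a listed class must also be reachable on the classpath: a name that cannot be resolved is simply skipped.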
To load a new jar into WebAgent, start WebAgent with the jar on its classpath:

```
webagent.sh -cp newSplitter.jar:newRecognizer.jar
```
Once you do so, StreamLab begins scrutinizing files using the new recognizers and split counters.
We provide two examples below, one of a comma-based splitter (as well as BaseSplitCounter) and one of a recognizer for latitudes.
The example below counts how many times a column can be split on commas.

```java
package com.sqlstream.WebAgent.scrutinizer.autosplitter;

public class CommaSVSplitCounter extends BaseSplitCounter {

    @Override
    public int count() {
        // Count the comma-delimited fields in the current cell.
        return xSVcount(this.cell, ',');
    }

    @Override
    public String getShortName() {
        return "csv";
    }

    @Override
    public String getLongName() {
        return "Comma-Separated Values (CSV)";
    }

    @Override
    public String getType() {
        return "vclp";
    }

    @Override
    public String getTextTrue() {
        return "CSV (comma-separated values)";
    }

    @Override
    public String getSeparatorSQL() {
        return "','";
    }

    @Override
    public String getEscapeSQL() {
        return "u&'\\005C'";
    }

    @Override
    public String getQuoteSQL() {
        return "'\"'";
    }
}
```

#### BaseSplitCounter
The following is the code for BaseSplitCounter, the helper class that CommaSVSplitCounter extends.

```java
package com.sqlstream.WebAgent.scrutinizer.autosplitter;

/**
 * BaseSplitCounter works as a helper class that provides support for future
 * split counters.
 */
public abstract class BaseSplitCounter implements SplitCounter {

    protected String cell;

    public void _prepare(String cell) {
        this.cell = cell;
    }

    public void _clear() {
        this.cell = null;
    }

    /**
     * Counts the fields in str separated by delim, ignoring delimiters
     * that follow a backslash or sit inside double quotes.
     * Returns 0 for an empty string.
     */
    protected int xSVcount(String str, char delim) {
        int n = str.length();
        if (n < 1) return 0;
        int count = 0;
        char quote = '"';
        char esc = '\\';
        boolean inQuote = false;
        boolean inEscape = false;
        for (int i = 0; i < n; i++) {
            char c = str.charAt(i);
            if (inEscape) {
                // The character after a backslash is taken literally.
                inEscape = false;
            } else if (inQuote) {
                // Inside quotes, only the closing quote is significant.
                if (c == quote)
                    inQuote = false;
            } else {
                if (c == esc)
                    inEscape = true;
                else if (c == quote)
                    inQuote = true;
                else if (c == delim)
                    count++;
            }
        }
        // N delimiters separate N + 1 fields.
        return count + 1;
    }
}
```

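
To see how the quote and escape handling in xSVcount affects the result, the standalone sketch below copies the same logic into a static method and runs it on a few sample cells. The class name `XsvCountDemo` and the sample inputs are made up for illustration; the counting logic is taken unchanged from BaseSplitCounter.

```java
// Standalone copy of the xSVcount logic, for experimentation only.
class XsvCountDemo {

    static int xSVcount(String str, char delim) {
        int n = str.length();
        if (n < 1) return 0;
        int count = 0;
        char quote = '"';
        char esc = '\\';
        boolean inQuote = false;
        boolean inEscape = false;
        for (int i = 0; i < n; i++) {
            char c = str.charAt(i);
            if (inEscape) {
                inEscape = false;
            } else if (inQuote) {
                if (c == quote) inQuote = false;
            } else {
                if (c == esc) inEscape = true;
                else if (c == quote) inQuote = true;
                else if (c == delim) count++;
            }
        }
        return count + 1;
    }

    public static void main(String[] args) {
        System.out.println(xSVcount("a,b,c", ','));       // 3: two plain commas
        System.out.println(xSVcount("\"a,b\",c", ','));   // 2: quoted comma is ignored
        System.out.println(xSVcount("a\\,b", ','));       // 1: escaped comma is ignored
        System.out.println(xSVcount("", ','));            // 0: empty cell
        System.out.println(xSVcount("a\tb", '\t'));       // 2: any delimiter char works
    }
}
```

Because the delimiter is a parameter, a split counter for another single-character delimiter can reuse xSVcount by passing a different character from its count() method.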
The code below checks columns for a latitude pattern.

```java
package com.sqlstream.WebAgent.scrutinizer.recognizer;

public class LatitudeRecognizer extends NumberRecognizer {

    @Override
    public boolean test() {
        return this.getFloat() != null && this.getFloat() >= -90
                && this.getFloat() <= 90;
    }

    @Override
    public String getShortName() {
        return "latitude";
    }

    @Override
    public String getLongName() {
        return "Column contains latitudes";
    }

    @Override
    public String getTextTrue() {
        return "contains latitudes";
    }
}
```
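
A recognizer for longitudes could follow the same pattern as LatitudeRecognizer, with a range of -180 to 180. The sketch below is hypothetical: so that it compiles on its own, it uses a stand-in base class, `StandInNumberRecognizer`, whose `prepare` method and `Float` return type for `getFloat()` are assumptions rather than the real NumberRecognizer API. In a real recognizer you would extend NumberRecognizer and consult the Scrutinizer Javadoc for the actual method signatures.

```java
// Hypothetical stand-in for the real NumberRecognizer; its API is assumed.
abstract class StandInNumberRecognizer {
    protected String cell;

    // Assumed helper: stores the cell value to be tested.
    void prepare(String cell) {
        this.cell = cell;
    }

    // Assumed behavior: the cell parsed as a number, or null if not numeric.
    Float getFloat() {
        if (cell == null) return null;
        try {
            return Float.valueOf(cell.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    abstract boolean test();
    abstract String getShortName();
    abstract String getLongName();
    abstract String getTextTrue();
}

// Sketch of a longitude recognizer modeled on LatitudeRecognizer.
class LongitudeRecognizerSketch extends StandInNumberRecognizer {

    @Override
    boolean test() {
        // Parse once, then check the valid longitude range.
        Float value = getFloat();
        return value != null && value >= -180 && value <= 180;
    }

    @Override
    String getShortName() {
        return "longitude";
    }

    @Override
    String getLongName() {
        return "Column contains longitudes";
    }

    @Override
    String getTextTrue() {
        return "contains longitudes";
    }

    public static void main(String[] args) {
        LongitudeRecognizerSketch r = new LongitudeRecognizerSketch();
        for (String sample : new String[] {"-122.4", "200", "abc"}) {
            r.prepare(sample);
            System.out.println(sample + " -> " + r.test());
        }
    }
}
```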