Customizing the Scrutinizer

The StreamLab Scrutinizer uses a series of Java jars that run on the s-Server WebAgent. You can customize the Scrutinizer by writing your own recognizers and split counters. You write these as Java classes that extend or implement one of two Java interfaces:

  • SplitCounter. Split counters try to split a cell value and return a count of how many times it can be split.
  • Recognizer. Recognizers identify the formats, values, and objects found in a particular cell of a row.

The methods for both are spelled out in the Scrutinizer Javadoc, located at /scrutinizer_api/index.html

Loading Jars into WebAgent

When WebAgent runs, it checks recognizer.properties and splitcounters.properties, then tries to load classes listed in these files into the instance.

To load a new jar into WebAgent, you need to:

  • List each new class from the jar in recognizer.properties or splitcounters.properties.
  • Add the jar to WebAgent’s classpath. You do so using the -cp option, along the following lines:
webagent.sh -cp newSplitter.jar:newRecognizer.jar

Once you do so, StreamLab begins scrutinizing files using the new recognizers and split counters.

Examples

We provide two examples below, one of a comma-based splitter (as well as BaseSplitCounter) and one of a recognizer for latitudes.

Splitter Example: CommaSVSplitCounter

The example below counts how many times a column can be split on commas.

package com.sqlstream.WebAgent.scrutinizer.autosplitter;

public class CommaSVSplitCounter extends BaseSplitCounter {

@Override
public int count() {
  return xSVcount(this.cell, ',');
}

@Override
public String getShortName() {
  return "csv";
}

@Override
public String getLongName() {
  return "Comma-Separated Values (CSV)";
}

@Override
public String getType() {
  return "vclp";
}

@Override
public String getTextTrue() {
 return "CSV (comma-separated values)";
}

@Override
public String getSeparatorSQL() {
 return "','";
}

@Override
public String getEscapeSQL() {
 return "u&'\\005C'";
}

@Override
public String getQuoteSQL() {
 return "'\"'";
}

}

BaseSplitCounter

The following is the code for BaseSplitCounter.

package com.sqlstream.WebAgent.scrutinizer.autosplitter;

/*
* BaseSplitCounter works as a helper class that provides support for future
* splitcounters.
*
*/
public abstract class BaseSplitCounter implements SplitCounter {
protected String cell;

public void _prepare(String cell) {
 this.cell = cell;
}

public void _clear() {
 this.cell = null;
}

protected int xSVcount(String str, char delim) {
 int n = str.length();
 if(n < 1) return 0;

 int count = 0;
 char quote = '"';  
 char esc = '\\';  
 boolean inQuote = false;
 boolean inEscape = false;

 for(int i=0; i < n; i++) {
  char c = str.charAt(i);

  if(inEscape)
   inEscape = false;
  else if(inQuote) {
   if(c == quote)
    inQuote = false;
  } else {
   if(c == esc)
    inEscape = true;
   else if(c == quote)
    inQuote = true;
   else if(c == delim)
    count++;
  }
 }

 return count+1;
}
}
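
To see how xSVcount treats quotes and escapes, the counting logic above can be tried outside WebAgent. The following is a minimal sketch: XsvCountDemo is a hypothetical class name, not part of the Scrutinizer API, and the method body is copied from BaseSplitCounter so the example runs on its own.

```java
// Hypothetical standalone demo; not part of the Scrutinizer jars.
class XsvCountDemo {

    // Same counting logic as BaseSplitCounter.xSVcount, copied so the demo is self-contained.
    static int xSVcount(String str, char delim) {
        int n = str.length();
        if (n < 1) return 0;

        int count = 0;
        char quote = '"';
        char esc = '\\';
        boolean inQuote = false;
        boolean inEscape = false;

        for (int i = 0; i < n; i++) {
            char c = str.charAt(i);
            if (inEscape) {
                // The character after an escape is never counted.
                inEscape = false;
            } else if (inQuote) {
                // Delimiters inside quotes are ignored until the closing quote.
                if (c == quote) inQuote = false;
            } else {
                if (c == esc) inEscape = true;
                else if (c == quote) inQuote = true;
                else if (c == delim) count++;
            }
        }
        // N delimiters produce N + 1 fields.
        return count + 1;
    }

    public static void main(String[] args) {
        System.out.println(xSVcount("a,b,c", ','));        // 3
        System.out.println(xSVcount("a,\"b,c\",d", ','));  // 3: the quoted comma is skipped
        System.out.println(xSVcount("a\\,b", ','));        // 1: the escaped comma is skipped
        System.out.println(xSVcount("a|b|c", '|'));        // 3: any delimiter character works
        System.out.println(xSVcount("", ','));             // 0: empty cell
    }
}
```

Because the delimiter is a parameter, a split counter for another single-character separator would presumably only need to pass a different character from its count() method, in the same way CommaSVSplitCounter passes a comma.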

Recognizer Example: LatitudeRecognizer

The code below checks columns for a latitude pattern.

package com.sqlstream.WebAgent.scrutinizer.recognizer;

public class LatitudeRecognizer extends NumberRecognizer {

@Override
public boolean test() {
 return this.getFloat() != null && this.getFloat() >= -90
   && this.getFloat() <= 90;
}

@Override
public String getShortName() {
 return "latitude";
}

@Override
public String getLongName() {
 return "Column contains latitudes";
}

@Override
public String getTextTrue() {
 return "contains latitudes";
}

}
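
The range check in test() can be exercised on its own to see which cell values would qualify. The sketch below is an illustration under assumptions: LatitudeCheckDemo and parseFloatOrNull are hypothetical names, and parseFloatOrNull is only a stand-in for NumberRecognizer's getFloat(), whose actual parsing rules are described in the Scrutinizer Javadoc and may differ.

```java
// Hypothetical standalone demo; not part of the Scrutinizer jars.
class LatitudeCheckDemo {

    // Stand-in for getFloat(): assumed to return null when the cell is not numeric.
    static Float parseFloatOrNull(String cell) {
        if (cell == null) return null;
        try {
            return Float.valueOf(cell.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Same condition as LatitudeRecognizer.test(): numeric and within [-90, 90].
    static boolean isLatitude(String cell) {
        Float value = parseFloatOrNull(cell);
        return value != null && value >= -90 && value <= 90;
    }

    public static void main(String[] args) {
        System.out.println(isLatitude("37.7749"));  // true
        System.out.println(isLatitude("-90"));      // true: the bounds are inclusive
        System.out.println(isLatitude("91.5"));     // false: out of range
        System.out.println(isLatitude("north"));    // false: not a number
    }
}
```

Reading the value once into a local variable, as isLatitude does here, avoids calling the getter three times; the same pattern could be applied inside test() if getFloat() turns out to be costly.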