Customizing the Scrutinizer

The StreamLab Scrutinizer uses a series of Java jars that run on the s-Server webAgent. You can customize the Scrutinizer by writing your own recognizers and split counters. You write these as Java classes that either implement one of two Java interfaces directly or extend a helper class that implements one of them:

  • SplitCounter. Split counters try to split a cell value and return the number of fields the value splits into.
  • Recognizer. Recognizers identify the format, value type, or kind of object held in a particular cell of a row.

The methods for both interfaces are spelled out in the Scrutinizer Javadoc, located at /scrutinizer_api/index.html.

Loading Jars into webAgent

When webAgent runs, it checks recognizer.properties and splitcounters.properties, then tries to load classes listed in these files into the instance.

To load a new jar into webAgent, you need to:

  • Add your new classes to recognizer.properties (for recognizers) or splitcounters.properties (for split counters).
  • Add the jars to webAgent’s classpath. You do so using the -cp option, along the following lines:

webAgent.sh -cp newSplitter.jar:newRecognizer.jar

Once you do so, StreamLab will begin scrutinizing files using your new recognizers and split counters.
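The two steps above work together: a class listed in a properties file can only be loaded if its jar is also on the classpath. The sketch below illustrates that mechanism with standard Java reflection. It is a hypothetical illustration, not webAgent's actual loader; it assumes each property value is a class name and that each class has a public no-argument constructor, and the class name `PropertiesClassLoaderSketch` and the property keys are invented for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical illustration, not webAgent's real loader. Assumes each property
// value is a class name and each class has a public no-argument constructor.
public class PropertiesClassLoaderSketch {

    public static List<Object> loadListedClasses(Properties props) {
        List<Object> loaded = new ArrayList<>();
        for (String key : props.stringPropertyNames()) {
            String className = props.getProperty(key).trim();
            try {
                // Fails with ClassNotFoundException unless the class's jar
                // is on the classpath (for webAgent, via -cp).
                Class<?> cls = Class.forName(className);
                loaded.add(cls.getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException e) {
                System.err.println("Skipping " + className + ": " + e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Hypothetical entries: one class the JVM can find, one it cannot.
        props.load(new StringReader(
                "found=java.util.ArrayList\n"
                + "missing=com.example.MissingRecognizer\n"));
        System.out.println("Loaded " + loadListedClasses(props).size() + " class(es)");
    }
}
```

Running it prints `Loaded 1 class(es)` and reports the missing class on standard error, which mirrors what happens when a class is registered in a properties file but its jar was left off the -cp list.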

Examples

We provide two examples below: a comma-based split counter (along with its helper class, BaseSplitCounter) and a recognizer for latitudes.

Splitter Example: CommaSVSplitCounter

The example below counts how many times a column can be split on commas.

package com.sqlstream.webAgent.scrutinizer.autosplitter;

public class CommaSVSplitCounter extends BaseSplitCounter {

   @Override
   public int count() {
       return xSVcount(this.cell, ','); // split on commas, not periods
   }

   @Override
   public String getShortName() {
       return "csv";
   }

   @Override
   public String getLongName() {
       return "Comma-Separated Values (CSV)";
   }

   @Override
   public String getType() {
       return "vclp";
   }

   @Override
   public String getTextTrue() {
       return "CSV (comma-separated values)";
   }

   @Override
   public String getSeparatorSQL() {
       return "','";
   }

   @Override
   public String getEscapeSQL() {
       return "u&'\\005C'";
   }

   @Override
   public String getQuoteSQL() {
       return "'\"'";
   }

}
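The getSeparatorSQL(), getEscapeSQL() and getQuoteSQL() methods return SQL literals embedded in Java strings, so some characters are escaped twice. The small program below is a hypothetical wrapper (the class name and main() are invented for illustration) around the exact string values from CommaSVSplitCounter; it prints the SQL text each method returns. The escape character is written as u&'\005C', a SQL Unicode-escape string literal whose code point U+005C is the backslash.

```java
// Illustration only: the three string values are copied from CommaSVSplitCounter;
// the class name and main() are hypothetical wrappers for printing them.
public class CsvSqlLiteralsDemo {
    static final String SEPARATOR_SQL = "','";      // SQL: ','        (comma)
    static final String ESCAPE_SQL = "u&'\\005C'";  // SQL: u&'\005C'  (Unicode escape)
    static final String QUOTE_SQL = "'\"'";         // SQL: '"'        (double quote)

    public static void main(String[] args) {
        System.out.println(SEPARATOR_SQL);  // ','
        System.out.println(ESCAPE_SQL);     // u&'\005C'
        System.out.println(QUOTE_SQL);      // '"'

        // Decode the four hex digits after the backslash: U+005C is the backslash.
        String hex = ESCAPE_SQL.substring(4, 8);
        char decoded = (char) Integer.parseInt(hex, 16);
        System.out.println(hex + " -> " + decoded);  // 005C -> \
    }
}
```

Writing the backslash as a Unicode escape avoids a third layer of backslash escaping inside the SQL literal itself.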

BaseSplitCounter

The following is the code for BaseSplitCounter, the abstract helper class that CommaSVSplitCounter extends.
package com.sqlstream.webAgent.scrutinizer.autosplitter;

/*
* BaseSplitCounter is a helper class for writing split counters. It stores the
* cell value being examined and provides xSVcount() for counting the fields in
* delimiter-separated values.
*/
public abstract class BaseSplitCounter implements SplitCounter {
   protected String cell;

   public void _prepare(String cell) {
       this.cell = cell;
   }

   public void _clear() {
       this.cell = null;
   }

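   // Returns the number of fields in str when split on delim, or 0 for an
   // empty (or null) cell. Delimiters inside double quotes, or immediately
   // after a backslash outside quotes, are not counted.
   protected int xSVcount(String str, char delim) {
       if(str == null) return 0;
       int n = str.length();
       if(n < 1) return 0;

       int count = 0;
       char quote = '"';
       char esc = '\\';
       boolean inQuote = false;
       boolean inEscape = false;

       for(int i=0; i < n; i++) {
           char c = str.charAt(i);

           if(inEscape)
               inEscape = false;        // skip the character after an escape
           else if(inQuote) {
               if(c == quote)
                   inQuote = false;     // closing quote: delimiters count again
           } else {
               if(c == esc)
                   inEscape = true;
               else if(c == quote)
                   inQuote = true;      // opening quote: ignore delimiters until it closes
               else if(c == delim)
                   count++;
           }
       }

       // k delimiters separate k + 1 fields
       return count+1;
   }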
}
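To see how xSVcount() treats quotes and escapes, the program below runs the same counting logic on a few sample cells. It is a standalone copy made static so it can run without webAgent; the class name `XsvCountDemo` and the sample inputs are hypothetical.

```java
// Standalone copy of the counting logic in BaseSplitCounter.xSVcount(), made
// static so it can be tried without webAgent. Names here are hypothetical.
public class XsvCountDemo {

    static int xSVcount(String str, char delim) {
        int n = str.length();
        if (n < 1) return 0;                   // empty cell: no fields
        int count = 0;
        char quote = '"';
        char esc = '\\';
        boolean inQuote = false;
        boolean inEscape = false;
        for (int i = 0; i < n; i++) {
            char c = str.charAt(i);
            if (inEscape) {
                inEscape = false;              // character after the escape is skipped
            } else if (inQuote) {
                if (c == quote) inQuote = false;  // delimiters inside quotes are ignored
            } else if (c == esc) {
                inEscape = true;
            } else if (c == quote) {
                inQuote = true;
            } else if (c == delim) {
                count++;
            }
        }
        return count + 1;                      // k delimiters separate k + 1 fields
    }

    public static void main(String[] args) {
        System.out.println(xSVcount("a,b,c", ','));      // 3
        System.out.println(xSVcount("\"a,b\",c", ','));  // 2: quoted comma ignored
        System.out.println(xSVcount("a\\,b,c", ','));    // 2: escaped comma ignored
        System.out.println(xSVcount("abc", ','));        // 1: no delimiter, one field
        System.out.println(xSVcount("", ','));           // 0: empty cell
        System.out.println(xSVcount("a|b|c|d", '|'));    // 4: any delimiter works
    }
}
```

Because the delimiter is a parameter, a split counter for another format would only need to pass a different character to xSVcount() from its count() method.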

Recognizer Example: LatitudeRecognizer

The code below checks whether a cell holds a number between -90 and 90, the valid range for latitudes.

package com.sqlstream.webAgent.scrutinizer.recognizer;

public class LatitudeRecognizer extends NumberRecognizer {

   @Override
   public boolean test() {
       return this.getFloat() != null && this.getFloat() >= -90
               && this.getFloat() <= 90;
   }

   @Override
   public String getShortName() {
       return "latitude";
   }

   @Override
   public String getLongName() {
       return "Column contains latitudes";
   }

   @Override
   public String getTextTrue() {
       return "contains latitudes";
   }

}
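The range test in LatitudeRecognizer.test() can be tried outside webAgent with the sketch below. It assumes that NumberRecognizer.getFloat() yields null when a cell is not numeric; the hypothetical helper `parseOrNull()` stands in for that behavior and is not part of the Scrutinizer API, and the class name `LatitudeCheckDemo` is likewise invented.

```java
// Hypothetical standalone version of LatitudeRecognizer.test(). parseOrNull()
// is an assumed stand-in for NumberRecognizer.getFloat(), not the real API.
public class LatitudeCheckDemo {

    static Float parseOrNull(String cell) {
        if (cell == null) return null;
        try {
            return Float.parseFloat(cell.trim());
        } catch (NumberFormatException e) {
            return null;  // not a number, so it cannot be a latitude
        }
    }

    // Same range test as LatitudeRecognizer: valid latitudes lie in [-90, 90].
    static boolean isLatitude(String cell) {
        Float value = parseOrNull(cell);
        return value != null && value >= -90 && value <= 90;
    }

    public static void main(String[] args) {
        String[] cells = {"37.7749", "-90", "90.5", "N/A", ""};
        for (String cell : cells) {
            System.out.println("\"" + cell + "\" -> " + isLatitude(cell));
        }
    }
}
```

This prints true for "37.7749" and "-90" and false for "90.5", "N/A" and the empty cell. Note that the test accepts any number in range, so a column of small integers such as ages would also match; a recognizer only signals that the values are consistent with latitudes.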