s-Server parses Apache Avro data similarly to the way it parses JSON data. (Avro schemas are defined with JSON.) All columns are parsed out of hierarchical Avro records using a specified "PATH" for corresponding Avro fields in the Avro schema. This is the schema stored in all Avro payloads, which is used to parse each message. (Some sources, such as Kafka or AMQP, may break the Avro payload up into one or more serialized Avro records. s-Server handles this by allowing you to indicate that Avro schema may be separate from a given batch of Avro data.)
s-Server matches data types according to its Ingestion Rules for Avro.
Note: You can also input data in larger chunks and parse it later using the Parser UDX ). This UDX calls the parsers listed above in a function. For more information on using functions, see the topic Transforming Data in s-Server.
The s-Server trace log includes information on readers' and parsers' progress. See Periodic Parser Statistics Logging in the Administering Guavus SQLstream guide.
Avro processing works like this:
To implement the AVRO parser, you create a foreign stream for a source, as described in the topics under Reading from Other Sources. Under stream or column options, you need to set the following stream options:
In s-Server 5.2, messages (typically in message queues such as Kafka) may be tagged with AVRO schema fingerprints. Those fingerprints will be matched with that for the AVRO_SCHEMA_FILE. Messages will be skipped if fingerprints don't match that of the AVRO_SCHEMA_FILE.
CREATE OR REPLACE SERVER "AVROREADERSERVER" TYPE 'FILE' FOREIGN DATA WRAPPER ECDA; CREATE SCHEMA "AVROSCHEMA"; SET SCHEMA '"AVROSCHEMA"'; CREATE FOREIGN STREAM "AVROSTREAM" ( slotnamevarchar(32), slotname2 varchar(32) ) SERVER "AVROREADERSERVER" OPTIONS ( PARSER 'AVRO', AVRO_SCHEMA_FILE '/schemas/2.1/fs.avsc', SCHEMA_HEADER 'false', ROW_PATH '$', SLOTNAME_PATH '$.slots[0:].slotName', SLOTNAME2_PATH '$.slots2[0:].slotName' );
|PARSER||This needs to be AVRO.|
|AVROSCHEMAFILE||This option can either be an HTTP URL to fetch the schema or it can be a path to a file on server host machine or VM.|
|SCHEMA_HEADER||is a required option to indicate if the Avro schema is embedded in the Avro data. This option needs to be set to false for data sources like Kafka or AMQP, where each message can be one or more serialized Avro records without a schema.|
|ROW_PATH||This is the Avro path for the row to be found. The Avro parser uses a row path to find Avro objects containing a row, then the path for each column is used to find the value to be extracted.|
|<COLUMNNAME>_PATH||Path for each column in the ECDA column set. This defaults to _\$..<COLUMNNAME>. Here, a column named 'FOO' would have an option named FOO_PATH that defaulted to \$..FOO which would return the first property named FOO under the Avro object found by the ROW_PATH.|
This section describes data type mappings between Avro types and SQL types.
Note: Arrays can be flattened. An array is flattened if any column path refers to field(s) of array elements using the
All other data types can be parsed as stringified VARCHAR values.
|Avro source value||s-Server Data Type|
|All numeric types||All numeric SQL types|
|bytes/fixed||BINARY/VARBINARY or DECIMAL|
Where a mapping to corresponding SQL type is not listed in the table above, for all AVRO data types, we return a JSON string representation of the AVRO datum as a VARCHAR value.