Provenance Column

A provenance column is a column populated by a source plugin, either a reader or a parser. It carries metadata from that plugin that can be useful in downstream processing. For example:

  • SQLSTREAM_PROV_FILE_SOURCE_FILE - the name of the file from which the row was read.
  • SQLSTREAM_PROV_KAFKA_KEY - the key associated with a record read from Kafka.
  • SQLSTREAM_PROV_SOCKET_SOURCE_HOST - the host from which a message was received over a network socket.

One important use of provenance columns is to construct a source watermark, which can be stored at the sink and then used for restart and recovery after a failure. See Using Watermarks.
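
The watermark idea can be illustrated outside the product with a minimal Python sketch. It is not the product's implementation: rows are modelled as plain dictionaries, LINE_NUMBER is a hypothetical position field introduced only for this example, and the sketch assumes files are read in lexicographic name order so that (file name, line number) pairs sort in read order.

```python
from typing import Dict, Iterable, List, Optional, Tuple

FILE_FIELD = "SQLSTREAM_PROV_FILE_SOURCE_FILE"  # provenance column named in the text
LINE_FIELD = "LINE_NUMBER"                      # hypothetical position field (assumption)

Row = Dict[str, object]
Watermark = Tuple[str, int]


def watermark_of(row: Row) -> Watermark:
    """Build a comparable watermark from a row's provenance values."""
    return (str(row[FILE_FIELD]), int(row[LINE_FIELD]))


def latest_watermark(delivered: Iterable[Row]) -> Optional[Watermark]:
    """Return the highest watermark among rows already written to the sink."""
    marks = [watermark_of(r) for r in delivered]
    return max(marks) if marks else None


def rows_after(rows: Iterable[Row], watermark: Optional[Watermark]) -> List[Row]:
    """On restart, keep only rows strictly beyond the stored watermark."""
    if watermark is None:
        return list(rows)
    return [r for r in rows if watermark_of(r) > watermark]
```

After a failure, a job built along these lines would read the stored watermark back from the sink and pass it to rows_after, so that rows already delivered are skipped rather than written twice.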

By convention, all provenance column names start with SQLSTREAM_PROV.
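
Because the prefix is only a naming convention, downstream code can use it to separate provenance metadata from payload data. The following Python sketch shows one way to do that; the helper name split_provenance and the dictionary row model are assumptions made for illustration, not part of any plugin API.

```python
from typing import Dict, Tuple

PROVENANCE_PREFIX = "SQLSTREAM_PROV"  # naming convention stated in the text


def split_provenance(row: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Split a row into (provenance columns, ordinary data columns) by name prefix."""
    provenance = {k: v for k, v in row.items() if k.startswith(PROVENANCE_PREFIX)}
    data = {k: v for k, v in row.items() if not k.startswith(PROVENANCE_PREFIX)}
    return provenance, data
```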

For more information about provenance columns for particular plugins, see: