Editing Source Columns

Particularly when using Discovery with CSV sources, you may find that column headers are missing or unsatisfactory, and you would like to change them.

If there are no column headers (or if you skip them), you will see generated column names such as COL_0, COL_1, and so on.

Of course you can edit the column names in place on the source (or sink) definition. You can also open a multi-line editor that allows you to:

  • Export the metadata (by copying the text file)
  • Import metadata (from a text file)
  • Edit all column data together

To open the editor, simply click on the clipboard icon (outlined in red in the image above). You will see your metadata. For CSV this is presented as two columns:

  • the column name
  • the data type
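
For example, the metadata for a simple CSV source might appear in the editor as follows (the column names and types here are purely illustrative, not taken from a real source):

order_id,BIGINT
customer_name,VARCHAR(64)
quantity,INTEGER
created_at,TIMESTAMP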

Simply edit the metadata in this panel, or cut and paste from a saved metadata file. If you use the same file formats frequently - for example, a specific call data record format - it is well worth saving this column metadata as a re-usable text file in your source code repository.

Finally, exit this panel and you will see the new metadata applied to your source or sink.

This can be used with any internal or external source, sink or lookup object - both streams and tables.

If your source is JSON or XML, you will see a third column in the metadata, which describes the JSON or XML path to each selected element. Here is an XML example.
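
Suppose the source delivers XML records shaped like this (the element values are invented for illustration; only the structure matters):

<Table1>
  <id>1001</id>
  <reported_at>2020-03-01 08:15:00</reported_at>
  <speed>57</speed>
  <driver_no>4711</driver_no>
  <prescribed>false</prescribed>
  <gps>51.5074,-0.1278</gps>
  <highway>M4</highway>
</Table1>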

In the editor this appears as:

id,/Table1/id,BIGINT
reported_at,/Table1/reported_at,TIMESTAMP
speed,/Table1/speed,INTEGER
driver_no,/Table1/driver_no,BIGINT
prescribed,/Table1/prescribed,BOOLEAN
gps,/Table1/gps,VARCHAR(128)
highway,/Table1/highway,VARCHAR(8)

You may update the column names (the first column in the metadata). Normally the paths only need changing if there are two similarly named elements and the wrong one has been selected by Discovery.
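
For instance, if the document (hypothetically) also contained a nested /Table1/limits/speed element and Discovery had selected the wrong one, you would correct just the path in that row:

speed,/Table1/limits/speed,INTEGER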

You can also add further column definitions (for example, if you know that there are additional "rare" elements not included in the data sample used by Discovery). Once again, these enhanced definitions can be saved into your source code repository and re-used for other pipelines and applications.
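
As an illustration (the element name here is invented), such a rare element can be added as one more metadata line:

incident_code,/Table1/incident_code,VARCHAR(16)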

For JSON and XML, the column data mapping is defined by the path, not by the column order - so you are free to re-order the columns. You may choose to order and group columns by data type, in alphabetical order, or by their purpose. This can be done easily using the multi-line editor.