To read CSV formatted data, you need to specify a separator character (such as a comma or a pipe) for values. You can also indicate whether or not the file has a header. When the Extensible Common Data Framework parses CSV, each row becomes a row in the stream. Columns are identified by the separator character, which can be a comma, a pipe, or any other character you designate. s-Server matches data types according to its Ingestion Rules for CSV.
Note: You can also input data in larger chunks and parse it later using the Parser UDX ). This UDX calls the parsers listed above in a function. For more information on using functions, see the topic Transforming Data in s-Server in this guide.
The s-Server trace log includes information on readers' and parsers' progress. See Periodic Parser Statistics Logging in the Administering Guavus SQLstream guide.
Note: You can also input data in larger chunks and parse it later using the Parser UDX. This UDX calls the parsers listed above in a function. For more information on using functions, see the topic Transforming Data in s-Server in this guide.
The following example sets up a server called CSVReaderServer, then creates and sets a schema called schema_csv. It then sets up a foreign stream that will parse columns called id, reported_at, shift_no, trip_no, lat, lon, speed, bearing, driver_no, and highway from a file in /tmp/ called buses.log. To parse CSV over other input/output systems, such as Kafka, Kinesis, a network socket, a WebSocket, HTTP or AMQP, you would need to specify options for these formats. See Reading from Other Sources for more details.
Note: The examples below use the file system as an input system. To parse CSV over other systems, such as Kafka or AMQP, you would need to specify options for these formats. See Reading from Other Sources for more details.
create or replace server CSVReaderServer type 'FILE' foreign data wrapper ECDA; create or replace schema schema_csv; set schema 'schema_csv'; CREATE OR REPLACE FOREIGN STREAM read_from_csv ( "id" BIGINT, "reported_at" TIMESTAMP, "shift_no" VARCHAR(8), "trip_no" VARCHAR(4), "lat" DOUBLE, "lon" DOUBLE, "speed" INTEGER, "bearing" INTEGER, "driver_no" BIGINT, "highway" VARCHAR(8) ) SERVER CSVReaderServer OPTIONS ( "PARSER" 'CSV', "CHARACTER_ENCODING" 'UTF-8', "SEPARATOR" ',', "SKIP_HEADER" 'false', "DIRECTORY" '/tmp/', "FILENAME_PATTERN" 'buses\.log' );
To actually begin reading from the file, use a statement along the lines of
SELECT stream * from CSVReaderStream;
You can use this option to parse only some of the columns,in a CSV file, or to reorder the columns from CSV data.
To implement this option, you implement code along the following lines:
CREATE OR REPLACE FOREIGN STREAM my-stream ( COMMON_NAME VARCHAR(64), SPECIES VARCHAR(64), GENUS VARCHAR(64) ) SERVER FileReaderServer OPTIONS( DIRECTORY '/animals/', FILENAME_PATTERN 'animals.csv', COLUMN_MAPPING ',COMMON_NAME,GENUS,SPECIES', PARSER 'CSV', STATIC_FILES 'true', SKIP_HEADER 'true' );
When you run a SELECT against this stream, the parser will take the following actions:
|SKIP_HEADER||True or false; defaults to false. Specifies if the parser should skip the first row of input files. Usually, you would do this if the first row of data contains column headers.|
|QUOTE_CHARACTER||Lets you specify an expected quotation character. There is no default for quote character.|
|ROW_SEPARATOR||Lets you specify a character that splits lines in the source. Defaults to /n.For example, in the case of reading from a DOS format file where line separators are CR/LF rather than just LF, you will need to specifyrow_separator U&'\000D\000A'(This is the hexidecimal Unicode value for CR/LF. Seehttp://unicode.org/standard/reports/tr13/tr13-5.html.|
To parse CSV files with the ECD Agent, configure the options above using the ECD Agent property file with properties similar to the following:
ROWTYPE=RECORDTYPE(VARCHAR(2040) id, VARCHAR(2040) reported_at, VARCHAR(2040) shift_no, VARCHAR(2040) trip_no, VARCHAR(2040) route_variant_id)
s-Server applies the following coercion rules when parsing CSV data. All empty cells are cast as NULL.
|CSV Source Cell||numeric string||non-numeric string|
|BIGINT, DECIMAL, DOUBLE, INT, SMALLINT, REAL, TINYINT||Raise an exception if the JSON number lies beyond the maximum or minimum boundary of the target type. If the number has any decimal digits (including the vacuous .0) and it is being ingested into an integer column, then raise an error.||Raise an error.|
|VARBINARY||Raise an error.||If the string, without quotes, is a valid SQL Standard BINARY literal, then we will ingest it, truncating and 0-filling as necessary. A Standard BINARY literal has the form X'...' where ... is a sequence of (case-insensitive) hex digits.|
|CHAR, VARCHAR||Put double-quotes around the number and then ingest subject to the truncation/padding rules for strings.||If the string won't fit in the target SQL character value, then excess trailing characters are discarded. CHAR values are space-padded up to their declared length.|
|TIME, TIMESTAMP||Raise an error.||OK if the string parses as a DATE/TIME/TIMESTAMP. Otherwise, raise an error.|
|Strings are parsed per ISO standards at https://www.w3.org/TR/NOTE-datetime|