Errors can occur while the SQLstream s-Server is evaluating queries. This topic describes the kinds of errors and the different contexts in which they occur, and discusses how the s-Server indicates and handles errors. Of particular interest are errors that occur while a long-running statement executes.
All errors are recorded in the s-Server trace log. You can also access errors through the Global Error Stream.
For a list of error messages see Appendix A - List of Error Messages.
This topic contains information on the following:
An error indicates that a requested operation has failed. Here "operation" means evaluating or preparing a SQL statement, but can also mean fetching each result from the result set of a query, or each execution of a prepared statement.
Errors can be classified by severity and by context.
The error context itself has several dimensions:
The s-Server's SQL parser and validator detect invalid statements at prepare time. To illustrate this, consider some simple statements that act on the stream orders, defined by
CREATE STREAM orders ( order_id INTEGER NOT NULL, customer_id INTEGER NOT NULL, item_count INTEGER, total_cost DECIMAL(8,2), comment VARCHAR(128));
Some examples of prepare time errors are:
Errors like these can be detected immediately by the s-Server, which records the error in the s-Server log, and signals the JDBC client driver, which makes the JDBC call fail, throwing a java.sql.SQLException. The s-Server prepares a statement in a context which knows about the JDBC client; this is feasible because preparation is quick.
Errors that occur during statement execution generally result from invalid data, and are tied to a specific row of input. Some examples are:
Handling such run-time errors can be tricky, due to the very nature of a distributed, streaming data system. With SQLstream, independent data-source clients insert data into streams, and independent data-reading clients receive result rows computed from the streaming inputs by applying (potentially complex) chains of relational operators.
Suppose an error in execution occurs as one of these operators processes an "offending row". It is easy for the operator to detect the error, but not so easy to determine who to tell, and how. The prepare-time context, which pointed to a live JDBC client thread waiting for news, is gone. Running on a RAMP, the relational operator performs one step in a large computation: it has no idea what SQL statement has failed, and which clients are affected.
The Control Node has that information, so the error message is relayed from the relational operator to the Control Node to the client.
However, this mechanism is not complete in the current s-Server release. Consequently some kinds of execution errors are not reported directly to a client, namely:
The same problem holds with a pump: the client context which created or started the pump is gone, and the pump is executing as a "server daemon" thread.
The workaround for this defect is to check the s-Server log for the pertinent error messages or warnings. (See the next topic regarding how to see errors from a remote client via streams.)
The s-Server records every error and warning in the s-Server trace log, using the standard Java logging mechanism. Configuration of system tracing is explained in the topic Using and Managing Trace Files in the SQLstream Administrator Guide. Fatal errors have level FATAL, regular errors have level SEVERE, and warnings have level WARNING. In its default configuration the s-Server also logs major events -- such as the start of execution of any SQL statement -- at level INFO. Thus INFO messages in the log appearing shortly before a SEVERE message often indicate the context of an error.
As a very simple illustration, consider the stream:
orders (order_id INTEGER NOT NULL, customer_id INTEGER NOT NULL, item_count INTEGER, total_cost DECIMAL(8,2), comment VARCHAR(128));
One client thread runs SELECT STREAM ROWTIME, *, (total_cost / item_count) as avg_item_cost FROM orders
Another thread prepares INSERT EXPEDITED INTO orders VALUES(?, ?, ?, ?, ?); and executes it, binding the following series of values:
|1||9:00:00||1||100||2||1.98||a valid row|
|2||9:01:00||2||100||two||1.98||a type error|
|4||9:03:00||4||102||0||0.00||zeros are trouble|
|5||9:04:00||5||101||7||0.84||a valid row|
Here rows 2 and 3 are rejected when the insert is executed. Row 4 requires evaluating (0/0), a run-time error, and so is dropped from the result set. The query returns only rows 1 and 5:
|9:00:00||1||100||2||1.98||a valid row||0.99|
|9:04:00||5||101||7||0.84||a valid row||0.12|
All errors are recorded in the s-Server log. Here is an excerpt, including INFO lines that show context, but omitting many irrelevant trace messages (marked by...):
Aug 10, 2008 4:20:00 AM net.sf.farrago.db.FarragoDbSession prepare INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count, total_cost, comment) VALUES (TIMESTAMP '1999-09-09 9:00:00 1, 100, 2, 1.98, 'a valid row') Aug 10, 2008 4:20:01 AM net.sf.farrago.db.FarragoDbSession prepare INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count, total_cost, comment) VALUES (TIMESTAMP '1999-09-09 9:01:00', 2, 100, 'two', 1.98, 'a type error') Aug 10, 2008 4:20:01 AM org.eigenbase.sql.validate.SqlValidatorException <init> SEVERE: org.eigenbase.sql.validate.SqlValidatorException: Cannot assign to target field 'ITEM_COUNT' of type INTEGER from source field 'EXPR$3' of type CHAR(3) Aug 10, 2008 4:20:01 AM net.sf.farrago.db.FarragoDbSession prepare INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count, total_cost, comment) VALUES (TIMESTAMP '1999-09-09 9:02:00', 3, null, 5, 3.00, 'invalid null') Aug 10, 2008 4:20:02 AM net.sf.fennel.xo.FennelCalcRel.# 340:671 <native> WARNING: calculator error :PC=15 Code=22004 Aug 10, 2008 4:20:02 AM net.sf.farrago.db.FarragoDbSession prepare INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count, total_cost, comment) VALUES (TIMESTAMP '1999-09-09 9:03:00', 4, 102, 0, 0.00, 'zeros are trouble') Aug 10, 2008 4:20:03 AM org.eigenbase.util.EigenbaseException <init> SEVERE: org.eigenbase.util.EigenbaseException: Overflow during calculation or cast. INFO: INSERT INTO sales.orders (ROWTIME, order_id, customer_id, item_count, total_cost, comment) VALUES (TIMESTAMP '1999-09-09 9:04:00', 5, 101, 7, 0.84, 'a valid row')
All errors that are logged to the trace log are also available in a system management stream called SYS_BOOT.MGMT.ALL_TRACE. The latest (up to 1000) error messages are also available in a table called SYS_BOOT.MGMT.SAVED_TRACE. You can use these to create visualizations of errors; they can be accessed from SQLline or anywhere that has a JDBC connection to s-Server.
SYS_BOOT.MGMT.SAVED_TRACE defaults to retain the last 1000 rows. You can increase the number of rows cached in SYS_BOOT.MGMT.SAVED_TRACE by using a property set in Trace.properties:
Increasing this size may be helpful, for example, if you want to analyze large amounts of log data retrospectively.
More information about the trace log can be found in the topic Using Trace Files in the Building Applications Guide. As described in that topic, you can control the level of trace messages -- error, informative, debugging -- is for each component of the server by modifying AspenTrace.properties. You can also change whether or not s-Server summarizes error messages (reports a periodic summary of errors), as described below.
To query the error stream, enter something like the following in sqlline :
SELECT STREAM ROWTIME, * from SYS_BOOT.MGMT.ALL_TRACE;
This will produce a continuous stream of trace messages or errors as they occur. You can substitute specific columns for. These are listed below. Columns are the same for both table and stream.
To query the error table, enter something like the following:
select * from sys_boot.mgmt.saved_trace where error_level = 'SEVERE';
|ROWTIME||TIMESTAMP||Timestamp for the error row itself.|
|error_time||TIMESTAMP||Time that error was logged. Differs from the rowtime of the error row itself, which is slightly later than the time the error occurred. Also differs from the rowtime of the input row associated with the error (data_rowtime).|
|error_level||VARCHAR(10) NOT NULL||Java trace level, SEVERE, WARNING, and so on. See https://docs.oracle.com/javase/7/docs/api/java/util/logging/Level.html for details.|
|is_error||BOOLEAN NOT NULL||True for errors, false for other traces.|
|error_class||INTEGER||Code for error category: 0:other error, 1:ingest error, 2:egest error, 3:syntax error, 4:validation error, 5:planner error, 6:late rowtime, 7:calculator error, 8:other execution error.|
|sql_state||VARCHAR(5)||An alphanumeric error code. See Appendix B - List of Error Messages for more details.|
|error_name||VARCHAR(32),||Explanation of sql_state. See - Appendix B - List of Error Messages for more details.|
|message||VARCHAR(4096)||Entire trace message|
|thread_id||INTEGER||Number of the execution thread where the error happened.|
|session_id||INTEGER||Number of the client session where the error happened.|
|statement_id||INTEGER||The id number for the SQL statement where the error happened.|
|graph_id||INTEGER||graph_id and node_id identify which execution object had the error. See explanation of nodes and graphs below.|
|node_id||INTEGER||graph_id and node_id identify which execution object had the error. See explanation of nodes and graphs below.|
|xo_name||VARCHAR(128)||a unique name of the execution object|
|error_reporter||VARCHAR(256) NOT NULL||the name of the java logger that reported the error|
|error_sql||VARCHAR(256)||the SQL of the statement|
|error_backtrace||VARCHAR(4096)||if available, java backtrace at the point of the errot|
|data_row||TIMESTAMP||The input row that was being processed when the error occurred.|
|data_rowtime||VARBINARY(32768)||Rowtime of data_row|
|source_position_key||VARCHAR(4096)||Identifies the location in the input source, such as a log file or kafka topic.|
|pump_db||VARCHAR(128),||Database that includes the pump that runs the statement; null if not in a pump.|
|pump_schema||VARCHAR(128)||Schema that includes the pump that runs the statement; null if not in a pump.|
|pump_name||VARCHAR(128)||Name of the pump that runs the statement; null if not in a pump.|
By default, s-Server reports a periodic summary of errors, in order to avoid bloating the trace log. However, this means that the global error stream contains only the summary counts and a few sampled bad rows. If you turn off sampling, you get all the bad rows traced and reported to the error stream.
To adjust sampling, you can change properties in the aspen.properties or aspen.custom.properties files. For more information on modifying aspen.properties, see the topic Managing Java Parameters.
|aspen.trace.sampleLateRows||true||When this property is set to false, s-Server reports every late row in both the trace log and error stream. When this property is set to true, s-Server reports the first event and then reports a summary every 15 seconds after the first event.|
|aspen.trace.sampleFailedRows||true||You can also specify that s-Server report every row that causes a calculator error. When this property is set to false, s-Server reports every row that causes a calculator error. When this property is set to true, s-Server reports the first calculator event and then reports a summary of calculator events every 15 seconds.|
Note: All messages (including errors) that are logged to the trace log are also available in a system management stream called SYS_BOOT.MGMT.ALL_TRACE. The latest (by default up to 1000) log messages are also available in a table called SYS_BOOT.MGMT.SAVED_TRACE. You can use these to create visualizations of errors; they can be accessed from SQLline or anywhere that has a JDBC connection to s-Server. See the Global Error Stream for more details.
s-Server consist of two layers:
The specification of a data-flow is called a physical query plan. The actual data flow is called a stream graph, which is a chain of stream nodes, each of which is a small software machine that applies some relational operation to its input data stream(s) to produce is output data stream(s). Nexus nodes are defined by a CREATE STREAM statement. They are freestanding nodes to which other stream graphs can connect and disconnect, in order to receive or deliver data.
In the most common case, a node has 1 input and 1 output, and so the graph is a linear chain of operations. But data flows can merge and split, and the graph of a new query can connect to the graphs of existing queries as they run.
Note: A node does not read data directly from its upstream neighbor. Instead, the upstream neighbor puts its output rows into a block of memory called a buffer, which it shares with the downstream neighbor. The upstream node writes rows to the buffer, and the downstream node reads rows from the buffer.
Because there are more stream nodes than CPU cores, all the nodes cannot actually run simultaneously. Instead, a software component called the scheduler manages the stream nodes. The scheduler runs nodes as needed in order to push data down the pipeline.
The scheduler decides when to execute a node based on its status, which consists of 1) its state (ready/sleeping/blocked/finished), 2) the availability of data for input or room for output, and 3) the - rowtime of the next input row. The scheduler also keeps track of statistics for each execution of each node, measuring the time spent, the amount of data input and output, and so on.