You can use a number of techniques to monitor s-Server and its components; this topic provides an overview of those techniques.
You may also find it useful to incorporate generic system monitoring tools; for information on these, see your operating system's documentation. To ensure that your system remains operational, you should monitor all servers in some way (see Monitoring the Monitors below for information on managing monitoring systems themselves).
SQLstream service PIDs are recorded in pid files under /var/run/sqlstream. For each service to be monitored, verify that its pid file exists and that the process it names is running.
s-Server itself has two pid files: s-serverd.pid for the core and jvm.pid for the Java VM. Here is a simple bash fragment that checks process status and reports UP, BROKEN, or DOWN for each service:
```bash
for s in $SERVICES
do
    pid=0
    message=
    pidfile=/var/run/sqlstream/$s.pid
    if [ "$s" = "snmpd" ]
    then
        # snmpd is not owned by sqlstream, so use a simpler method
        sudo service snmpd status > /dev/null
        if [ $? -eq 0 ]
        then
            status=UP
            message="service running normally"
        else
            status=DOWN
            message="service not running"
        fi
    elif [ -e "$pidfile" ]
    then
        pid=$(cat "$pidfile")
        pgrep --pidfile "$pidfile" > /dev/null
        if [ $? -ne 0 ]
        then
            # pid file exists but there is no matching process
            status=BROKEN
            message="process not running"
            # should we restart the service?
            # restart $s
        else
            # all good - process running
            status=UP
            message="service running normally"
        fi
        if [ -z "$pid" ]
        then
            pid=0
        fi
    else
        # no pid file - service has not been started
        status=DOWN
        message="service not running"
    fi
    # send the service status in all cases, as we are using it to drive
    # alerts as well as the status monitor
    echo "$server,$s,$status,$pid,$message" >> "$serviceStatusFile"
    statustext="$statustext - $s:$status"
done
```
We recommend using your operating system's utilities and/or the JDK's jstat tool to monitor heap size for the various SQLstream processes. A simple bash function can check heap size for a given JVM, taking the name of the service (such as jvm or webagentd) as its parameter and returning the PID and heap stats in jspid and jstats (the latter containing two values, current and max heap size).
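Such a heap-check function can be sketched as follows. This is an illustrative sketch, not part of the product: the function name is invented, and it assumes the JDK's jstat tool is on the PATH and that `jstat -gc` reports the survivor/eden/old generation columns in their usual order (the column arithmetic is an approximation of current usage and capacity).

```shell
# Illustrative sketch: report heap usage for a SQLstream JVM service.
# Parameter 1 is the service name (such as jvm or webagentd).
# Sets jspid to the service PID (0 if not found) and jstats to
# "<current heap KB> <heap capacity KB>" (empty if the JVM cannot be queried).
function checkHeapSize () {
    local service=$1
    jspid=0
    jstats=
    local pidfile=/var/run/sqlstream/$service.pid
    if [ -f "$pidfile" ]
    then
        jspid=$(cat "$pidfile")
        # jstat -gc reports sizes in KB; the second row holds the values.
        # Sum the survivor, eden and old used/capacity columns to
        # approximate current usage and total heap capacity.
        jstats=$(jstat -gc "$jspid" 2>/dev/null | awk 'NR==2 {
            used = $3 + $4 + $6 + $8;   # S0U + S1U + EU + OU
            cap  = $1 + $2 + $5 + $7;   # S0C + S1C + EC + OC
            printf "%d %d", used, cap }')
    fi
}
```

If the service's pid file is missing, jspid is left at 0 and jstats is empty, so callers can distinguish "service not running" from "JVM not responding".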
The heap size required will vary substantially depending on the nature of the processing pipelines installed. The important thing is to establish a normal pattern, then identify exceptional conditions when they occur.
The default locations for trace/log files are:
Service | Log Directory | Trace Properties File | Log Filename |
---|---|---|---|
s-serverd | /var/log/sqlstream (a link to $SQLSTREAM_HOME/trace) | Trace.properties | Trace.log.[0-99] |
webagentd | $SQLSTREAM_HOME/../clienttools/WebAgent | trace.properties | trace.log.[0-99] |
If you are using WebAgent, you can monitor it using its /status API. A tool for accessing this API is available on port 5580 of whatever host is running WebAgent. If you launch WebAgent manually (that is, if you are not running it as a service), you will need to enable the -a option when you launch WebAgent in order to make the API available. If there is no response at all, that may be because the agent is down, or it could be that the agent is up but currently unable to connect to s-Server. Here is a very simple bash function to check the WebAgent status (returned as waStatus):
```bash
function checkwebAgentStatus () {
    WAPORT=5580
    mkdir -p /tmp/sqlstream
    rm -f /tmp/sqlstream/wastatus.json /tmp/sqlstream/wastatus.log 2>/dev/null
    if [ -f /var/run/sqlstream/webagentd.pid ]
    then
        # apparently running
        # TODO - check service is not broken
        # wget localhost:$WAPORT/status -O /tmp/sqlstream/wastatus.json -o /tmp/sqlstream/wastatus.log >/dev/null 2>&1
        # seems curl is more likely to be available
        curl --silent -o /tmp/sqlstream/wastatus.json localhost:$WAPORT/status >/dev/null 2>&1
        if [ $? -ne 0 ]
        then
            # there was some problem with the fetch
            waStatus="UNREACHABLE"
        else
            # we got a status back
            grep "\"message\":\"OK\"" /tmp/sqlstream/wastatus.json >/dev/null 2>&1
            if [ $? -eq 0 ]
            then
                # status was apparently OK
                waStatus="UP"
            else
                waStatus="BROKEN"
                # save a timestamped copy of the file
                mv /tmp/sqlstream/wastatus.json /tmp/sqlstream/wastatus.$tstamp
            fi
        fi
    else
        waStatus="DOWN"
    fi
}
```
When the SQLstream server and its supporting components such as WebAgent are to be deployed as part of a production system, it is important to understand which self-monitoring capabilities can be integrated into a monitoring and scaling framework. The following mechanisms can be used to detect an unhealthy server, or to recognize when scaling may be needed.
The WebAgent (usually deployed with a base URL of http://<host>:5580) has a number of entry points that are useful. See Using Telemetry to Monitor Performance for more information about the telemetry URLs, and Websocket Requests for WebAgent status.
API | Description |
---|---|
/status | Returns a JSON document which proves that WebAgent is running, but doesn't indicate whether s-Server is up or down |
/telemetry/graph?format=json&nowait=true | Confirms that both WebAgent and s-Server are up. The nowait parameter ensures that any problem is revealed rapidly; if s-Server is down, an error status is returned |
/telemetry/graph?format=json | Returns detailed telemetry data which confirms that the scheduler is running, and can also be used to tell which pumps are currently running, blocked, etc. The latest rowtimes recorded against stream operators can be used to identify lags; if lag is increasing beyond acceptable thresholds, it may be time to scale up the deployment. Conversely, if lags are below threshold and CPU usage is acceptable, it may be possible to scale down |
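The first two entry points can be combined into a simple liveness probe. The function below is a sketch, not part of the product: the function name is invented, and it assumes curl is available and that the WebAgent base URL (for example http://localhost:5580) is passed as the argument.

```shell
# Illustrative probe: returns 0 if both WebAgent and s-Server respond,
# 1 if WebAgent itself is unreachable, 2 if WebAgent is up but s-Server
# appears to be down.
function checkPipelineUp () {
    local base=$1
    # /status only proves that WebAgent itself is running
    curl --silent --fail "$base/status" >/dev/null 2>&1 || return 1
    # nowait=true makes a down s-Server fail fast instead of blocking
    curl --silent --fail "$base/telemetry/graph?format=json&nowait=true" \
        >/dev/null 2>&1 || return 2
    return 0
}
```

The distinct return codes let a caller report UNREACHABLE versus BROKEN, mirroring the waStatus values used above.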
See Configuring Ports for Guavus SQLstream for a list of (default) ports used by Guavus SQLstream.
Note: Some of these port allocations can be modified either during the installation process, or when starting services / processes. For more information, see the documentation or contact SQLstream Support.
A very simple way of telling that data is being processed is to arrange for some sort of summary data to be delivered to a file every minute (or at some other suitable frequency). As well as giving application-level statistics (such as counting input rates) this provides immediate assurance that the end-to-end process is running.
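A monitor can then treat the freshness of that summary file as a health signal. The sketch below is illustrative (the function name and the two-minute default threshold are assumptions) and relies on GNU stat as found on Linux.

```shell
# Illustrative check: succeed only if the summary file has been
# updated within the last maxAgeSecs seconds (default: two minutes).
function checkSummaryFresh () {
    local file=$1 maxAgeSecs=${2:-120}
    # no file at all means the pipeline has never delivered
    [ -f "$file" ] || return 1
    # compare the file's modification time against the clock
    local age=$(( $(date +%s) - $(stat -c %Y "$file") ))
    [ "$age" -le "$maxAgeSecs" ]
}
```

A cron-driven monitor might call, for example, `checkSummaryFresh /path/to/summary.csv 90 || raise_alert` (the path and alert command here are hypothetical).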
It is of course possible to build a monitoring system using SQLstream itself. Data flows can be aggregated and analysed; changes to data flow patterns can be detected and alerted on. Long term storage of the data into a database is simple.
File-based monitoring on the individual SQLstream servers may be inconvenient, unless there is already a standardised monitoring agent present on the application nodes (for example, Logstash as a companion to Kibana). As an alternative, consider pushing the periodic messages directly to a central monitoring service (or using agents on the monitor to receive or pull them). Wherever possible, the system being monitored should be unaware of the monitoring framework. The monitoring framework should be able to scale easily from monitoring a single application node to monitoring one or more clusters.
One of the first decisions to be taken is whether the application should push monitoring data to the monitor, or the monitor should poll data from the application.
A big advantage of the push approach is that the monitor doesn't have to worry about which application nodes to monitor; it can passively accept data from any running system (as long as messages are all identified by originating server).
The monitor can pull data (without polling) by setting up streaming queries at the monitored application nodes. These can pull data into a SQLstream processing pipeline for further monitoring, or ECDA agents can hand off the data into a local flat-file on the monitor server, assuming that will be loaded/processed by the monitoring software.
The drawback with pull-based approaches is that the monitor needs to set up queries to all application nodes (SQLstream s-servers). This can substantially increase the complexity of the monitor. The monitor of course also needs to know exactly which application nodes are expected to be running at any time.
If the SQLstream monitor needs to access the SQLstream application service directly, they both need to use the same version of the SQLstream JDBC driver. If you want to be able to upgrade the monitor independently of the application, you should consciously break this dependency; instead use message transport mechanisms that are not SQLstream version dependent, such as Kafka or network sockets.
Generally, a push-based approach is likely to be the easiest to build and deploy. If Kafka is in wide use, consider benefiting from its store-and-forward and replay capabilities.
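As a concrete example of a version-independent push, a CSV status line like the one built by the pid-file script earlier can be sent straight to a TCP listener on the monitor host. This sketch is an assumption-laden illustration: the function name is invented, it requires bash (for the /dev/tcp pseudo-device), and it presumes something like a Logstash or netcat TCP input is listening on the monitor side.

```shell
# Illustrative push: send one CSV status line to a central monitor.
# Fails quietly (non-zero return) if the monitor is unreachable, so a
# monitor outage never blocks or breaks the monitored application.
function pushStatus () {
    local monitorHost=$1 monitorPort=$2 statusLine=$3
    # bash's /dev/tcp opens a TCP connection without needing netcat
    echo "$statusLine" > "/dev/tcp/$monitorHost/$monitorPort" 2>/dev/null
}
```

Because the sender ignores failures, the monitored node stays unaware of the monitoring framework, as recommended above.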
Analytical processing can take place either in the application node or at the central monitor. The choice will depend on many factors:
- Analytical capability of the monitor compared to the SQLstream application.
- Processing loads at the application and monitor servers.
- Preference for encapsulation of application-specific functionality (the monitor does not, and should not, need detailed knowledge of the application).
- Whether the necessary metrics can be calculated at the source. Some application-specific final stage of aggregation is always required at the monitor, but where possible make this as application-neutral as possible.
For example: use a single stream for counting transaction volumes for all transactions across all application nodes. Then new transaction types can be added to the application without modifying the monitor. This also allows higher-level aggregations (e.g. total volume, as well as volume by transaction type).
There is always a risk that monitoring systems and subsystems can themselves fail. To mitigate that, we recommend that you always include some basic stand-alone monitoring or alerting on every application and monitor node. This may be as simple as some bash scripts and email alerters, or it could use common network monitoring tools such as Nagios. These tools should be completely independent of the main application components (such as Kafka and SQLstream).
This section describes system views and system procedures useful to a system administrator for managing and monitoring a running SQLstream system. All system views and system procedures described here are located in the schema SYS_BOOT.MGMT. The code for the system procedures is in the Java package net.sf.farrago.syslib. The SQL command ALTER SYSTEM can also modify various system parameters. For more details, see the topic ALTER SYSTEM in the s-Server Streaming SQL Reference Guide.
View | Description |
---|---|
OBJECTS_IN_USE_VIEW | Displays which objects are in use by which statements. |
PUMP_OPTIONS | Lists values of each session variable associated with a pump (if any) |
SESSION_PARAMETERS_VIEW | Displays private parameter settings for the current session |
SESSIONS_VIEW | Displays all active sessions. Each row describes a session, giving a unique session id, and such attributes as the jdbc connection URL, the user, the transaction status, etc. |
STATEMENTS_VIEW | Displays all executing statements. Each statement has a row, which gives a unique statement id, and such attributes as the containing session (as its session id, from the sessions_view), the start time, the SQL being executed, the current value of any dynamic parameters, etc. |
View | Description |
---|---|
HISTOGRAM_BARS_VIEW | Displays the bars of column histograms |
HISTOGRAMS_VIEW | Displays column histogram summaries |
PAGE_COUNTS_VIEW | Displays page counts for indexes |
REPOSITORY_PROPERTIES_VIEW | Displays properties of the MDR repository used to store Farrago's catalog. |
ROW_COUNTS_VIEW | Displays row counts for tables |
SEQUENCES_VIEW | Displays the state of generated columns |
Some of these views are used in scripts, such as the support scripts (with extension .sql) found in the $SQLSTREAM_HOME/support/sql directory, which you can use to query your database schemas and configurations. Other examples of scripts appear in the subdirectories of the $SQLSTREAM_HOME/demo directory.
In SQLline you can use the !run command to execute any such script, using the following template:
!run <support script name>.sql
For more details on using SQLline, see the topic SQLline in the s-Server Getting Started Guide. Most script names describe what they do, and some scripts use the views named above. Here are three examples:
!run showSessions.sql
This command uses the SESSIONS_VIEW in running the following command:
select id, system_user_name, session_user_name, url from sys_boot.mgmt.sessions_view;
!run showStatements.sql
This command uses the STATEMENTS_VIEW in running the following command:
SELECT * from sys_boot.mgmt.statements_view;
!run showSessionParams.sql
This command uses the SESSION_PARAMETERS_VIEW in running the following command:
SELECT * from table(sys_boot.mgmt.session_parameters());
The $SQLSTREAM_HOME/support/sql directory includes the following support scripts:
Name | Name | Name |
---|---|---|
showFennelConfig.sql | showFarragoConfig.sql | showForeignServers.sql |
showForeignStreams.sql | showProcedureColumns.sql | showProcedures.sql |
showPumps.sql | showSchemas.sql | showSessionParams.sql |
showSessions.sql | showStatements.sql | showSystemInfo.sql |
showTableColumns.sql | showTableColumnsAll.sql | showUsers.sql |
The UDXs below are all functions in the SYS_BOOT.MGMT schema; like other SYS_BOOT.MGMT objects, they can be referenced without specifying the catalog and schema.
UDX | Description |
---|---|
REPOSITORY_INTEGRITY_VIOLATIONS() | Checks for integrity violations in the MDR repository. |
THREADS() | Lists threads running in the JVM. |
THREAD_STACK_ENTRIES() | Dumps the stack for each thread |
SYSTEM_INFO() | Retrieves information about the environment in which SQLstream is running (e.g., environment variables, system properties, and OS and hardware information). |
PERFORMANCE_COUNTERS() | Monitors performance counters such as cache hit ratios. |
Example usage:
SELECT * FROM TABLE(thread_stack_entries()) WHERE thread_id = 18;
Function | Description |
---|---|
SLEEP | Can be used to introduce an artificial delay into a SQL statement. |
You can call these procedures using code along the following lines:
CALL kill_statement(39);
Procedure | Description |
---|---|
FLUSH_CODE_CACHE | Discards all entries from the global code cache. |
STAT_SET_ROW_COUNT | Synthesizes an artificial row count for a table |
STAT_SET_PAGE_COUNT | Synthesizes an artificial page count for an index |
STAT_SET_COLUMN_HISTOGRAM | Synthesizes an artificial histogram for a column |
KILL_STATEMENT(stmt_id) | An administrator can kill an executing statement, specified by id (as seen in STATEMENTS_VIEW). Killing a statement aborts its execution and frees all resources allocated to the statement. A corresponding JDBC Statement will be invalidated, and any JDBC ResultSet will be terminated (with an SQLException to indicate that the statement was killed from outside). |
KILL_STATEMENT_MATCH('a sql fragment') | Terminates all statements whose SQL commands contain the provided text. |
KILL_SESSION(session_id) | An administrator can kill an active session by calling this procedure. The session is specified by the id that appears in SESSIONS_VIEW. Killing a session kills all statements executing in the session, aborts any transactions, and frees all allocated resources. A corresponding JDBC Connection is invalidated: any further use gets an SQLException. (However, if the optional boolean cancel_only parameter is specified as true for any of the kill calls, the effect is just to request an asynchronous cancel and return immediately; no wait or resource release is performed, and the session or statement remains valid.) |
For troubleshooting purposes, you may at times need to get a stack dump for s-Server. To get a stack dump, we recommend using the following command:
jstack <process id>
This is more useful if you specify a file name for the stackdump, as in the following:
jstack 12345 > stackdump.txt