A running XVM continuously collects raw statistics. The XVM can also be configured to spin up a background thread that periodically performs the following:
The raw metrics collected by the server are used by the background statistical thread for its computations and can also be retrieved programmatically by an application for its own use.
In this document, we describe:
Configuration Setting | Default | Description |
---|---|---|
nv.server.stats.enable | false | Enable or disable server stats collection and heartbeat emission. If tracers have been configured at the debug level, server statistics will be traced on the same thread that performs stats collection and emits heartbeats. |
nv.server.stats.interval | 1000 | The interval (in milliseconds) at which server stats will be collected and heartbeat events emitted when nv.server.stats.enable=true. |
nv.server.stats.includeSeries | true | Indicates whether or not series stats should be included in heartbeats. |
nv.server.stats.includeSeriesDataPoints | false | Indicates whether or not series stats should report the data points captured for a series statistic. |
nv.server.stats.maxTrackableSeriesValue | 10 minutes | The maximum value (in microseconds) that can be tracked for reported series histogram timings. |
nv.server.stats.pool.enable | true | Indicates whether or not pool stats are collected by the server when nv.server.stats.enable=true. |
nv.server.stats.pool.depletionThreshold | 1.0 | The percentage decrement by which a preallocated pool must drop to be included in a server heartbeat. This gives monitoring applications advance warning that a preallocated pool may soon be exhausted. By default, a pool is included in heartbeats at every 1% depletion of its preallocated count. The value can be changed to a float between 0 and 100. Setting it to a value greater than 100 or less than or equal to 0 disables depletion threshold reporting. |
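To make the depletion threshold semantics concrete, here is a minimal Java sketch of one plausible reading of the setting. It is not the platform's implementation: the class name, the method names, and the rule "report when a further whole threshold step has been crossed" are assumptions made for illustration.

```java
/** Illustrative only: one plausible reading of the depletion threshold, not platform code. */
final class DepletionThresholdSketch {

    /** A threshold greater than 100 or less than or equal to 0 disables depletion reporting. */
    static boolean isEnabled(double thresholdPercent) {
        return thresholdPercent > 0.0 && thresholdPercent <= 100.0;
    }

    /**
     * Number of whole threshold-sized steps by which the pool has been depleted.
     *
     * @param preallocated     items initially preallocated for the pool
     * @param available        items currently available in the pool
     * @param thresholdPercent value of nv.server.stats.pool.depletionThreshold
     */
    static int depletionStep(long preallocated, long available, double thresholdPercent) {
        if (!isEnabled(thresholdPercent) || preallocated <= 0) {
            return 0;
        }
        long taken = Math.max(0, preallocated - available);
        double depletedPercent = Math.min(100.0, taken * 100.0 / preallocated);
        return (int) Math.floor(depletedPercent / thresholdPercent);
    }

    /** Assumed rule: include the pool once it has crossed a further step since the last report. */
    static boolean shouldReport(int lastReportedStep, int currentStep) {
        return currentStep > lastReportedStep;
    }

    public static void main(String[] args) {
        // 1000 preallocated items, 975 still available, default threshold of 1%.
        int step = depletionStep(1000, 975, 1.0);
        System.out.println("step=" + step + " report=" + shouldReport(1, step));
    }
}
```

With the default threshold of 1.0, a pool of 1000 preallocated items would, under this reading, qualify for inclusion each time a further 10 items are taken from it.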
When heartbeats are enabled via the nv.server.stats.enable and nv.server.stats.interval properties, they can be handled in several ways, discussed below.
By default, all server statistics tracers are disabled because trace logging is not zero garbage and computing the traced statistics introduces CPU overhead. While tracing heartbeats isn't recommended in production, enabling server statistics trace output can be useful for debugging and performance tuning. To enable it, configure the appropriate tracers at the debug level. See the Output Trace Loggers section for more detail.
Applications that are latency sensitive might prefer to leave all tracers disabled to avoid unnecessary allocations and the associated GC activity. As an alternative, it's possible to enable logging of zero-garbage heartbeat messages to a binary transaction log:
<?xml version="1.0" encoding="utf-8"?>
<model xmlns="http://www.neeveresearch.com/schema/x-ddl" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- ... snip ... -->
  <servers>
    <!-- ... -->
    <server name="..." group="...">
      <!-- ... -->
      <heartbeatLogging enabled="true">
        <storeRoot>/path/to/heartbeat/log/directory</storeRoot>
      </heartbeatLogging>
      <!-- ... -->
    </server>
    <!-- ... -->
  </servers>
  <!-- ... snip ... -->
</model>
Heartbeat logs can be queried out of process using the Stats Dump Tool.
Note that binary heartbeat logs do not currently support rolling collection, so the built-in heartbeat logging mechanism is not suitable for long-running application instances.
Your application can register an event handler for server heartbeats to handle them in process.
@EventHandler
public void onHeartbeat(SrvMonHeartbeatMessage message) {
    // Your logic here:
    // - You could emit over an SMA message bus.
    // - Log to a time series database.
    // etc.
}
See the SrvMonHeartbeatMessage JavaDoc for API details.
A Robin Controller can also be used to connect to a server and listen for heartbeats emitted by a Talon Server.
The server statistics thread performs the following operations:
The statistics thread can be enabled or disabled via the configuration parameters.
The server statistics thread can also be started via the following environment variables or system properties:
nv.server.stats.enable=true
Once started administratively, the statistics thread remains active until the server is stopped.
Note: On Unix-based systems, the shell may not support "." in environment variable names. To set server statistics configuration through environment variables on such systems, replace each "." character in the variable name with "_".
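The name substitution described in the note can be sketched in Java as follows. The class and method names are hypothetical, and the lookup order shown (system property first, then the dotted environment variable, then the underscored one) is an assumption for illustration rather than the platform's documented precedence.

```java
/** Illustrative only: resolving a stats setting from system properties or the environment. */
final class StatsSettingLookup {

    /** Converts a dotted setting name to the form usable in a Unix shell. */
    static String toEnvName(String propertyName) {
        return propertyName.replace('.', '_');
    }

    /** Assumed precedence for this sketch: system property, dotted env var, underscored env var. */
    static String resolve(String propertyName, String defaultValue) {
        String value = System.getProperty(propertyName);
        if (value == null) {
            value = System.getenv(propertyName);
        }
        if (value == null) {
            value = System.getenv(toEnvName(propertyName));
        }
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("nv.server.stats.enable"));
        System.out.println(resolve("nv.server.stats.enable", "false"));
    }
}
```

For example, the setting nv.server.stats.enable would be exported in a Unix shell as nv_server_stats_enable=true.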
By default, the server statistics thread collects object pool stats. If you would prefer pool stats not to be recorded:
nv.server.stats.pool.enable=false
The server statistics thread computes (then optionally traces and/or logs) the following at the configured frequency:
Reported for each active thread:
To reduce the size of heartbeats, Pool Stats for a given pool are only included when:
Pool stats include:
Stat | Description |
---|---|
Number of puts | The overall number of times items were put (returned) to a pool. |
Delta number of puts | The number of times items were put (returned) to a pool since the last time the pool was reported in a heartbeat. |
Number of gets | The overall number of times an item was taken from a pool. |
Delta number of gets | The number of times an item was taken from a pool since the last time the pool was reported in a heartbeat. |
Number of hits | The overall number of times that an item taken from a pool was satisfied by there being an available item in the pool. |
Delta number of hits | The number of times that an item taken from a pool was satisfied by there being an available item in the pool since the last time the pool was reported in a heartbeat. |
Number of misses | The overall number of times that an item taken from a pool was not satisfied by there being an available item in the pool resulting in an allocation. |
Delta number of misses | The number of times that an item taken from a pool was not satisfied by there being an available item in the pool resulting in an allocation since the last time the pool was reported in a heartbeat. |
Number of growths | The overall number of times the capacity of a pool had to be increased to accommodate returned items. |
Delta number of growths | The number of times the capacity of a pool had to be increased to accommodate returned items since the last time the pool was reported in a heartbeat. |
Number of evicts | The overall number of items that were evicted from the pool because the pool did not have adequate capacity to store them. |
Delta number of evicts | The number of items that were evicted from the pool because the pool did not have adequate capacity to store them since the last time the pool was reported in a heartbeat. |
Number of detached washes | The overall number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread. |
Delta number of detached washes | The number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread since the last time the pool was reported in a heartbeat. |
Pool size | The number of items that are currently in the pool available for pool gets. This number will be 0 if all objects that have been allocated by the pool have been taken. |
Number of preallocated objects | The number of items initially preallocated for the pool. |
Pool capacity | The capacity of the backing array that is allocated to hold available pool items that have been preallocated or returned to the pool. |
Pool key | The unique identifier for the pool. |
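A monitoring application might derive a hit ratio from the delta counters in the table above. The following Java sketch assumes that every get is either a hit or a miss, which follows from the descriptions but is not stated explicitly. The class and method names are hypothetical and are not part of the heartbeat API.

```java
/** Illustrative only: deriving pool efficiency figures from reported counters. */
final class PoolStatsSketch {

    /** Fraction of gets satisfied by an item already in the pool. */
    static double hitRatio(long gets, long hits) {
        // With no gets in the interval there were no misses, so treat the pool as fully effective.
        return gets == 0 ? 1.0 : (double) hits / gets;
    }

    /** Gets that fell through to an allocation, assuming gets = hits + misses. */
    static long missCount(long gets, long hits) {
        return Math.max(0, gets - hits);
    }

    public static void main(String[] args) {
        long deltaGets = 200; // example values, not taken from a real heartbeat
        long deltaHits = 150;
        System.out.println("hit ratio: " + hitRatio(deltaGets, deltaHits));
        System.out.println("allocations: " + missCount(deltaGets, deltaHits));
    }
}
```

A falling hit ratio between heartbeats suggests that the pool is allocating more often, which is relevant to latency-sensitive applications trying to avoid garbage.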
Reported for each app:
See Also:
The server statistics thread uses several named trace loggers to log the raw and computed statistics output. The following is the list of the loggers used by the server stats thread.
If you would like to see server statistics trace output, the appropriate tracers need to be configured at the debug level. For example, to enable System Stats trace output:
nv.server.stats.sys.trace=debug
See Also: X Platform Tracing and Logging for general details on configuration of trace logging.
Appears in trace output when nv.server.stats.enable=true and nv.server.stats.sys.trace=debug
[System Stats]
Individual thread stats can be traced by setting the following in DDL:
<servers>
  <server name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <tracing enabled="true">
        <traceThreadStats>true</traceThreadStats>
      </tracing>
    </heartbeats>
  </server>
</servers>
When enabled the following stats are traced to the console.
[Thread Stats]
ID  CPU      DCPU    DUSER   CPU%  USER%  STATE          NAME
1   13.5s    3.5s    3.5s    69    100    TIMED_WAITING  main
2   0        0       0       0     0      WAITING        Reference Handler
3   15.6ms   0       0       0     0      WAITING        Finalizer
4   0        0       0       0     0      RUNNABLE       Signal Dispatcher
5   0        0       0       0     0      RUNNABLE       Attach Listener
9   0        0       0       0     0      RUNNABLE       ReaderThread
16  0        0       0       0     0      TIMED_WAITING  X-EDP-Timer
18  187.5ms  15.6ms  15.6ms  1     100    RUNNABLE       X-Server-ad-exchange-1-StatsRunner
19  843.8ms  0       0       0     0      RUNNABLE       X-Server-ad-exchange-1-Main
21  0        0       0       0     0      TIMED_WAITING  X-EventMultiplexer-Wakeup-admin
Columns can be interpreted as:
Column | Description |
---|---|
ID | The thread's id |
CPU | The total amount of time in nanoseconds that the thread has executed (as reported by the JMX thread bean) |
DCPU | The amount of time that the thread has executed in user mode or system mode in the given interval in nanoseconds (as reported by the JMX thread bean) |
DUSER | The amount of time that the thread has executed in user mode in the given interval in nanoseconds (as reported by the JMX thread bean) |
CPU% | The percentage of cpu time the thread used during the interval (e.g. DCPU * 100 / interval time) |
USER% | The percentage of the thread's execution time spent in user mode during the interval (e.g. DUSER * 100 / DCPU) |
STATE | The thread's runnable state at the time of collection |
NAME | The thread name. Note that when affinitization is enabled and the thread has been affinitized, the affinitization information is appended to the thread name. |
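The CPU% and USER% formulas from the table can be reproduced with the standard JMX thread bean. The sketch below uses `java.lang.management.ThreadMXBean` and samples only the current thread. The class name, the helper methods, and the choice to truncate to a whole percentage are assumptions, not the statistics thread's actual code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

/** Illustrative only: computing CPU% and USER% for one thread over one interval. */
final class ThreadCpuSketch {

    /** CPU% = DCPU * 100 / interval time (all values in nanoseconds). */
    static long cpuPercent(long dcpuNanos, long intervalNanos) {
        return intervalNanos <= 0 ? 0 : dcpuNanos * 100 / intervalNanos;
    }

    /** USER% = DUSER * 100 / DCPU (all values in nanoseconds). */
    static long userPercent(long duserNanos, long dcpuNanos) {
        return dcpuNanos <= 0 ? 0 : duserNanos * 100 / dcpuNanos;
    }

    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            System.out.println("Thread CPU time is not supported on this JVM");
            return;
        }
        long wallStart = System.nanoTime();
        long cpuStart = threads.getCurrentThreadCpuTime();
        long userStart = threads.getCurrentThreadUserTime();

        // Burn some CPU so that the deltas are non-zero.
        long sink = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sink += i % 7;
        }

        long interval = System.nanoTime() - wallStart;
        long dcpu = threads.getCurrentThreadCpuTime() - cpuStart;
        long duser = threads.getCurrentThreadUserTime() - userStart;
        System.out.println("CPU%=" + cpuPercent(dcpu, interval)
                + " USER%=" + userPercent(duser, dcpu) + " (sink=" + sink + ")");
    }
}
```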
CPU times are reported according to the most appropriate short form of:
Unit | Abbreviation |
---|---|
Days | d |
Hours | h |
Minutes | m |
Seconds | s |
Milliseconds | ms |
Microseconds | us |
Nanoseconds | ns |
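The short-form rendering can be approximated with a small formatter that picks the largest unit not exceeding the value. This Java sketch is a guess at the behavior based on the sample output and the unit table. The class name, the single decimal place, and the bare "0" for zero are assumptions.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Illustrative only: rendering a nanosecond CPU time in its most appropriate short form. */
final class CpuTimeFormatSketch {

    // Units from largest to smallest, expressed in nanoseconds.
    private static final long[] UNIT_NANOS = {
        86_400_000_000_000L, 3_600_000_000_000L, 60_000_000_000L,
        1_000_000_000L, 1_000_000L, 1_000L, 1L
    };
    private static final String[] UNIT_ABBREVIATIONS = {"d", "h", "m", "s", "ms", "us", "ns"};

    static String format(long nanos) {
        if (nanos <= 0) {
            return "0"; // the sample trace prints zero without a unit
        }
        // At most one decimal place, with '.' as the separator regardless of default locale.
        DecimalFormat oneDecimal =
                new DecimalFormat("0.#", DecimalFormatSymbols.getInstance(Locale.ROOT));
        for (int i = 0; i < UNIT_NANOS.length; i++) {
            if (nanos >= UNIT_NANOS[i]) {
                return oneDecimal.format((double) nanos / UNIT_NANOS[i]) + UNIT_ABBREVIATIONS[i];
            }
        }
        return "0"; // not reached for positive input; required by the compiler
    }

    public static void main(String[] args) {
        System.out.println(format(13_500_000_000L));
        System.out.println(format(15_600_000L));
        System.out.println(format(0L));
    }
}
```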
Appears in trace output when nv.server.stats.enable=true and nv.server.stats.pool.trace=debug
<27,50360,perf5.neeveresearch.com> 20160628-17:26:03:609 (dbg)...
Appears in trace output when nv.server.stats.enable=true and nv.server.stats.app.trace=debug.
The example below is output from the Tick To Trade sample application available on GitHub.
[App (ems) Engine Stats] [App (ems) Store Binding Stats]
The server stats rendering of application stats is more compact, but because stats trace is most often enabled for testing and performance tuning, it is more common to enable engine stats trace rather than server stats trace for application stats.