An operational XVM continuously collects raw statistics while it runs. The XVM can also be configured to spin up a background thread that periodically performs the following:
The raw metrics collected by the server are used by the background statistical thread for its computations and can also be retrieved programmatically by an application for its own use.
In this document, we describe:
Heartbeats for an XVM can be enabled via DDL XML using the <heartbeats> element:
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <collectNonZGStats>true</collectNonZGStats>
          <collectIndividualThreadStats>true</collectIndividualThreadStats>
          <collectSeriesStats>true</collectSeriesStats>
          <collectSeriesDatapoints>false</collectSeriesDatapoints>
          <maxTrackableSeriesValue>100000000</maxTrackableSeriesValue>
          <includeMessageTypeStats>false</includeMessageTypeStats>
          <collectPoolStats>true</collectPoolStats>
          <poolDepletionThreshold>1.0</poolDepletionThreshold>
          <logging enabled="true"></logging>
          <tracing enabled="true"></tracing>
        </heartbeats>
      </xvm>
    </xvms>
Configuration Setting | Default | Description |
---|---|---|
enabled | false | Enable or disable server stats collection and heartbeat emission. |
interval | 1000 | The interval in seconds at which server stats will be collected and emitted. |
collectNonZGStats | true | Some statistics collected by the stats collection thread require creating a small amount of garbage. This can be set to false to suppress collection of these stats. |
collectIndividualThreadStats | true | Indicates whether heartbeats will contain stats for each active thread in the JVM. Individual thread stats are useful |
collectSeriesStats | true | Indicates whether or not series stats should be included in heartbeats. |
collectSeriesDatapoints | false | Indicates whether or not series stats should report the data points captured for a series statistic. |
maxTrackableSeriesValue | 10 minutes | The maximum value (in microseconds) that can be tracked for reported series histogram timings. Datapoints above this value will be downsampled to this value, but will be reflected in the max value reported in an interval. |
includeMessageTypeStats | false | Sets whether or not message type stats are included in heartbeats (when enabled for the app). When captureMessageTypeStats is enabled for an app, the AepEngine will record select statistics on a per message type basis. Because inclusion of per message type stats can significantly increase the size of heartbeats, inclusion in heartbeats is disabled by default. |
collectPoolStats | true | Indicates whether or not pool stats are collected by the XVM. |
poolDepletionThreshold | 1.0 | Sets the percentage decrement by which a preallocated pool must drop to be included in a server heartbeat. This gives monitoring applications advance warning if it looks like a preallocated pool may soon be exhausted. By default, the depletion threshold triggers inclusion in heartbeats at every 1% depletion of the preallocated count. This can be changed by setting the configuration property nv.server.stats.pool.depletionThreshold to a float value between 0 and 100. Setting this to a value greater than 100 or less than or equal to 0 disables depletion threshold reporting. |
logging | See below | Configures binary logging of heartbeats. Binary heartbeat logging provides a means by which heartbeat data can be captured in a zero garbage fashion. Collection of such heartbeats can be useful in diagnosing performance issues in running apps. |
tracing | See below | Configures trace logging of heartbeats. Enabling textual tracing of heartbeats is a useful way to quickly capture data from server heartbeats for applications that aren't monitoring XVM heartbeats remotely. Textual trace of heartbeats is not zero garbage and is therefore not suitable for applications that are latency sensitive. |
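The capping behavior described for maxTrackableSeriesValue can be pictured with a small sketch. This is not the platform's implementation: the class CappedSeriesSketch and its methods are hypothetical, and the sketch assumes only what the table states, namely that values above the cap are recorded at the cap while the interval maximum still reflects the original value. It also shows that the documented 10 minute default corresponds to 600,000,000 microseconds.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not part of the platform API. */
final class CappedSeriesSketch {
    /** The documented default of 10 minutes, expressed in microseconds. */
    static final long DEFAULT_MAX_TRACKABLE_MICROS = TimeUnit.MINUTES.toMicros(10);

    private final long maxTrackableMicros;
    private final List<Long> trackedValues = new ArrayList<>();
    private long intervalMaxMicros = Long.MIN_VALUE;

    CappedSeriesSketch(long maxTrackableMicros) {
        this.maxTrackableMicros = maxTrackableMicros;
    }

    void record(long valueMicros) {
        // The interval max keeps the original value, even when it exceeds the cap.
        intervalMaxMicros = Math.max(intervalMaxMicros, valueMicros);
        // The value fed to the histogram is limited to the cap.
        trackedValues.add(Math.min(valueMicros, maxTrackableMicros));
    }

    List<Long> trackedValues() {
        return trackedValues;
    }

    long intervalMaxMicros() {
        return intervalMaxMicros;
    }

    public static void main(String[] args) {
        CappedSeriesSketch series = new CappedSeriesSketch(100);
        series.record(50);
        series.record(250); // above the cap of 100
        System.out.println(series.trackedValues() + " max=" + series.intervalMaxMicros());
    }
}
```

A monitoring tool reading such a series would therefore see an accurate maximum even when the histogram buckets top out at the configured value.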
An XVM collects stats that are enabled for the applications that it contains. The following stats can be enabled and reported in heartbeats:
    <env>
      <!-- global stats properties -->
      <nv>
        <stats.latencymanager.samplesize>65536</stats.latencymanager.samplesize>
        <msg.latency.stats>true</msg.latency.stats>
        <ods.latency.stats>true</ods.latency.stats>
        <link.network.stampiots>true</link.network.stampiots>
      </nv>
    </env>
Environment Prop | Description |
---|---|
nv.stats.latencymanager.samplesize | The global default size used for capturing latencies. Latency stats are collected in a ring buffer which is sampled by the stats thread at each collection interval. Default value: nv.stats.series.samplesize |
nv.stats.series.samplesize | Property that can be used to control the default sampling size for Series stats. Default value: 10240 |
nv.msg.latency.stats | This global property instructs the platform to collect latency statistics for messages passing through various points in the process pipeline. |
nv.ods.latency.stats | Globally enables collection of application store latencies. |
nv.link.network.stampiots | Instructs low level socket I/O to stamp input/output times on written data. |
The XVM statistics thread will collect the following latency stats from the apps it contains when they are enabled:
    <apps>
      <app name="my-app" ...>
        <captureTransactionLatencyStats>true</captureTransactionLatencyStats>
        <captureEventLatencyStats>true</captureEventLatencyStats>
        <captureMessageTypeStats>false</captureMessageTypeStats>
      </app>
    </apps>
When heartbeats are enabled they can be consumed or emitted in several ways, discussed below.
By default, all server statistics tracers are disabled, because trace logging is not zero garbage and computing the statistics introduces CPU overhead. While tracing heartbeats isn't recommended in production, enabling server statistics trace output can be useful for debugging and performance tuning. To enable it, configure the appropriate tracers at the debug level. See the Heartbeat Trace Output section for more detail.
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <logging enabled="true"></logging>
          <tracing enabled="true"></tracing>
        </heartbeats>
      </xvm>
    </xvms>
Applications that are latency sensitive might prefer to leave all tracers disabled to avoid unnecessary allocations and the associated GC activity. As an alternative, it's possible to enable logging of zero-garbage heartbeat messages to a binary transaction log:
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <logging enabled="true">
            <storeRoot>/path/to/heartbeat/log/directory</storeRoot>
          </logging>
        </heartbeats>
      </xvm>
    </xvms>
When a storeRoot is not set, an XVM will log heartbeats to {XRuntime.getDataDirectory}/server-heartbeats/<xvm-name>-heartbeats.log, which can then be queried and traced from a separate process using the Stats Dump Tool.
Note that at this time binary heartbeat logs do not support rolling collection. Consequently this mechanism is not suitable for long running application instances.
Your application can register an event handler for server heartbeats to handle them in process.
    @EventHandler
    public void onHeartbeat(SrvMonHeartbeatMessage message) {
        // Your logic here:
        // - You could emit over an SMA message bus.
        // - log to a time series database.
        // etc, etc.
    }
See the SrvMonHeartbeatMessage JavaDoc for API details.
A Lumino or Robin controller can also connect to a server via a direct admin connection over TCP to listen for heartbeats for monitoring purposes. The XVM's stats thread queues a copy of each emitted heartbeat to each connected admin client.
Heartbeat trace is emitted to the nv.server.heartbeat logger at a level of INFO. Trace is only emitted for the types of heartbeat trace for which tracing has been enabled. This section discusses the various types of heartbeat trace, how trace for each type is enabled, and how to interpret the trace output for each type.
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <tracing enabled="true">
            <traceSysStats>true</traceSysStats>
          </tracing>
        </heartbeats>
      </xvm>
    </xvms>
The above trace can be interpreted as follows:
For the entire system:
For the process:
For more info regarding the process statistics above, you can reference the Oracle JavaDoc on MemoryUsage.
JDK 7 or newer is needed to collect all available memory stats. In addition, some stats are not available on all JVMs.
For each volume available:
Listing disk system roots requires JDK 7 or newer; with JDK 6 or below, some disk information may not be available.
Total compilation time
Compare 2 consecutive intervals to determine if JIT occurred in the interval.
Compare 2 consecutive intervals to determine if a GC occurred in the interval.
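Comparing two consecutive intervals works because these counters are cumulative. The following is a minimal sketch of that comparison using the standard java.lang.management beans. It is an assumption that the values in the heartbeat trace behave like these JMX counters, and JvmActivitySample is a hypothetical helper, not a platform class.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: a snapshot of cumulative JVM counters taken at one collection interval. */
final class JvmActivitySample {
    final long gcCount;       // total collections across all collectors
    final long gcTimeMillis;  // total accumulated collection time
    final long jitTimeMillis; // total accumulated JIT compilation time, or -1 if unavailable

    JvmActivitySample(long gcCount, long gcTimeMillis, long jitTimeMillis) {
        this.gcCount = gcCount;
        this.gcTimeMillis = gcTimeMillis;
        this.jitTimeMillis = jitTimeMillis;
    }

    static JvmActivitySample capture() {
        long count = 0;
        long time = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            count += Math.max(0, gc.getCollectionCount());
            time += Math.max(0, gc.getCollectionTime());
        }
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean(); // null if the JVM has no JIT
        long jitTime = (jit != null && jit.isCompilationTimeMonitoringSupported())
                ? jit.getTotalCompilationTime()
                : -1;
        return new JvmActivitySample(count, time, jitTime);
    }

    /** True if the cumulative GC count grew since the previous sample. */
    boolean gcOccurredSince(JvmActivitySample previous) {
        return gcCount > previous.gcCount;
    }

    /** True if the cumulative JIT time grew since the previous sample. */
    boolean jitOccurredSince(JvmActivitySample previous) {
        return jitTimeMillis >= 0 && previous.jitTimeMillis >= 0
                && jitTimeMillis > previous.jitTimeMillis;
    }

    public static void main(String[] args) throws InterruptedException {
        JvmActivitySample first = capture();
        Thread.sleep(1000); // stands in for the heartbeat interval
        JvmActivitySample second = capture();
        System.out.println("GC in interval:  " + second.gcOccurredSince(first));
        System.out.println("JIT in interval: " + second.jitOccurredSince(first));
    }
}
```

The same comparison can be applied to two consecutive heartbeat traces by reading the cumulative values from each and checking whether they increased.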
Individual thread stats can be traced by setting the following in DDL:
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <collectIndividualThreadStats>true</collectIndividualThreadStats>
          <tracing enabled="true">
            <traceThreadStats>true</traceThreadStats>
          </tracing>
        </heartbeats>
      </xvm>
    </xvms>
Where columns can be interpreted as:
Column | Description |
---|---|
ID | The thread's id |
CPU | The total amount of time in nanoseconds that the thread has executed (as reported by the JMX thread bean) |
DCPU | The amount of time in nanoseconds that the thread has executed in user mode or system mode in the given interval (as reported by the JMX thread bean) |
DUSER | The amount of time in nanoseconds that the thread has executed in user mode in the given interval (as reported by the JMX thread bean) |
CPU% | The percentage of CPU time the thread used during the interval (i.e. DCPU * 100 / interval time) |
USER% | The percentage of user mode CPU time the thread used during the interval (i.e. DUSER * 100 / interval time) |
WAIT% | The percentage of the time that the thread was recorded in a wait state such as a busy spin loop or a disruptor wait. Wait times are proactively captured by the platform via code instrumentation that takes a timestamp before and after entering/exiting the wait condition. This means that unlike CPU% or USER%, this percentage can include time when the thread is not scheduled and consuming CPU resources. Because of this, it is not generally possible to simply subtract WAIT% from CPU% to calculate the amount of time the thread actually executed. For example, if CPU% is 50 and WAIT% is also 50 and the interval is 5 seconds, it could be that 2.5 seconds of real work was done while 2.5 seconds of wait time occurred while the thread was context switched out, or it could be that all 2.5 seconds of wait time coincided with the 2.5 seconds of CPU time and all of the CPU time was spent busy spinning. In other words, WAIT% gives a definitive indication of time that the thread was not doing active work during the interval; the remaining CPU time is at the mercy of the operating system's thread scheduler. |
STATE | The thread's runnable state at the time of collection |
NAME | The thread name. Note that when affinitization is enabled and the thread has been affinitized, the affinitization information is appended to the thread name. |
CPU times are reported according to the most appropriate short form of:
Unit | Abbreviation |
---|---|
Days | d |
Hours | h |
Minutes | m |
Seconds | s |
Milliseconds | ms |
Microseconds | us |
Nanoseconds | ns |
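The percentage columns and the short-form units can be reproduced on the monitoring side with a few lines of arithmetic. The sketch below is illustrative only: ThreadCpuMath is a hypothetical class, and the rounding and unit-selection rule in shortForm (largest unit whose value is at least 1, one decimal place) is an assumption, since the exact formatting used by the trace output is not specified here.

```java
import java.util.Locale;

/** Hypothetical helper for interpreting thread stats; not part of the platform API. */
final class ThreadCpuMath {
    private static final long[] UNIT_NANOS = {
        86_400_000_000_000L, // d
        3_600_000_000_000L,  // h
        60_000_000_000L,     // m
        1_000_000_000L,      // s
        1_000_000L,          // ms
        1_000L               // us
    };
    private static final String[] UNIT_SUFFIX = {"d", "h", "m", "s", "ms", "us"};

    /** Computes delta * 100 / interval, e.g. CPU% from DCPU or USER% from DUSER. */
    static double percentOfInterval(long deltaNanos, long intervalNanos) {
        if (intervalNanos <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        return deltaNanos * 100.0 / intervalNanos;
    }

    /** Formats a nanosecond duration with the largest unit whose value is at least 1 (assumed rule). */
    static String shortForm(long nanos) {
        for (int i = 0; i < UNIT_NANOS.length; i++) {
            if (nanos >= UNIT_NANOS[i]) {
                return String.format(Locale.ROOT, "%.1f%s", (double) nanos / UNIT_NANOS[i], UNIT_SUFFIX[i]);
            }
        }
        return nanos + "ns";
    }

    public static void main(String[] args) {
        long intervalNanos = 5_000_000_000L; // 5 second heartbeat interval
        long dcpuNanos = 2_500_000_000L;     // 2.5 seconds of CPU time in the interval
        System.out.println("CPU% = " + percentOfInterval(dcpuNanos, intervalNanos));
        System.out.println("DCPU = " + shortForm(dcpuNanos));
    }
}
```

With a 5 second interval and 2.5 seconds of CPU time, the computed CPU% is 50, matching the WAIT% example in the table above.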
Pool stats can be traced by setting the following in DDL:
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <collectPoolStats>true</collectPoolStats>
          <tracing enabled="true">
            <tracePoolStats>true</tracePoolStats>
          </tracing>
        </heartbeats>
      </xvm>
    </xvms>
To reduce the size of heartbeats, Pool Stats for a given pool are only included when:
Stat | Description |
---|---|
PUT | The overall number of times items were put (returned) to a pool. |
DPUT | The number of times items were put (returned) to a pool since the last time the pool was reported in a heartbeat (the delta). |
GET | The overall number of times an item was taken from a pool. |
DGET | The number of times an item was taken from a pool since the last time the pool was reported in a heartbeat (the delta). |
HIT | The overall number of times that an item taken from a pool was satisfied by there being an available item in the pool. |
DHIT | The number of times that an item taken from a pool was satisfied by there being an available item in the pool since the last time the pool was reported in a heartbeat (the delta). |
MISS | The overall number of times that an item taken from a pool was not satisfied by there being an available item in the pool, resulting in an allocation. |
DMISS | The number of times that an item taken from a pool was not satisfied by there being an available item in the pool, resulting in an allocation, since the last time the pool was reported in a heartbeat. |
GROW | The overall number of times the capacity of a pool had to be increased to accommodate returned items. |
DGROW | The number of times the capacity of a pool had to be increased to accommodate returned items since the last time the pool was reported in a heartbeat. |
EVIC | The overall number of items that were evicted from the pool because the pool did not have adequate capacity to store them. |
DEVIC | The number of items that were evicted from the pool because the pool did not have adequate capacity to store them since the last time the pool was reported in a heartbeat. |
DWSH | The overall number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread. |
DDWSH | The number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread since the last time the pool was reported in a heartbeat. |
SIZE | The number of items that are currently in the pool available for pool gets. This number will be 0 if all objects that have been allocated by the pool have been taken. |
PRE | The number of items initially preallocated for the pool. |
CAP | The capacity of the backing array that is allocated to hold available pool items that have been preallocated or returned to the pool. |
NAME | The unique identifier for the pool. |
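A monitoring application can combine the delta counters to judge how well a pool is sized. The sketch below derives a miss percentage from DGET and DMISS. PoolStatsMath is a hypothetical helper, and the sketch assumes that every get is counted as either a hit or a miss, which follows from the definitions above but is not stated explicitly.

```java
/** Hypothetical helper for interpreting pool stats; not part of the platform API. */
final class PoolStatsMath {
    /**
     * Percentage of gets since the last report that required a new allocation
     * (DMISS * 100 / DGET). Returns 0 when the pool saw no gets.
     */
    static double missPercent(long dGet, long dMiss) {
        if (dGet < 0 || dMiss < 0 || dMiss > dGet) {
            throw new IllegalArgumentException("expected 0 <= dMiss <= dGet");
        }
        return dGet == 0 ? 0.0 : dMiss * 100.0 / dGet;
    }

    /** Complement of missPercent, assuming each get is either a hit or a miss. */
    static double hitPercent(long dGet, long dMiss) {
        return dGet == 0 ? 0.0 : 100.0 - missPercent(dGet, dMiss);
    }

    public static void main(String[] args) {
        long dGet = 200; // gets since the pool was last reported
        long dMiss = 5;  // of those, gets that had to allocate
        System.out.println("miss% = " + missPercent(dGet, dMiss));
        System.out.println("hit%  = " + hitPercent(dGet, dMiss));
    }
}
```

A steadily rising miss percentage for a preallocated pool suggests the preallocation count may be too small for the workload.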
Stats collected by the AEP engine underlying your application are also included in heartbeats. Tracing of engine stats can be enabled with the following.
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <tracing enabled="true">
            <traceAppStats>true</traceAppStats>
          </tracing>
        </heartbeats>
      </xvm>
    </xvms>
See AEP Engine Statistics for more detail about engine stats.
User stats collected by your application are also included in heartbeats. Tracing of user stats can be enabled with the following.
    <xvms>
      <xvm name="my-xvm">
        <heartbeats enabled="true" interval="5">
          <tracing enabled="true">
            <traceUserStats>true</traceUserStats>
          </tracing>
        </heartbeats>
      </xvm>
    </xvms>