In This Section
Overview
An operational XVM continuously collects raw statistics as it runs. The XVM can also be configured to spin up a background thread that periodically performs the following:
- Performs higher-level statistical computations, such as calculating message rates and average latencies.
- Emits heartbeat messages to be processed by handlers.
- Optionally outputs server statistics to named trace loggers using the platform's tracing and logging system.
- Optionally writes heartbeat messages containing useful server-wide statistics to a binary transaction log (with zero steady-state allocations).
- Detects violation/abatement of alert thresholds and dispatches appropriate alert events.
The raw metrics collected by the server are used by the background statistical thread for its computations and can also be retrieved programmatically by an application for its own use.
In this document, we describe:
- the various raw metrics captured by the server,
- how to enable and configure the statistics thread,
- the higher level statistics calculations performed by the statistics thread,
- and the format of the output of the statistics thread.
Configuration Settings
Configuration Setting | Default | Description |
---|---|---|
nv.server.stats.enable | false | Enable or disable server stats collection and heartbeat emission. If tracers have been configured at the debug level, server statistics will be traced on the same thread that performs stats collection and emits heartbeats. |
nv.server.stats.interval | 1000 | The interval (in milliseconds) at which server stats will be collected and heartbeat events emitted when nv.server.stats.enable=true. |
nv.server.stats.includeSeries | true | Indicates whether or not series stats should be included in heartbeats. |
nv.server.stats.includeSeriesDataPoints | false | Indicates whether or not series stats should report the data points captured for a series statistic. |
nv.server.stats.maxTrackableSeriesValue | 10 minutes | The maximum value (in microseconds) that can be tracked for reported series histogram timings. |
nv.server.stats.pool.enable | true | Indicates whether or not pool stats are collected by the server when nv.server.stats.enable=true. |
nv.server.stats.pool.depletionThreshold | 1.0 | Sets the percentage decrement by which a preallocated pool must drop for it to be included in a server heartbeat. By default the threshold triggers inclusion in heartbeats at every 1% depletion of the preallocated count; it can be changed to any float value between 0 and 100. Setting it to a value greater than 100 or less than or equal to 0 disables depletion threshold reporting. This gives monitoring applications advance warning that a preallocated pool may soon be exhausted. |
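For example, the following properties (from the table above) enable statistics collection and emit heartbeats every 5 seconds:
nv.server.stats.enable=true
nv.server.stats.interval=5000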
Handling Server Heartbeats
When heartbeats are enabled via the nv.server.stats.enable and nv.server.stats.interval properties, they can be handled in several ways, as discussed below.
Tracing Heartbeats
By default, all server statistics tracers are disabled, since trace logging is not zero-garbage and introduces CPU overhead in computing and formatting statistics. While tracing heartbeats isn't recommended in production, enabling server statistics trace output can be useful for debugging and performance tuning. To enable it, configure the appropriate tracers at the debug level. See the Trace Output Loggers section for more detail.
Binary Heartbeat Logging
Applications that are latency-sensitive might prefer to leave all tracers disabled to avoid unnecessary allocations and the associated GC activity. As an alternative, it's possible to enable logging of zero-garbage heartbeat messages to a binary transaction log:
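As a rough sketch, binary heartbeat logging is typically switched on in the XVM's configuration alongside heartbeat emission. The element names below are assumptions and should be checked against the platform's configuration reference:
<!-- Illustrative sketch only: emits heartbeats every 5 seconds and logs each
     heartbeat to a binary transaction log (element names assumed). -->
<heartbeats enabled="true" interval="5">
  <logging enabled="true"/>
</heartbeats>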
Heartbeat logs can be queried out of process using the Stats Dump Tool.
Note that binary heartbeat logs do not currently support rolling collection, so the built-in heartbeat logging mechanism is not yet suitable for long-running application instances.
Heartbeat Event Handlers
Your application can register an event handler for server heartbeats to handle them in process.
See the SrvMonHeartbeatMessage JavaDoc for API details.
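As a minimal sketch (the class name and handler body below are illustrative, and the package names should be verified against the JavaDoc), an application's main class can declare an @EventHandler method for SrvMonHeartbeatMessage, which the platform invokes in process each time a heartbeat is emitted:
import com.neeve.aep.annotations.EventHandler;
import com.neeve.server.mon.SrvMonHeartbeatMessage;

public class MyAppMain {
    // Invoked in process for each XVM heartbeat when this class is registered
    // as the app's event handler.
    @EventHandler
    public void onServerHeartbeat(final SrvMonHeartbeatMessage heartbeat) {
        // Inspect the server-wide statistics carried by the heartbeat here;
        // see the SrvMonHeartbeatMessage JavaDoc for the available accessors.
        System.out.println("Received XVM heartbeat: " + heartbeat);
    }
}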
Robin
A Robin Controller can also be used to connect to a Talon server and listen for the heartbeats it emits.
The Statistics Thread
The server statistics thread performs the following operations:
- Compute higher-level statistics such as metric averages.
- Output raw and computed statistics to any trace loggers configured at the debug level.
- Emit heartbeat messages consumed by handlers and the heartbeat log, if enabled.
The statistics thread can be enabled or disabled via the configuration settings described above.
Administrative Control
The server statistics thread can also be started via the following environment variables or system properties:
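For example, the statistics thread can be started by passing the corresponding system properties on the JVM command line (the property names are those listed under Configuration Settings; the equivalent environment variable names follow the platform's usual property-name mapping and are not shown here):
-Dnv.server.stats.enable=true
-Dnv.server.stats.interval=1000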
Once started administratively, the statistics thread remains active until the server is stopped.
Pool Stats
By default, the server statistics thread collects object pool stats. If you would prefer pool stats not to be recorded:
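nv.server.stats.pool.enable=false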
Computed Statistics
The server statistics thread computes (then optionally traces and/or logs) the following at the configured frequency:
General Info
- Date and time that statistics gathering started
- Server name
- Server PID
- Number of apps running in the server
- Time spent gathering server statistics (for the current interval, excluding formatting)
System Info
- Number of available processors
- System load average
Memory Info
- Initial heap size
- Heap used
- Heap committed
- Max heap size
- Initial non-heap size
- Non-heap memory used
- Non-heap memory committed
- Non-heap memory max size
Thread Info
- Total thread count
- Daemon thread count
- Peak thread count
JIT Info
- JIT name
- Total compilation time
GC Info
- Collection count (for all GCs)
- Collection time (for all GCs)
Thread Stats
Reported for each active thread:
- Per-thread CPU time as a percentage of total CPU time
- Per-thread user CPU time as a percentage of thread CPU time
- Delta CPU time
- Delta CPU time as a percentage of total delta CPU time
- Delta user CPU time
- Delta user CPU time as a percentage of thread delta CPU time
Pool Stats
To reduce the size of heartbeats, Pool Stats for a given pool are only included when:
- A miss has been recorded for the pool in the interval, resulting in a new object being allocated.
- The number of preallocated objects remaining in the pool has dropped past the configured pool depletion threshold.
Pool stats include:
- Number of puts
- Delta number of puts
- Number of gets
- Delta number of gets
- Number of hits
- Delta number of hits
- Number of misses
- Delta number of misses
- Number of growths
- Delta number of growths
- Number of evicts
- Delta number of evicts
- Number of detached watches
- Delta number of detached watches
- Pool size
- Number of preallocated objects
- Pool capacity
- Pool key
App Stats
Reported for each app:
- Engine stats
- Store stats
- Store binding persister stats
- Store binding ICR sender stats
- Bus binding stats
- User stats
See Also:
- AEP Engine Statistics for more detail about app-level stats.
- User Defined App Stats for adding stats specific to your application to heartbeats.
Trace Output Loggers
The server statistics thread logs raw and computed statistics to the following named trace loggers:
- nv.server.stats.sys.trace – tracer for System Statistics
- nv.server.stats.thread.trace – tracer for Thread Statistics
- nv.server.stats.pool.trace – tracer for Pool Statistics
- nv.server.stats.app.trace – tracer for Application and Engine Statistics
- nv.server.stats.userstats.trace – tracer for User Defined Statistics
If you would like to see server statistics trace output, the appropriate tracers need to be configured at the debug level. For example, to enable System Stats trace output:
nv.server.stats.sys.trace=debug
See Also: X Platform Tracing and Logging for general details on configuration of trace logging.
Output Format
System Stats
Appears in trace output when nv.server.stats.enable=true and nv.server.stats.sys.trace=debug
[System Stats]
Tue Jun 28 17:26:03 PDT 2016 'ems1' server (pid=50360) 2 apps (collection time=73982839 ns)
System: 20 processors load average: 2.30
Memory: HEAP 8.0G init 6.6G used 7.8G commit 7.8G max NON-HEAP 2M init 44M used 46M commit 0K max
Threads: 26 total (23 daemon) 27 peak
JIT: HotSpot 64-Bit Tiered Compilers, time: 11997 ms
[6 6404 ms] [2 8771 ms]
Thread Stats
Appears in trace output when nv.server.stats.enable=true and nv.server.stats.thread.trace=debug
[Thread Stats]
ID CPU CPU% USER% DCPU DCPU% DUSER% STATE NAME
1 0 1 100 0 0 0 RUNNABLE X-Server-ems1-Main (aff=[])
2 0 1 100 0 0 0 WAITING Reference Handler
3 0 1 100 0 0 0 WAITING Finalizer
4 0 1 100 0 0 0 RUNNABLE Signal Dispatcher
23 0 3 100 0 0 0 RUNNABLE X-EDP-McastReceiver (aff=[1(s0c1t0)])
25 0 3 100 0 0 0 TIMED_WAITING X-EDP-Timer (aff=[1(s0c1t0)])
26 0 4 100 0 0 0 RUNNABLE X-ODS-StoreLog-ems1-heartbeats-1 (aff=[])
27 0 4 100 0 0 0 RUNNABLE X-Server-ems1-StatsRunner (aff=[1(s0c1t0)])
30 0 4 100 0 0 0 TIMED_WAITING Wrapper-Control-Event-Monitor
31 0 4 100 0 0 0 TIMED_WAITING Wrapper-Startup-Runner
33 0 4 100 0 0 0 RUNNABLE X-Server-ems1-IOThread-1 (aff=[8(s0c11t0)])
35 0 5 100 0 0 0 TIMED_WAITING X-EventMultiplexer-Wakeup (aff=[1(s0c1t0)])
36 0 5 100 0 0 0 RUNNABLE X-AEP-BusManager-BindingOpener-admin.admin (aff=[1(s0c1t0)])
37 0 5 100 0 0 0 RUNNABLE X-AEP-BusManager-BindingOpener-admin.client-fb0ab760-3d8f-11e6-abfe-0002c9faaac2 (aff=[1(s0c1t0)])
38 0 5 100 0 0 0 RUNNABLE X-AEP-BusManager-BindingOpener-admin.control-fb0a1b20-3d8f-11e6-abfe-0002c9faaac2 (aff=[1(s0c1t0)])
40 0 5 100 0 0 0 TIMED_WAITING X-EventMultiplexer-Wakeup (aff=[1(s0c1t0)])
41 0 5 100 0 0 0 RUNNABLE X-AEP-BusManager-BindingOpener-ems.market (aff=[1(s0c1t0)])
42 0 5 100 0 0 0 RUNNABLE X-AEP-BusManager-BindingOpener-ems.ems (aff=[1(s0c1t0)])
43 0 6 100 0 0 0 RUNNABLE X-AEP-BusManager-BindingOpener-ems.client-fb48aac0-3d8f-11e6-abfe-0002c9faaac2 (aff=[1(s0c1t0)])
44 0 6 100 0 0 0 RUNNABLE X-AEP-BusManager-BindingOpener-ems.control-fb0a1b20-3d8f-11e6-abfe-0002c9faaac2 (aff=[1(s0c1t0)])
45 0 6 100 0 0 0 WAITING X-STEMux-admin-1 (aff=[])
46 0 6 100 0 0 0 RUNNABLE X-STEMux-ems-2 (aff=[2(s0c2t0)])
47 0 6 100 0 0 0 RUNNABLE X-ODS-StoreLinkAcceptor-1 (aff=[1(s0c1t0)])
48 0 6 100 0 0 0 RUNNABLE X-ODS-StoreEventMultiplexer-1 (aff=[3(s0c3t0)])
49 0 6 100 0 0 0 RUNNABLE X-AEP-BusManager-IO-ems.market (aff=[6(s0c9t0)])
50 0 6 100 0 0 0 RUNNABLE X-Client-LinkManagerReader[298b29db-82ae-4cc7-97b7-1143a417e86c] (aff=[7(s0c10t0)])
Pool Stats
Appears in trace output when nv.server.stats.enable=true and nv.server.stats.pool.trace=debug
<27,50360,perf5.neeveresearch.com> 20160628-17:26:03:609 (dbg)...
[Pool Stats]
PUT DPUT GET DGET HIT DHIT MISS DMISS GROW DGROW EVIC DEVIC DWSH DDWSH SIZE PRE CAP NAME
0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1024 packet.server_client_connect_request.71
0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1024 packet.server_client_connect_reply.50
46 0 46 0 44 0 2 0 0 0 0 0 0 0 2 0 1024 iobuf.heap-256.4
4 0 145 0 4 0 141 0 0 0 0 0 0 0 0 0 1024 iobuf.native-256.23
49 0 64 0 48 0 16 0 0 0 0 0 0 0 1 0 1024 iobuf.heap-64.2
18 0 27.3M 0 18 0 27.3M 0 0 0 0 0 0 0 0 0 1024 iobuf.native-32.20
6 0 29 0 6 0 23 0 0 0 0 0 0 0 0 0 1024 iobuf.native-64.21
0 0 5 0 0 0 5 0 0 0 0 0 0 0 0 0 1024 packet.recovery_log_eof_marker.87
0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 0 1024 packet.recovery_log_checkpoint_state.39
48 0 48 0 46 0 2 0 0 0 0 0 0 0 2 0 1024 packet.discovery_esa.65
2 0 33 0 2 0 31 0 0 0 0 0 0 0 0 0 1024 iobuf.heap-32.1
0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 0 1024 iobuf.native-8K.28
0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 0 1024 xbuf.entity.SrvMonAppInfoXbufEntity.0.100.106
App Stats and User Stats
Appears in trace output when nv.server.stats.enable=true and nv.server.stats.app.trace=debug.
The example below is output from the Tick To Trade sample application, available on GitHub.
[App (ems) Engine Stats]
MRCV RATE DRATE MRCVB RATE DRATE MRCVG RATE DRATE MSND RATE DRATE MSNDB RATE DRATE MSNDG RATE DRATE EVNT RATE DRATE FEVNT RATE DRATE TXN RATE DRATE TXNSZ RLBK
106K 613 92 106K 613 92 0 0 0 106K 613 92 106K 613 92 0 0 0 320K 1.8K 278 106K 613 92 106K 613 92 0 0
...Disruptor {[0of1,024] 0% (MultiThreadedSufficientCores, BusySpin)}
...Feeder Queues{
......X-Server-ems1-Main (aff=[]): size=0, decongestCount=284178
......X-ODS-StoreEventMultiplexer-1 (aff=[3(s0c3t0)]): size=0, decongestCount=284164
......X-Server-ems1-IOThread-1 (aff=[8(s0c11t0)]): size=0, decongestCount=284162
......X-AEP-BusManager-IO-ems.market (aff=[6(s0c9t0)]): size=0, decongestCount=284160
......X-STEMux-ems-2 (aff=[2(s0c2t0)]): size=0, decongestCount=284160
......X-Client-LinkManagerReader[298b29db-82ae-4cc7-97b7-1143a417e86c] (aff=[7(s0c10t0)]): size=0, decongestCount=284159
...}
[App (ems) User Stats]
...Gauges{
......EMS Messages Received: 142604
......EMS Orders Received: 35651
...}
...Series{
......[In Proc Tick To Trade(sno=35651, #points=150, #skipped=0)
.........In Proc Tick To Trade(interval): [sample=150, min=72 max=84 mean=75 median=75 75%ile=77 90%ile=79 99%ile=83 99.9%ile=84 99.99%ile=84]
.........In Proc Tick To Trade (running): [sample=35651, min=72 max=20008 mean=93 median=76 75%ile=82 90%ile=111 99%ile=227 99.9%ile=805 99.99%ile=11975]
......[In Proc Time To First Slice(sno=35651, #points=150, #skipped=0)
.........In Proc Time To First Slice(interval): [sample=150, min=85 max=98 mean=88 median=88 75%ile=90 90%ile=92 99%ile=95 99.9%ile=98 99.99%ile=98]
.........In Proc Time To First Slice (running): [sample=35651, min=84 max=44695 mean=249 median=88 75%ile=95 90%ile=133 99%ile=283 99.9%ile=36287 99.99%ile=41439]
...}
[App (ems) Store Binding Stats]
CSND RATE DRATE CESND RATE DRATE CRCV RATE DRATE CERCV RATE DRATE CCSND RATE DRATE CCRCV RATE DRATE SIZE
106K 613 88 106K 613 88 0 0 0 0 0 0 0 0 0 106K 613 88 0
[App (ems) Store Binding Persister Stats]
<?>
...no persister
[App (ems) Store Binding ICR Sender Stats]
<?>
...no ICR sender
[App (ems) Bus Binding (market)]
<Detached ,Disruptor {[0of1,024] 0% (SingleThreaded, BusySpin)}>
MRCV RATE DRATE MRCVI RATE DRATE MRCVB RATE DRATE MENQ RATE DRATE MSND RATE DRATE FLS RATE DRATE MFLS RATE DRATE FLA RATE DRATE MFLA RATE DRATE
71302 408 57 0 0 0 0 0 0 35651 204 28 35651 204 28 71302 408 57 35651 204 28 0 0 0 0 0 0
STB RATE DRATE STBIB RATE DRATE STBB RATE DRATE PRCV RATE DRATE COM RATE DRATE RLBK CLNTS CHNLS FAILS
0 0 0 0 0 0 0 0 0 0 0 0 35651 204 28 0 0 2 0
[App (ems) Bus Binding (control-fb0a1b20-3d8f-11e6-abfe-0002c9faaac2)]
<Attached>
MRCV RATE DRATE MRCVI RATE DRATE MRCVB RATE DRATE MENQ RATE DRATE MSND RATE DRATE FLS RATE DRATE MFLS RATE DRATE FLA RATE DRATE MFLA RATE DRATE
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
STB RATE DRATE STBIB RATE DRATE STBB RATE DRATE PRCV RATE DRATE COM RATE DRATE RLBK CLNTS CHNLS FAILS
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
[App (ems) Bus Binding (ems)]
<Attached>
MRCV RATE DRATE MRCVI RATE DRATE MRCVB RATE DRATE MENQ RATE DRATE MSND RATE DRATE FLS RATE DRATE MFLS RATE DRATE FLA RATE DRATE MFLA RATE DRATE
35651 204 28 0 0 0 0 0 0 71302 408 56 71302 408 56 71302 408 56 71302 408 56 0 0 0 0 0 0
STB RATE DRATE STBIB RATE DRATE STBB RATE DRATE PRCV RATE DRATE COM RATE DRATE RLBK CLNTS CHNLS FAILS
0 0 0 0 0 0 0 0 0 0 0 0 71302 408 56 0 0 2 0
The server stats rendering of application stats is more compact, but because stats tracing is most often enabled for testing and performance tuning, it is more common to enable engine stats trace rather than server stats trace when examining application stats.