The Talon Manual

Overview

An operational XVM continuously collects raw statistics as it runs. The XVM can also be configured to spin up a background thread that periodically performs the following:

  1. Performs higher-level statistical computations such as calculating message rates and average latencies.
  2. Emits heartbeat messages to be processed by handlers.
  3. Optionally outputs rendered server statistics to trace loggers using the platform's tracing and logging system, which is useful in testing and diagnostic situations.
  4. Optionally writes heartbeat messages containing useful server-wide statistics to a binary transaction log (with zero steady-state allocations), which is useful for zero-garbage capture of performance data in production.
  5. Detects violation/abatement of alert thresholds and dispatches the appropriate alert events.

The raw metrics collected by the server are used by the background statistical thread for its computations and can also be retrieved programmatically by an application for its own use.
In this document, we describe:

  • the various raw metrics captured by the server,
  • how to enable and configure XVM stats collection and emission,
  • the higher level statistics calculations performed by the statistics thread,
  • and the format of the output of the statistics thread.

Configuration Settings

Enabling Heartbeats

Heartbeats for an XVM can be enabled via DDL XML using the <heartbeats> element: 

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <collectNonZGStats>true</collectNonZGStats>
      <collectIndividualThreadStats>true</collectIndividualThreadStats>
      <collectSeriesStats>true</collectSeriesStats>
      <collectSeriesDatapoints>false</collectSeriesDatapoints>
      <maxTrackableSeriesValue>100000000</maxTrackableSeriesValue>
      <includeMessageTypeStats>false</includeMessageTypeStats>
      <collectPoolStats>true</collectPoolStats>
      <poolDepletionThreshold>1.0</poolDepletionThreshold>
      <logging enabled="true"></logging>
      <tracing enabled="true"></tracing>
    </heartbeats>
  </xvm>
</xvms>
Configuration Setting | Default | Description

enabled (nv.server.stats.enable)
Default: false

Enable or disable server stats collection and heartbeat emission. If tracers have been configured at the debug level, server statistics will be traced on the same thread that performs stats collection and emits heartbeats.

Note

Collection of stats and emission of heartbeats can impact application performance from both a latency and throughput standpoint. For applications that are particularly sensitive to performance, it is a good idea to compare performance with and without heartbeats enabled to understand the overhead that is incurred by enabling heartbeats.

interval
Default: 1000

The interval at which server stats will be collected and heartbeat events emitted when stats collection is enabled.
collectNonZGStats
Default: true

Some statistics collected by the stats collection thread require creating a small amount of garbage. This can be set to false to suppress collection of these stats.
collectIndividualThreadStats
Default: true

Indicates whether heartbeats will contain stats for each active thread in the JVM. Individual thread stats are useful for diagnosing per-thread CPU usage (see Thread Stats below).
collectSeriesStats
Default: true

Indicates whether or not series stats should be included in heartbeats.
collectSeriesDatapoints
Default: false

Indicates whether or not series stats should report the individual data points captured for a series statistic.

(warning) Enabling this setting includes each datapoint collected in a series in heartbeats, which can make emitted heartbeats very large and slow down their collection. It is not recommended in production.

maxTrackableSeriesValue
Default: 10 minutes

The maximum value (in microseconds) that can be tracked for reported series histogram timings. Datapoints above this value will be downsampled to this value, but will be reflected in the max value reported in an interval.
includeMessageTypeStats
Default: false

Sets whether or not message type stats are included in heartbeats (when enabled for the app).

When captureMessageTypeStats is enabled for an app, the AepEngine will record select statistics on a per message type basis. Because inclusion of per message type stats can significantly increase the size of heartbeats, inclusion in heartbeats is disabled by default.

(lightbulb) For message type stats to be included in heartbeats, both captureMessageTypeStats for the app must be set to true (capture is disabled by default because recording them is costly) and includeMessageTypeStats must be set to true (inclusion is disabled by default because emitting them is costly). See the combined example following this table.

collectPoolStats
Default: true

Indicates whether or not pool stats are collected by the XVM when stats collection is enabled.
poolDepletionThreshold
Default: 1.0

Sets the percentage decrement by which a preallocated pool's size must drop to be included in a server heartbeat. Setting this to a value greater than 100 or less than or equal to 0 disables depletion threshold reporting.

This gives monitoring applications advance warning when it looks like a preallocated pool may soon be exhausted. By default the depletion threshold is set to trigger inclusion in heartbeats at every 1% depletion of the preallocated count. This can be changed by setting the threshold to a float value between 0 and 100 (or via the configuration property nv.server.stats.pool.depletionThreshold).

For example, if a pool is preallocated with 1000 items and this value is set to 10, pool stats will be emitted for the pool each time a heartbeat occurs and the pool has dropped below another 10% threshold of the preallocated size (e.g. at 900, 800, 700, ...) until its size reaches 0, at which point subsequent misses would cause it to be included in every heartbeat.

 


logging
See below

Configures binary logging of heartbeats.

Binary heartbeat logging provides a means by which heartbeat data can be captured in a zero garbage fashion. Collection of such heartbeats can be useful in diagnosing performance issues in running apps.

tracing
See below

Configures trace logging of heartbeats.

Enabling textual tracing of heartbeats is a useful way to quickly capture data from server heartbeats for applications that aren't monitoring xvm heartbeats remotely. Textual trace of heartbeats is not zero garbage and is therefore not suitable for applications that are latency sensitive.
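For example, to include per message type stats in heartbeats, enable both the app-level capture setting and the XVM-level inclusion setting. The snippet below simply combines the elements shown elsewhere in this document (the app and xvm names are illustrative):

<apps>
  <app name="my-app">
    <!-- record per message type stats in the app's engine -->
    <captureMessageTypeStats>true</captureMessageTypeStats>
  </app>
</apps>
<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <!-- include the recorded per message type stats in emitted heartbeats -->
      <includeMessageTypeStats>true</includeMessageTypeStats>
    </heartbeats>
  </xvm>
</xvms>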

Enabling Global Stats

An XVM collects stats that are enabled for the applications that it contains. The following stats can be enabled and reported in heartbeats:

Environment Properties

<env>
  <!-- global stats properties -->
  <nv>
    <stats.latencymanager.samplesize>65536</stats.latencymanager.samplesize>
    <msg.latency.stats>true</msg.latency.stats>
    <ods.latency.stats>true</ods.latency.stats>
    <link.network.stampiots>true</link.network.stampiots>
  </nv>
</env>
Environment Property | Description

nv.stats.latencymanager.samplesize

The global default sample size used for capturing latencies. Latency stats are collected in a ring buffer which is sampled by the stats thread at each collection interval.

(lightbulb) This should be sized large enough such that datapoints aren't missed, but not so large that it adversely affects processor cache performance.

Default value: the value of nv.stats.series.samplesize

nv.stats.series.samplesize

Property that can be used to control the default sampling size for Series stats.

(lightbulb) If the number of datapoints collected in a stats interval exceeds this size, the computation of histogram data will be lossy. Increasing the value will reduce the loss of datapoints, but results in greater overhead in stats collection in terms of both memory usage and pressure on the processor caches.

Default value: 10240

nv.msg.latency.stats

This global property instructs the platform to collect latency statistics for messages passing through various points in the process pipeline.

(lightbulb) When not enabled, many latency stats will not be available in heartbeats.

nv.ods.latency.stats

Globally enables collection of application store latencies.

nv.link.network.stampiots

Instructs low-level socket I/O to stamp input/output times on written data.

(lightbulb) This should be enabled to capture store latencies based on wire time, or for certain latencies in the direct message bus binding.

Engine Stats

The XVM statistics thread will collect the following latency stats from the apps it contains when they are enabled:

<apps>
  <app name="my-app" ...>
    <captureTransactionLatencyStats>true</captureTransactionLatencyStats>
    <captureEventLatencyStats>true</captureEventLatencyStats>
    <captureMessageTypeStats>false</captureMessageTypeStats>
  </app>
</apps>


(info) See Also: 

Handling XVM Heartbeats

When heartbeats are enabled they can be consumed or emitted in several ways, discussed below.

Heartbeat Tracing

...

By default, all server statistics tracing is disabled, as trace logging is not zero garbage and introduces CPU overhead in computing statistics. While tracing heartbeats isn't recommended in production, enabling server statistics trace output can be useful for debugging and performance tuning. See the Heartbeat Trace Output section below for more detail.

...

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <logging enabled="true"></logging>
      <tracing enabled="true"></tracing>
    </heartbeats>
  </xvm>
</xvms>

Heartbeat Logging

Applications that are latency sensitive might prefer to leave all tracers disabled to avoid unnecessary allocations and the associated GC activity. As an alternative, it's possible to enable logging of zero-garbage heartbeat messages to a binary transaction log:

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <logging enabled="true">
        <storeRoot>/path/to/heartbeat/log/directory</storeRoot>
      </logging>
    </heartbeats>
  </xvm>
</xvms>

When a storeRoot is not set, an XVM will log heartbeats to {XRuntime.getDataDirectory}/server-heartbeats/<xvm-name>-heartbeats.log, which can then be queried and traced from a separate process using the Stats Dump Tool.

(warning) Note that at this time binary heartbeat logs do not support rolling collection. Consequently, the built-in heartbeat logging mechanism is not suitable for long running application instances.

(info) See Also: 

  • DDL Reference for more options that can be used to configure heartbeat logging. 

Heartbeat Event Handlers

Your application can register an event handler for server heartbeats to handle them in process. 

...

See the SrvMonHeartbeatMessage JavaDoc for API details. 
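As a rough sketch (not the definitive API), a heartbeat handler might look like the following. This assumes handlers are registered via the platform's @EventHandler annotation; the package locations and handler body are illustrative assumptions and should be checked against the SrvMonHeartbeatMessage JavaDoc:

import com.neeve.aep.annotations.EventHandler;       // assumed package location
import com.neeve.server.mon.SrvMonHeartbeatMessage;  // assumed package location

public class MyApp {
    // Invoked each time the XVM emits a heartbeat (heartbeats must be enabled
    // for the XVM hosting this app, as configured above).
    @EventHandler
    public void onHeartbeat(final SrvMonHeartbeatMessage heartbeat) {
        // Illustrative only: extract whatever statistics your monitoring needs
        // (see the SrvMonHeartbeatMessage JavaDoc for the available accessors).
    }
}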

...

Admin Clients

A Lumino or Robin controller can also be used to connect to an XVM via a direct admin connection over TCP and listen for the heartbeats it emits for monitoring purposes. The XVM's stats thread will queue a copy of each emitted heartbeat to each connected admin client.


...


...

The server statistics thread performs the following operations:

  1. Compute higher level statistics such as metric averages.
  2. Output raw and computed statistics to any trace loggers configured at the debug level.
  3. Emit heartbeat messages consumed by handlers and the heartbeat log, if enabled.

The statistics thread can be enabled or disabled via the configuration settings described above.

...

The server statistics thread can also be started via the following environment variables or system properties:

nv.server.stats.enable=true

Once started administratively, the statistics thread remains active until the server is stopped.

Note
When configuring via environment variables on Unix-based systems, note that the shell does not support "." in environment variable names. If you would like to use environment variables to set server statistics configuration on Unix, replace any "." characters in the variable name with "_".
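For example, the system property form and its Unix environment variable equivalent would be (an illustrative sketch of the substitution described above):

nv.server.stats.enable=true
nv_server_stats_enable=true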

...

By default, the server statistics thread collects object pool stats. If you would prefer pool stats not to be recorded:

nv.server.stats.pool.enable=false

...

The server statistics thread computes (then optionally traces and/or logs) the following at the configured frequency:

...

Heartbeat Trace Output

Heartbeat trace is emitted to the nv.server.heartbeat logger at a level of INFO. Trace is only emitted for the types of heartbeat trace for which tracing has been enabled. This section discusses the various types of heartbeat trace, how tracing for each type is enabled, and how to interpret the trace output for each type.

(info) See Also: 

System Stats

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <tracing enabled="true">
        <traceSysStats>true</traceSysStats>
      </tracing>
    </heartbeats>
  </xvm>
</xvms>

Sample Trace Output

[System Stats]
Sat May 13 12:14:03 PDT 2017 'market' server (pid=54449) 2 apps (collection time=0 ns)
System: 20 processors, load average: 0.73 (load 0.10 process, 0.10 total system)
Memory (system): 94.4G total, 89.8G free, 5.5G committed (Swap: 96.6G total, 96.6G free)
Memory (proc): HEAP 1.5G init, 522M used, 1.5G commit, 1.5G max NON-HEAP 2M init, 47M used, 48M commit, 0K max
Disk:
  [/ Total: 49.2GB, Usable:  18GB, Free:  18GB]
  [/dev/shm Total: 47.2GB, Usable: 47.2GB, Free: 47.2GB]
  [/boot Total: 484.2MB, Usable: 422.4MB, Free: 422.4MB]
  [/home Total: 405.2GB, Usable: 267GB, Free: 267GB]
  [/distributions Total: 196.9GB, Usable: 8.1GB, Free: 8.1GB]
Threads: 20 total (16 daemon) 21 peak
JIT: HotSpot 64-Bit Tiered Compilers, time: 2959 ms
GC:
...ParNew [0 collections, commulative time: 0 ms]
...MarkSweepCompact [1 collections, commulative time: 54 ms]

The above trace can be interpreted as follows:

General Info

  • Date and time that statistics gathering started
  • Server name
  • Server PID
  • Number of apps running in the server
  • Time spent gathering server statistics (for the current interval, excluding formatting)

System Info

  • Number of available processors
  • System load average

Memory Info

For the entire system:

  • Total available memory
  • The free memory
  • Committed memory
  • Swap total/free 

For the process:

  • Initial heap size
  • Heap used
  • Heap committed
  • Max heap size
  • Initial non-heap size
  • Non-heap memory used
  • Non-heap memory committed
  • Non-heap memory max size

(info) For more info regarding the process statistics above, you can reference the Oracle JavaDoc on MemoryUsage.

(info) JDK 7 or newer is needed to collect all available memory stats. In addition, some stats are not available on all JVMs.

Disk

For each volume available:

  • Total space
  • Usable space
  • Available space.

(info) Listing of disk system roots requires JDK 7+; with JDK 6 or below, some disk information may not be available.

Thread Info

  • Total thread count
  • Daemon thread count
  • Peak thread count

JIT Info

  • JIT name
  • Total compilation time

Tip

Compare 2 consecutive intervals to determine if JIT occurred in the interval.

GC Info

  • Collection count (for all GCs)
  • Collection time (for all GCs)

Tip

Compare 2 consecutive intervals to determine if a GC occurred in the interval.

Thread Stats

Reported for each active thread:

  • Per-thread CPU time as a percentage of total CPU time
  • Per-thread user CPU time as a percentage of thread CPU time
  • Delta CPU time
  • Delta CPU time as a percentage of total delta CPU time
  • Delta user CPU time
  • Delta user CPU time as a percentage of thread delta CPU time

Pool Stats

To reduce the size of heartbeats, Pool Stats for a given pool are only included when:

  • A miss has been recorded for the pool in a given interval and it results in a new object being allocated. 
  • The number of preallocated objects taken from a pool drops below the configured value for the pool depletion threshold.

Pool stats include: 

Stat | Description

Number of puts
The overall number of times items were put (returned) to a pool.

Delta number of puts
The number of times items were put (returned) to a pool since the last time the pool was reported in a heartbeat.

Number of gets
The overall number of times an item was taken from a pool.

Delta number of gets
The number of times an item was taken from a pool since the last time the pool was reported in a heartbeat.

Number of hits
The overall number of times that an item taken from a pool was satisfied by there being an available item in the pool.

Delta number of hits
The number of times that an item taken from a pool was satisfied by there being an available item in the pool since the last time the pool was reported in a heartbeat.

Number of misses
The overall number of times that an item taken from a pool was not satisfied by there being an available item in the pool, resulting in an allocation.

Delta number of misses
The number of times that an item taken from a pool was not satisfied by there being an available item in the pool, resulting in an allocation, since the last time the pool was reported in a heartbeat.

Number of growths
The overall number of times the capacity of a pool had to be increased to accommodate returned items.

Delta number of growths
The number of times the capacity of a pool had to be increased to accommodate returned items since the last time the pool was reported in a heartbeat.

Number of evicts
The overall number of items that were evicted from the pool because the pool did not have adequate capacity to store them.

Delta number of evicts
The number of items that were evicted from the pool because the pool did not have adequate capacity to store them since the last time the pool was reported in a heartbeat.

Number of detached washes
The overall number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread.

Delta number of detached washes
The number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread since the last time the pool was reported in a heartbeat.

Pool size
The number of items currently in the pool and available for pool gets. This number will be 0 if all objects that have been allocated by the pool have been taken.

Number of preallocated objects
The number of items initially preallocated for the pool.

Pool capacity
The capacity of the backing array that is allocated to hold available pool items that have been preallocated or returned to the pool.

Pool key
The unique identifier for the pool.

App Stats

Reported for each app:

  • Engine stats
  • Store stats
  • Store binding persister stats
  • Store binding ICR sender stats
  • Bus binding stats
  • User stats

(info) See Also: 

...

The server statistics thread uses several named trace loggers to log the raw and computed statistics output. The following is the list of the loggers used by the server stats thread.

  • nv.server.stats.sys.trace – tracer for System Statistics
  • nv.server.stats.thread.trace – tracer for Thread Statistics
  • nv.server.stats.pool.trace – tracer for Pool Statistics
  • nv.server.stats.app.trace – tracer for Application and Engine Statistics
  • nv.server.stats.userstats.trace – tracer for User Defined Statistics

If you would like to see server statistics trace output, the appropriate tracers need to be configured at the debug level. For example, to enable System Stats trace output:

nv.server.stats.sys.trace=debug

(info) See Also: X Platform Tracing and Logging for general details on configuration of trace logging. 

...


Thread Stats

(since 3.7)

Individual thread stats can be traced by setting the following in DDL:

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <collectIndividualThreadStats>true</collectIndividualThreadStats>
      <tracing enabled="true">
        <traceThreadStats>true</traceThreadStats>
      </tracing>
    </heartbeats>
  </xvm>
</xvms>

Sample Trace Output

[Thread Stats]
ID    CPU       DCPU    DUSER   CPU%  USER% WAIT% STATE           NAME
1     6.0s      982.8us 0       1     0     0     RUNNABLE        X-Server-blackbird1-Main (aff=[])
2     9.3ms     0       0       0     0     0     WAITING         Reference Handler
3     8.7ms     0       0       0     0     0     WAITING         Finalizer
4     43.8us    0       0       0     0     0     RUNNABLE        Signal Dispatcher
23    53.9ms    722.7us 0       1     0     0     RUNNABLE        X-EDP-McastReceiver (aff=[1(s0c1t0)])
24    26.3ms    426.5us 0       1     0     0     TIMED_WAITING   X-EDP-Timer (aff=[1(s0c1t0)])
30    236.6us   0       0       0     0     0     TIMED_WAITING   X-EventMultiplexer-Wakeup-admin (aff=[1(s0c1t0)])
...
50    9.1m      10.3s   10.3s   100   100   93    RUNNABLE        X-AEP-BusManager-IO-blackbird.blackbird (aff=[6(s0c9t0)])



The columns in the thread stats trace can be interpreted as follows:

Column | Description
ID
The thread's id.

CPU
The total amount of time in nanoseconds that the thread has executed (as reported by the JMX thread bean).

DCPU
The amount of time that the thread has executed in user mode or system mode in the given interval (as reported by the JMX thread bean).

DUSER
The amount of time that the thread has executed in user mode in the given interval in nanoseconds (as reported by the JMX thread bean).

CPU%
The percentage of CPU time the thread used during the interval (e.g. DCPU * 100 / interval time).

USER%
The percentage of user mode CPU time the thread used during the interval (e.g. DUSER * 100 / interval time).

WAIT%

The percentage of the time that the thread was recorded in a wait state such as a busy spin loop or a disruptor wait.

Wait times are proactively captured by the platform via code instrumentation that takes a timestamp before and after entering/exiting the wait condition. This means that unlike CPU% or USER%, this percentage can include time when the thread is not scheduled and consuming CPU resources.

Because of this, it is not generally possible to simply subtract WAIT% from CPU% to calculate the amount of time the thread actually executed. For example, if CPU% is 50 and WAIT% is also 50 and the interval is 5 seconds, it could be that 2.5 seconds of real work was done while 2.5 seconds of wait time occurred while the thread was context switched out, or it could be that all 2.5 seconds of wait time coincided with the 2.5 seconds of CPU time and all of the CPU time was spent busy spinning.

In other words, WAIT% gives a definitive indication of time that the thread was not doing active work during the interval; the remaining CPU time is at the mercy of the operating system's thread scheduler.

STATE
The thread's runnable state at the time of collection.

NAME

The thread name. Note that when affinitization is enabled and the thread has been affinitized, the affinitization information is appended to the thread name.

(lightbulb) This is useful when trying to determine whether a thread should be affinitized. A busy spinning thread will typically have a CPU% of ~100. If the thread is not affinitized, it might be a good candidate.

affinity
(lightbulb) The affinity summary string reported along with individual thread stats is not reported in a column of its own as the affinitizer appends it to the thread name.

CPU times are reported according to the most appropriate short form of:

Unit | Abbreviation
Days | d
Hours | h
Minutes | m
Seconds | s
Milliseconds | ms
Microseconds | us
Nanoseconds | ns

Pool Stats

Pool stats can be traced by setting the following in DDL:

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <collectPoolStats>true</collectPoolStats>
      <tracing enabled="true">
        <tracePoolStats>true</tracePoolStats>
      </tracing>
    </heartbeats>
  </xvm>
</xvms>

To reduce the size of heartbeats, Pool Stats for a given pool are only included when:

  • A miss has been recorded for the pool in a given interval and it results in a new object being allocated. 
  • The number of preallocated objects taken from a pool drops below the configured value for the pool depletion threshold.

Sample Trace Output

[Pool Stats]
PUT   DPUT  GET   DGET  HIT   DHIT  MISS  DMISS GROW  DGROW EVIC  DEVIC DWSH  DDWSH SIZE  PRE   CAP   NAME
38    0     16.8M 0     38    0     16.8M 0     0     0     0     0     0     0     0     0     1024  iobuf.native-32.20
1     0     62    0     1     0     61    0     0     0     0     0     0     0     0     0     1024  iobuf.native-64.21
1     0     1.0M  0     1     0     1.0M  0     0     0     0     0     0     0     0     0     1024  iobuf.native-256.23
7     0     75    0     7     0     68    0     0     0     0     0     0     0     0     0     1024  iobuf.heap-32.1

Stat | Description

PUT

The overall number of times items were put (returned) to a pool.

DPUT

The number of times items were put (returned) to a pool since the last time the pool was reported in a heartbeat (the delta).

GET

The overall number of times an item was taken from a pool.

(lightbulb) If pool items are not being leaked, GET - PUT indicates the number of items that have been taken from the pool and not returned ... e.g. items that are being held by messages in the transaction processing pipeline or application state.

DGET
The number of times an item was taken from a pool since the last time the pool was reported in a heartbeat (the delta).

HIT
The overall number of times that an item taken from a pool was satisfied by there being an available item in the pool.

DHIT
The number of times that an item taken from a pool was satisfied by there being an available item in the pool since the last time the pool was reported in a heartbeat (the delta).

MISS
The overall number of times that an item taken from a pool was not satisfied by there being an available item in the pool, resulting in an allocation.

DMISS
The number of times that an item taken from a pool was not satisfied by there being an available item in the pool, resulting in an allocation, since the last time the pool was reported in a heartbeat.

GROW
The overall number of times the capacity of a pool had to be increased to accommodate returned items.

DGROW
The number of times the capacity of a pool had to be increased to accommodate returned items since the last time the pool was reported in a heartbeat.

EVIC
The overall number of items that were evicted from the pool because the pool did not have adequate capacity to store them.

DEVIC
The number of items that were evicted from the pool because the pool did not have adequate capacity to store them since the last time the pool was reported in a heartbeat.

DWSH
The overall number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread.

DDWSH
The number of times that an item returned to the pool was washed (e.g. fields reset) in the detached pool washer thread since the last time the pool was reported in a heartbeat.
SIZE

The number of items that are currently in the pool available for pool gets. This number will be 0 if all objects that have been allocated by the pool have been taken.

(lightbulb) Note that because pool stats are generally printed when there are pool misses, this value will often be 0 reflecting that there are no items available in the pool

PRE
The number of items initially preallocated for the pool.
CAP

The capacity of the backing array that is allocated to hold available pool items that have been preallocated or returned to the pool.

(lightbulb) The capacity of a pool will grow automatically as items are returned to the pool without being taken out. A large capacity generally indicates that at some point in the past a larger number of items was needed, but are not currently being used.

NAME
The unique identifier for the pool.

Engine Stats

Stats collected by the AEP engine underlying your application are also included in heartbeats. Tracing of engine stats can be enabled with the following. 

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <tracing enabled="true">
        <traceAppStats>true</traceAppStats>
      </tracing>
    </heartbeats>
  </xvm>
</xvms>

See AEP Engine Statistics for more detail about engine stats.

User Stats

User stats collected by your application are also included in heartbeats. Tracing of user stats can be enabled with the following. 

<xvms>
  <xvm name="my-xvm">
    <heartbeats enabled="true" interval="5">
      <tracing enabled="true">
        <traceUserStats>true</traceUserStats>
      </tracing>
    </heartbeats>
  </xvm>
</xvms>

Sample Trace Output

[App (ems) User Stats]
...Gauges{
......EMS Messages Received: 142604
......EMS Orders Received: 35651
...}
...Series{
......[In Proc Tick To Trade(sno=35651, #points=150, #skipped=0)
.........In Proc Tick To Trade(interval): [sample=150, min=72 max=84 mean=75 median=75 75%ile=77 90%ile=79 99%ile=83 99.9%ile=84 99.99%ile=84]
.........In Proc Tick To Trade (running): [sample=35651, min=72 max=2000 mean=93 median=76 75%ile=82 90%ile=111 99%ile=227 99.9%ile=805 99.99%ile=1197]
......[In Proc Time To First Slice(sno=35651, #points=150, #skipped=0)
.........In Proc Time To First Slice(interval): [sample=150, min=85 max=98 mean=88 median=88 75%ile=90 90%ile=92 99%ile=95 99.9%ile=98 99.99%ile=98]
.........In Proc Time To First Slice (running): [sample=35651, min=84 max=4469 mean=249 median=88 75%ile=95 90%ile=133 99%ile=283 99.9%ile=3628 99.99%ile=4143]
...}
[App (ems) Store Binding Stats]
CSND RATE DRATE CESND RATE DRATE CRCV RATE DRATE CERCV RATE DRATE CCSND RATE DRATE CCRCV RATE DRATE SIZE
106K 613 88 106K 613 88 0 0 0 0 0 0 0 0 0 106K 613 88 0
[App (ems) Store Binding Persister Stats]
<?>
...no persister
[App (ems) Store Binding ICR Sender Stats]
<?>
...no ICR sender
[App (ems) Bus Binding (market)]
<Detached ,Disruptor {[0of1,024] 0% (SingleThreaded, BusySpin)}>
MRCV RATE DRATE MRCVI RATE DRATE MRCVB RATE DRATE MENQ RATE DRATE MSND RATE DRATE FLS RATE DRATE MFLS RATE DRATE FLA RATE DRATE MFLA RATE DRATE
71302 408 57 0 0 0 0 0 0 35651 204 28 35651 204 28 71302 408 57 35651 204 28 0 0 0 0 0 0
STB RATE DRATE STBIB RATE DRATE STBB RATE DRATE PRCV RATE DRATE COM RATE DRATE RLBK CLNTS CHNLS FAILS
0 0 0 0 0 0 0 0 0 0 0 0 35651 204 28 0 0 2 0
[App (ems) Bus Binding (control-fb0a1b20-3d8f-11e6-abfe-0002c9faaac2)]
<Attached>
MRCV RATE DRATE MRCVI RATE DRATE MRCVB RATE DRATE MENQ RATE DRATE MSND RATE DRATE FLS RATE DRATE MFLS RATE DRATE FLA RATE DRATE MFLA RATE DRATE
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
STB RATE DRATE STBIB RATE DRATE STBB RATE DRATE PRCV RATE DRATE COM RATE DRATE RLBK CLNTS CHNLS FAILS
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
[App (ems) Bus Binding (ems)]
<Attached>
MRCV RATE DRATE MRCVI RATE DRATE MRCVB RATE DRATE MENQ RATE DRATE MSND RATE DRATE FLS RATE DRATE MFLS RATE DRATE FLA RATE DRATE MFLA RATE DRATE
35651 204 28 0 0 0 0 0 0 71302 408 56 71302 408 56 71302 408 56 71302 408 56 0 0 0 0 0 0
STB RATE DRATE STBIB RATE DRATE STBB RATE DRATE PRCV RATE DRATE COM RATE DRATE RLBK CLNTS CHNLS FAILS
0 0 0 0 0 0 0 0 0 0 0 0 71302 408 56 0 0 2 0
Tip
The server stats rendering of application stats is more compact, but because stats trace is most often enabled for testing and performance tuning, it is more common to enable engine stats trace rather than server stats trace for application stats.

(info) See Also: