The Talon Manual

...

An operational AEP engine and its underlying components, such as its HA Store and Bus Bindings, capture many metrics and statistics during operation. The XVM in which the engine is running periodically collects these metrics and reports them in XVM heartbeats, which can then be traced, logged in a binary format, or emitted to monitoring applications. This document describes the statistics that are collected, how to enable those that are not enabled by default, and the format in which statistics are traced.

Configuration Settings

Most engine metrics are low overhead, are collected by default, and their collection cannot be disabled. Other types of statistics, such as latency statistics, per message type statistics, and per transaction statistics, can impact performance in performance-sensitive applications, so their collection is disabled by default. The subsections below describe the additional statistics that can be collected and how they can be enabled.

Global Latency Stats Settings

Latency statistics are more expensive to collect than most engine metrics and can impact application performance. Consequently, latency stats collection is disabled by default. The platform provides several knobs that allow applications to control which latency statistics are collected and to balance the performance cost against operational visibility. The table below summarizes the configuration settings that control latency collection at a process-wide level.

Configuration Setting | Default | Description
nv.stats.series.samplesize | 10240

Property that can be used to control the default sampling size for series stats.

If the number of datapoints collected in a stats interval exceeds this size, the computation for histographical data will be lossy. Increasing the value reduces loss of datapoints, but results in greater overhead in stats collection in terms of both memory usage and pressure on the process caches.

nv.msg.latency.stats | false

Property that globally enables collection of message latency stats as messages flow through the system. These statistics include latencies in the flow outside of transaction processing. For received messages these statistics include transmission, deserialization and dispatch costs. For sent messages these include serialization and transmission costs.

When set to true, timings for messages are captured as they flow through the system. Enablement of these stats is required to collect message bus latency stats. Enabling this property can increase latency due to the overhead of tracking timestamps.

nv.msgtype.latency.stats | false

Property that enables tracking of message latency stats on a type by type basis.

When set to true, timings for each message type are individually tracked as separate stats.

Note: Due to their overhead, these statistics are not included in heartbeats emitted by an XVM.

nv.ods.latency.stats | false

Indicates whether or not store latency statistics are captured. Store latency stats expose statistics related to serializing, replicating, and persisting transactions.

Note: These stats must be enabled in order to include latency stats along with an application's store statistics.

nv.event.latency.stats | false

Indicates whether or not event latency statistics are captured. When enabled, event latency stats record timestamps for the enqueue and dequeue of events across event multiplexers, such as the AepEngine's input multiplexer queue. Because they record the time that events remain on the input queue, these stats are useful for determining whether an engine's event multiplexer queue is backing up.

Note: These stats must be enabled in order to capture input queuing times.

nv.link.network.stampiots | false

Indicates whether or not timestamps should be stamped on inbound and outbound messages. Disabled by default.

Enabling this setting will allow engines to provide more detail in the form of transaction legs in message latency statistics.

Note: This setting must be enabled to report 'wire' times for an application's store.
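
The defaults in the table above can be captured in a small helper when preparing overrides for a test run. The following is a minimal Python sketch, not platform code. The setting names and defaults are copied from the table; 'render_settings' and 'may_be_lossy' are hypothetical helper names, the key=value output is only an assumption about how settings are supplied to the process, and the lossiness check assumes a known datapoint rate for a single series.

```python
# Process-wide latency settings and their defaults, copied from the table above.
LATENCY_DEFAULTS = {
    "nv.stats.series.samplesize": "10240",
    "nv.msg.latency.stats": "false",
    "nv.msgtype.latency.stats": "false",
    "nv.ods.latency.stats": "false",
    "nv.event.latency.stats": "false",
    "nv.link.network.stampiots": "false",
}


def render_settings(overrides):
    """Return key=value lines for the overrides that differ from the defaults."""
    unknown = set(overrides) - set(LATENCY_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown setting(s): {sorted(unknown)}")
    return [f"{key}={value}" for key, value in overrides.items()
            if LATENCY_DEFAULTS[key] != value]


def may_be_lossy(datapoints_per_second, interval_seconds, sample_size=10240):
    """True if one stats interval could collect more datapoints than the sample size.

    Assumption: datapoints_per_second is the rate for a single stats series.
    """
    return datapoints_per_second * interval_seconds > sample_size
```

For example, at 5,000 datapoints per second and a 5 second stats interval, up to 25,000 datapoints can accumulate. That exceeds the default sample size of 10240, so may_be_lossy(5000, 5) returns True.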

...

  • Message receipt and send statistics described above per message type
  • Histograms for application processing and filtering times. 

...

Appendix A

...

...

- Statistics Output Threads

The following output threads can be enabled to trace individual types of statistics, which is useful for testing and performance tuning. Enabling these output threads is not required for collecting stats. Statistics trace output is not zero garbage, so in a production scenario it usually makes more sense to collect stats via XVM heartbeats, which are emitted in a zero garbage fashion and include the above statistics.

Note

When an AepEngine is running inside of an XVM (the most common case), engine statistics are included in XVM heartbeats and should be traced using the XVM tracing facilities. The trace threads described below should not be enabled when running within an XVM, because collection by the trace threads and by the XVM stats collector thread can interfere with one another.

See Tracing Heartbeats

Configuration Setting | Default | Description
nv.aep.<engine>.stats.interval | 0

The interval (in seconds) at which engine stats will be traced for a given engine.

Can be set to a positive integer to indicate the period in seconds at which the engine's stats dump thread will dump recorded engine statistics. Setting a value of 0 disables creation of the stats thread.

When enabled, engine stats are traced to the logger 'nv.aep.<engine>.stats' at a level of Tracer.Level.INFO; therefore, to see dumped stats, the trace level must be enabled with 'nv.aep.<engine>.stats.trace=info'.

NOTE: disabling the engine stats thread only stops stats from being periodically traced. It does not stop the engine from collecting stats; stats can still be collected by an external thread (such as the Talon Server which reports the stats in server heartbeats). In other words, enabling the stats thread is not a prerequisite for collecting stats, and disabling the stats reporting thread does not stop them from being collected.

NOTE: while collection of engine stats is a zero garbage operation, tracing engine stats is not zero garbage when performed by this stats thread. For latency sensitive apps, it is recommended to run in a Talon server, which can collect engine stats and report them in heartbeats in a zero garbage fashion.

nv.aep.<engine>.sysstats.interval | 0

The interval (in seconds) at which engine sys stats will be reported. Set to 0 (the default) to completely disable sys stats tracing for a given engine.

In most cases, AEP sys stats are not used; system level stats are instead recorded in the statistics of the server in which the AepEngine is running.

nv.event.mux.<name>.stats.interval | 0

The interval (in seconds) at which multiplexer stats will be traced.

Multiplexer stats can also be reported as part of the overall engine stats from the engine stats thread, so there is no need to set this to a non-zero value if nv.aep.<engine>.stats.interval is greater than zero.

nv.msg.latency.stats.interval | 0

The interval (in seconds) at which message latency stats are traced.

This setting has no effect if nv.msg.latency.stats is false. This allows granular tracing of just message latency stats on a per bus basis. Message latency stats can also be reported as part of the overall engine stats from the engine stats thread, so there is no need to set this to a non-zero value if nv.aep.<engine>.stats.interval is greater than zero.

nv.aep.busmanager.<engine>.<bus>.stats.interval | 0

The interval (in seconds) at which bus stats will be traced.

Bus stats are also reported as part of the overall engine stats from the engine stats thread, so there is no need to set this to a non-zero value if nv.aep.<engine>.stats.interval is greater than zero. When engine stats output is disabled, this setting can be used to trace only bus stats for a particular message bus.
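
The interplay between these intervals can be checked mechanically before a test run. The sketch below is illustrative only and assumes the settings are available as a plain dictionary of property names to string values. 'review_trace_intervals' is a hypothetical helper, not part of the platform; it encodes only the rules stated in the table above.

```python
def review_trace_intervals(settings, engine):
    """Return notes about trace-interval settings that are ineffective or redundant.

    settings maps property names to string values; engine is the engine name
    substituted for <engine> in the property names.
    """
    def interval(key):
        # Every interval in the table defaults to 0, which disables the trace thread.
        return int(settings.get(key, "0"))

    notes = []
    engine_on = interval(f"nv.aep.{engine}.stats.interval") > 0
    # Multiplexer and bus stats are also reported by the engine stats thread.
    covered_prefixes = ("nv.event.mux.", f"nv.aep.busmanager.{engine}.")

    for key in sorted(settings):
        if not key.endswith(".stats.interval") or interval(key) <= 0:
            continue
        if key == "nv.msg.latency.stats.interval":
            if settings.get("nv.msg.latency.stats", "false") != "true":
                notes.append(f"{key}: no effect while nv.msg.latency.stats is false")
            elif engine_on:
                notes.append(f"{key}: redundant, engine stats thread is enabled")
        elif key.startswith(covered_prefixes) and engine_on:
            notes.append(f"{key}: redundant, engine stats thread is enabled")
    return notes
```

An empty result means none of the rules above were triggered. It does not indicate that enabling the trace threads is appropriate, for example when running inside an XVM.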

Output Trace Logger

An AEP engine's statistics thread uses a named trace logger to log the raw and computed statistics output. The following is the name of the logger used by an engine:
nv.aep.<engineName>.stats
For example, the 'forwarder' engine uses the following logger to log statistics output:
nv.aep.forwarder.stats
The configured logger can be either a native logger or an SLF4J logger. Refer to the X Platform Tracing and Logging document for details on X Platform trace loggers.
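
The naming convention can be expressed as a pair of small helpers. This is a sketch with hypothetical function names. The '<logger>.trace=<level>' pattern is inferred from the trace level example given with the nv.aep.<engine>.stats.interval setting and should be checked against the X Platform Tracing and Logging document.

```python
def stats_logger_name(engine_name):
    """Logger used by an engine's statistics thread: nv.aep.<engineName>.stats."""
    return f"nv.aep.{engine_name}.stats"


def stats_trace_setting(engine_name, level="info"):
    """key=value line for the logger's trace level (naming pattern is an assumption)."""
    return f"{stats_logger_name(engine_name)}.trace={level}"
```

For the 'forwarder' engine, stats_logger_name("forwarder") yields nv.aep.forwarder.stats, matching the example above.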

Appendix B – Statistics Output Format

The following is a sample of the statistics output by an AEP engine's statistics thread. This format is used when the stats are traced by an XVM, the Stats Dump Tool, or the statistics threads described in Appendix A.

<11,33440,wsip-24-120-50-130.lv.lv.cox.net> 20130204-03:09:43:338 (inf)...[ <nv.aep.aep.forwarder.stats> STATS]
Flows{1} Msg{In{25,901(364 0) 25,901(364 0) 0(0 0) 0(0 0) 0X(0 0) (0)} Out{25,901(364 0) 25,901(364 0) 0(0 0) (25,901 25,901 0 0) (0)} Latency{InOut{0 us} InAck{0 us}}} Ev{51,806(728 0) 25,901[25,901,0,25,901](364 0)} Txn{25,902[(25,902,25,902),(25,902,25,902 (0)),(0,0 (0)),0](364 0) 0} Store{-1}
[Message Type Specific]
OrderEvent In{0(0 0) 0(0 0) 0(0 0) 0(0 0) (0)} Out{25,901(364 0) 25,901(364 0) 0(0 0) (0)}
Trade In{25,901(364 0) 25,901(364 0) 0(0 0) 0(0 0) (0)} Out{0(0 0) 0(0 0) 0(0 0) (0)} 
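
When post-processing traced output, the header line and the per message type lines of the sample above can be picked apart with regular expressions. The following Python sketch is based solely on the sample shown here, so the patterns are assumptions about the layout rather than a format specification. The function names are hypothetical, the last timestamp field is assumed to be milliseconds, and only the leading number of each In and Out group is extracted, without interpreting the remaining fields.

```python
import re
from datetime import datetime

# Header line: <...> yyyymmdd-hh:mm:ss:SSS (level)...[ <logger> STATS]
HEADER = re.compile(
    r"^<[^>]*>\s+(?P<ts>\d{8}-\d{2}:\d{2}:\d{2}:\d{3})\s+\((?P<level>\w+)\)"
    r".*\[ <(?P<logger>[^>]+)> STATS\]\s*$"
)
# Per message type line: <Type> In{...} Out{...}
TYPE_LINE = re.compile(
    r"^(?P<type>\w+) In\{(?P<inbound>[^}]*)\} Out\{(?P<outbound>[^}]*)\}\s*$"
)


def parse_header(line):
    """Return (timestamp, level, logger) for a stats header line, else None."""
    m = HEADER.match(line)
    if m is None:
        return None
    # Assumption: the final field holds milliseconds; %f pads it to microseconds.
    ts = datetime.strptime(m.group("ts"), "%Y%m%d-%H:%M:%S:%f")
    return ts, m.group("level"), m.group("logger")


def leading_count(group):
    """Return the first integer in a stats group, ignoring thousands separators."""
    m = re.match(r"\s*([\d,]+)", group)
    return int(m.group(1).replace(",", "")) if m else None


def parse_type_line(line):
    """Return (type, first In number, first Out number), else None."""
    m = TYPE_LINE.match(line)
    if m is None:
        return None
    return (m.group("type"),
            leading_count(m.group("inbound")),
            leading_count(m.group("outbound")))
```

Lines that match neither pattern, such as the '[Message Type Specific]' marker or the aggregate 'Flows{...}' line, return None and can be handled separately.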

...