...
The platform allows users to define their own application-specific stats. These user-defined app stats can be registered with the AepEngine, which allows them to be traced along with Aep Engine Stats and included in server heartbeats. Applications can register stats with the AepEngine programmatically or, when running in a Talon server, have them discovered via annotations.
...
Note | ||
---|---|---|
| ||
Gauge values are collected on one or more stats collection threads, separate from the business logic thread. Consequently:
Additionally, for method gauges:
|
...
Code Block | ||
---|---|---|
| ||
import com.neeve.stats.*; public class MyApp { /* volatile: read by the stats collection thread */ @AppStat(name="Last Order ID Processed") private volatile int lastOrderNumber = -1; @EventHandler public void onNewOrder(NewOrderMessage message) { lastOrderNumber = message.getOrderId(); } } |
Method Gauges
When running in a Talon Server, it is possible to annotate a method as a gauge accessor:
Code Block | ||
---|---|---|
| ||
import com.neeve.stats.*; public class MyApp { private volatile int numInvalidOrders; @EventHandler public void onNewOrder(NewOrderMessage message) { if (message.getQuantity() < 0) { /* only the business logic thread writes this field */ numInvalidOrders++; } } @AppStat(name = "Invalid Order Flag") public boolean getHasOrderErrors() { return numInvalidOrders > 0; } } |
Note |
---|
Method accessor gauges for primitive types are not Zero Garbage. This is because the platform invokes getHasOrderErrors via reflection, which generates autoboxing garbage. A better approach, if your application is sensitive to garbage, is to use a Gauge subclass which can directly return the primitive type. |
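
The boxing behavior described in the note can be seen with the JDK's reflection API alone. The following is a minimal sketch, not the platform's implementation: `OrderStats`, `BooleanSource` and `ReflectionBoxingDemo` are hypothetical names introduced for illustration. It shows that a reflective call hands back a wrapper object, while a direct call through a typed interface keeps the primitive.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a gauge type that returns the primitive directly.
interface BooleanSource {
    boolean getBooleanValue();
}

// Hypothetical application class; not a platform type.
final class OrderStats implements BooleanSource {
    volatile int numInvalidOrders;

    // The kind of accessor that would be annotated as a method gauge.
    public boolean getHasOrderErrors() {
        return numInvalidOrders > 0;
    }

    @Override
    public boolean getBooleanValue() {
        return numInvalidOrders > 0;
    }
}

final class ReflectionBoxingDemo {
    // Method.invoke is declared to return Object, so the primitive boolean
    // result comes back wrapped in a java.lang.Boolean.
    static Object readReflectively(OrderStats stats) {
        try {
            Method accessor = OrderStats.class.getMethod("getHasOrderErrors");
            return accessor.invoke(stats);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // A direct interface call never leaves the primitive type.
    static boolean readDirectly(BooleanSource source) {
        return source.getBooleanValue();
    }

    public static void main(String[] args) {
        OrderStats stats = new OrderStats();
        stats.numInvalidOrders = 1;
        System.out.println(readReflectively(stats).getClass().getName());
        System.out.println(readDirectly(stats));
    }
}
```

The same distinction is what makes a Gauge subclass with a primitive accessor attractive for garbage-sensitive applications.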
Gauge Subclass Field
You can subclass one of the XXXGauge implementations to avoid the garbage associated with an annotated method. This is also useful if your Gauge needs to be calculated, or if you are not running in a Talon Server and need to programmatically register a Gauge instance with the AepEngine.
Code Block | ||
---|---|---|
| ||
import com.neeve.stats.*; public class MyApp { private volatile int numInvalidOrders; @AppStat private final Gauge orderErrorsGauge = new BooleanGauge("Invalid Order Flag") { public boolean getBooleanValue() { return numInvalidOrders > 0; } }; @EventHandler public void onNewOrder(NewOrderMessage message) { if (message.getQuantity() < 0) { /* only the business logic thread writes this field */ numInvalidOrders++; } } } |
Note that in the above case the 'name' attribute is omitted on the AppStat annotation because it is provided directly when creating the Gauge.
Gauges on Server Heartbeats
Gauges can be read programmatically on server heartbeats:
...
Note that in the above examples, gauge fields are declared as volatile. This is because gauge values are collected by the statistics thread that is emitting server heartbeats, not the application's business logic thread.
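
As a rough illustration of the threading arrangement described above, the sketch below uses plain JDK threads. `VolatileGaugeDemo` is a hypothetical class, not a platform type: the calling thread plays the business logic thread and a second thread plays the stats collector, polling the field until it sees the update. Declaring the field volatile is what guarantees the polling thread eventually observes the write.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an application with a gauge field.
final class VolatileGaugeDemo {
    // Written by the business logic thread, read by the stats collection thread.
    private volatile int lastOrderNumber = -1;

    void onNewOrder(int orderId) {
        lastOrderNumber = orderId;
    }

    int sample() {
        return lastOrderNumber;
    }

    // Starts a collector thread that polls the gauge until it changes from -1
    // or the timeout expires, then returns the value the collector observed.
    static int pollFromCollectorThread(int orderId, long timeoutMillis) {
        VolatileGaugeDemo app = new VolatileGaugeDemo();
        AtomicInteger observed = new AtomicInteger(-1);
        Thread collector = new Thread(() -> {
            long start = System.nanoTime();
            long timeoutNanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            int value = app.sample();
            while (value == -1 && System.nanoTime() - start < timeoutNanos) {
                Thread.yield();
                value = app.sample();
            }
            observed.set(value);
        }, "stats-collector");
        collector.start();
        // The calling thread plays the role of the business logic thread.
        app.onNewOrder(orderId);
        try {
            collector.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return observed.get();
    }

    public static void main(String[] args) {
        System.out.println(pollFromCollectorThread(42, 2000));
    }
}
```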
...
A Counter is useful for recording a monotonically increasing value over time. Sampled periodically, it can be used to derive a rate. For example, a counter could be used to record the number of messages received. By sampling it over time, it can be used to derive a received message rate.
...
If Aep engine stats tracing is enabled, the above stat will be printed along with the rest of the engine stats in the format <overallCount> <lastIntervalCount> (<overallRate> <lastIntervalRate>):
No Format |
---|
[User Counter Stats] ...Invalid Orders: 9 1 (1.01 1) |
From the above, we can see that there were 9 invalid orders in the lifetime of the app, 1 invalid order in the last interval, and that the app is receiving a little over 1 invalid order / sec.
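
To make the four values in that trace line concrete, here is a minimal sketch of how they could be derived from periodic samples of a monotonically increasing count. `CounterSampler` is a hypothetical helper, not a platform class, and the number formatting only approximates the trace output shown above.

```java
import java.util.Locale;

// Hypothetical helper that derives counts and rates from periodic samples.
final class CounterSampler {
    private final long startNanos;
    private long lastSampleNanos;
    private long lastCount;

    CounterSampler(long startNanos) {
        this.startNanos = startNanos;
        this.lastSampleNanos = startNanos;
    }

    // Returns "<overallCount> <lastIntervalCount> (<overallRate> <lastIntervalRate>)"
    // for the current counter value sampled at the given time.
    String sample(long count, long nowNanos) {
        long intervalCount = count - lastCount;
        double overallSeconds = (nowNanos - startNanos) / 1e9;
        double intervalSeconds = (nowNanos - lastSampleNanos) / 1e9;
        double overallRate = overallSeconds > 0 ? count / overallSeconds : 0.0;
        double intervalRate = intervalSeconds > 0 ? intervalCount / intervalSeconds : 0.0;
        lastCount = count;
        lastSampleNanos = nowNanos;
        return String.format(Locale.ROOT, "%d %d (%.2f %.2f)",
                count, intervalCount, overallRate, intervalRate);
    }

    public static void main(String[] args) {
        CounterSampler sampler = new CounterSampler(0L);
        System.out.println(sampler.sample(8, 8_000_000_000L));
        System.out.println(sampler.sample(9, 9_000_000_000L));
    }
}
```

Passing timestamps in explicitly keeps the calculation deterministic; a real collector would read a clock on each collection.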
...
A common use case for a Series statistic is collecting latency timing data. For example, one would like to observe the median, minimum, maximum and 99.99th percentile of message processing times to ensure that SLAs are being met. However, non-lossy collection and reporting of histographical latency statistics is a challenging problem in low latency systems due to the number of data points that need to be retained, computed and serialized. For example, imagine an application that is recording latency statistics for messages coming in at a rate of 10k/sec. To accurately compute and report percentiles with a collection period of 10 seconds, the application needs to retain at least 100,000 data points per statistic to perform histographical analysis for just one interval! Assuming that the values are double or long values (8 bytes each), one would be looking at ~800 KB per statistic collected. Collecting and computing on such data is hard on processor memory caches and can have a disruptive impact on application processing times. Furthermore, to perform longer term histographical analysis (across multiple collection periods) without losing any data, each set of interval results needs to be stored so that computation can be performed. Persisting such data to disk or emitting it in server heartbeats is also problematic because it leads to a large volume of data, which puts a strain on disk space and disk bandwidth or, in the case of heartbeats, on network bandwidth when emitted over the messaging fabric.
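
The arithmetic behind those figures can be written out as a short sketch. `SeriesMemoryEstimate` is a hypothetical helper used only to make the calculation explicit; it counts raw 8-byte values and ignores any per-object or array overhead.

```java
// Hypothetical helper that reproduces the retention arithmetic from the text.
final class SeriesMemoryEstimate {
    // Data points that must be retained for one collection interval.
    static long dataPointsPerInterval(long messagesPerSecond, long intervalSeconds) {
        return Math.multiplyExact(messagesPerSecond, intervalSeconds);
    }

    // Raw bytes needed when each data point is a long or double (8 bytes).
    static long bytesPerInterval(long messagesPerSecond, long intervalSeconds) {
        long points = dataPointsPerInterval(messagesPerSecond, intervalSeconds);
        return Math.multiplyExact(points, (long) Long.BYTES);
    }

    public static void main(String[] args) {
        System.out.println(dataPointsPerInterval(10_000, 10) + " data points");
        System.out.println(bytesPerInterval(10_000, 10) + " bytes");
    }
}
```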
...
An HDRHistogram compromises on precision of the captured latencies in favor of cheaper computation and storage of results while still maintaining a predictable precision. The documentation on HDR histogram provides details on the level of precision that is achieved. Practically speaking, however, for latency data points in the 100s of microseconds the precision that is guaranteed for collected percentiles is in the order of +/- 1us, which is acceptable for most applications (for tail values, say in the range of 1 minute, the value is guaranteed to be correct within +/- 60ms).
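
Both precision figures quoted above are consistent with values being kept to three significant decimal digits, that is, an error bound of roughly 0.1% of the recorded value. The configured precision is not stated here, so the sketch below treats three digits as an assumption; `HistogramPrecision` is a hypothetical helper, not part of the platform or of the HdrHistogram library.

```java
// Hypothetical helper: worst-case absolute error for a value that is kept
// to a given number of significant decimal digits.
final class HistogramPrecision {
    static double maxError(double value, int significantDigits) {
        return value / Math.pow(10, significantDigits);
    }

    public static void main(String[] args) {
        // Assumption: 3 significant digits; all values in microseconds.
        System.out.println(maxError(1_000, 3) + " us for a 1 ms latency");
        System.out.println(maxError(60_000_000, 3) + " us for a 1 minute latency");
    }
}
```

Under that assumption, latencies up to 1000us carry an error of at most 1us, and a one minute value carries an error of at most 60,000us (60ms).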
Creating a Series Stat
Code Block | ||
---|---|---|
| ||
import com.neeve.stats.IStats.Series; import com.neeve.stats.StatsFactory; public class MyApp { @AppStat private final Series newCustomerAge = StatsFactory.createSeriesStat("New Customer Age"); @EventHandler public void onNewCustomer(NewCustomerCreation message) { /* assumes the message exposes the customer's age via getAge() */ newCustomerAge.add(message.getAge()); } } |
...
In the above, we can see that in the last interval one new customer registered and their age was 21. Over the last 8 intervals, the average new customer age is 23, with the oldest being 29 and the youngest being 21.
...
Series data for user stats are exposed in the Server Monitoring Heartbeat using the SrvMonUserSeriesStat object:
...
Field Name | type | Description |
---|---|---|
name | String | The name of the user series statistic. |
seriesType | SrvMonSeriesType | The type of the series data. |
intSeries | SrvMonIntSeries | The collected int series data for an INT series. This field should only be set when the series type is set to SrvMonSeriesType.INT. |
...
Field Name | type | Description |
---|---|---|
dataPoints | int[] | When the server is configured to include the captured data points for the statistic, the returned array will include the values collected during this interval. This allows monitoring tools to perform non-lossy calculation of percentiles, provided no data points were skipped due to under sampling or a missed heartbeat. |
lastSequenceNumber | long | Sequence numbers for collected data points start at 1; a value of 0 indicates that no data points have been collected. The sequence number always indicates the number of data points that have been collected since the statistic was created or last reset. |
numDataPoints | int | Indicates the number of data points collected in this interval. If no data points were collected, numDataPoints will be 0. |
skippedDataPoints | long | The runtime only holds on to a fixed number of data points for any particular Latency statistic. If the sampling interval is too long, then some datapoints may be skipped. For example, let's say Latency stats are configured to hold on to a sample size of 1000 datapoints. If the number of data points being captured per second is 2000, and the stats collection interval is 1 second, then on each collection, 1000 datapoints will be missed, which will skew results. |
intervalStats | SrvMonIntHistogram | Holds computed results for the datapoints captured for this heartbeat (i.e. for the numDataPoints captured). This field may not be set if numDataPoints is 0 or if interval computations are not done on the server. New in 3.1. |
runningStats | SrvMonIntHistogram | Holds computed results for the datapoints over the lifetime of this statistic (i.e. since seqNo 1). If the underlying statistic is reset, then the running stats are correspondingly reset. |
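
The relationship between lastSequenceNumber, numDataPoints and skippedDataPoints can be sketched with a fixed-capacity buffer. This is an illustrative model only: `BoundedSeriesBuffer` and `Interval` are hypothetical names, the sketch is not thread safe, and it assumes the newest values are the ones retained, which may differ from the platform's actual retention policy.

```java
// Hypothetical fixed-capacity sample buffer that tracks skipped data points.
final class BoundedSeriesBuffer {
    // Snapshot handed out on each collection.
    static final class Interval {
        final int[] dataPoints;
        final long lastSequenceNumber;
        final long skippedDataPoints;

        Interval(int[] dataPoints, long lastSequenceNumber, long skippedDataPoints) {
            this.dataPoints = dataPoints;
            this.lastSequenceNumber = lastSequenceNumber;
            this.skippedDataPoints = skippedDataPoints;
        }
    }

    private final int[] samples;
    private long sequenceNumber;         // total data points added so far
    private long lastCollectedSequence;  // sequence number at the previous collection

    BoundedSeriesBuffer(int capacity) {
        samples = new int[capacity];
    }

    // Overwrites the oldest slot once the buffer is full.
    void add(int value) {
        samples[(int) (sequenceNumber % samples.length)] = value;
        sequenceNumber++;
    }

    // Returns the retained data points added since the last collection, oldest
    // first, plus how many were overwritten before they could be collected.
    Interval collect() {
        long added = sequenceNumber - lastCollectedSequence;
        int retained = (int) Math.min(added, samples.length);
        long skipped = added - retained;
        int[] out = new int[retained];
        long first = sequenceNumber - retained;
        for (int i = 0; i < retained; i++) {
            out[i] = samples[(int) ((first + i) % samples.length)];
        }
        lastCollectedSequence = sequenceNumber;
        return new Interval(out, sequenceNumber, skipped);
    }
}
```

With a capacity of 1000 and 2000 values added between collections, `collect()` reports 1000 retained and 1000 skipped data points, matching the example in the table. A monitoring tool can treat percentiles computed from `dataPoints` as non-lossy only when the skipped count is 0.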
...
Field Name | type | Description |
---|---|---|
sampleSize | long | The number of datapoints over which results were calculated (possibly 0 if no data points were collected). |
minimum | int | The minimum value recorded in the sample set. The value is not set if the sample size is 0. |
maximum | int | The maximum value recorded in the sample set. The value is not set if the sample size is 0. |
mean | int | The mean for the values recorded in the sample set. The value is not set if the sample size is 0. |
median | int | The median for the values recorded in the sample set. The value is not set if the sample size is 0. |
pct75 | int | The 75th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
pct90 | int | The 90th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
pct99 | int | The 99th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
pct999 | int | The 99.9th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
pct9999 | int | The 99.99th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
samplesOverMax | long | The number of samples that exceeded the maximum recordable value for the histogram. When computing latency percentiles using an HDRHistogram, it is possible that a recorded value will exceed the maximum value allowable. In this case, the datapoint is downsampled to the maximum recordable value, which skews the percentile calculations lower. SamplesOverMax allows detection of how frequently this is occurring. |
samplesUnderMin | long | The number of samples captured that were below the minimum recordable value for the histogram. When computing latency percentiles using an HDRHistogram, it is possible that a recorded value will be below 0 in cases where clock skew is possible. In such cases, the value will be upsampled to 0, which can skew the histogram results. SamplesUnderMin allows detection of how frequently this is happening. |
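
The clamping behavior behind samplesOverMax and samplesUnderMin can be modeled in a few lines. `ClampingRecorder` is a hypothetical class written for illustration; it is not the platform's recorder and does not use the HdrHistogram library.

```java
// Hypothetical recorder that clamps values into [0, maxRecordable] and
// counts how often clamping happened in each direction.
final class ClampingRecorder {
    private final long maxRecordable;
    private long samplesOverMax;
    private long samplesUnderMin;

    ClampingRecorder(long maxRecordable) {
        this.maxRecordable = maxRecordable;
    }

    // Returns the value that would actually be recorded.
    long record(long value) {
        if (value > maxRecordable) {
            samplesOverMax++;
            return maxRecordable;   // downsampled: skews percentiles lower
        }
        if (value < 0) {
            samplesUnderMin++;
            return 0;               // upsampled: e.g. negative latency from clock skew
        }
        return value;
    }

    long getSamplesOverMax() {
        return samplesOverMax;
    }

    long getSamplesUnderMin() {
        return samplesUnderMin;
    }
}
```

If either counter is a significant fraction of sampleSize, the reported percentiles should be read with caution.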
...
The AppStat annotation can be used to annotate user defined statistics in the application to allow those statistics to be discovered by a Talon Server. The Talon server will register each statistic it finds with the application's AepEngine. AppStat annotations are only introspected once: just after the application's AepEngine is injected. If the application replaces a stat instance after application initialization, the new stat instance will not be discovered.
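
The consequence of one-time introspection can be illustrated with a small reflection sketch. Everything here is hypothetical: `TrackedStat`, `DemoApp` and `StatRegistry` are stand-ins written for this example and do not reflect how the Talon server is implemented. The registry captures the field's value at discovery time, so an instance assigned afterwards is never seen.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical marker annotation; a stand-in, not the platform's AppStat.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface TrackedStat {
    String name();
}

// Hypothetical application class with one annotated stat field.
final class DemoApp {
    @TrackedStat(name = "Heartbeats Received")
    AtomicLong heartbeats = new AtomicLong();

    long notAStat;
}

// Hypothetical registry that scans an object's fields exactly once.
final class StatRegistry {
    private final Map<String, Object> stats = new LinkedHashMap<>();

    // Registers the current value of every annotated field.
    void discover(Object app) {
        for (Field field : app.getClass().getDeclaredFields()) {
            TrackedStat annotation = field.getAnnotation(TrackedStat.class);
            if (annotation == null) {
                continue;
            }
            field.setAccessible(true);
            try {
                stats.put(annotation.name(), field.get(app));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
    }

    Object get(String name) {
        return stats.get(name);
    }

    int size() {
        return stats.size();
    }
}
```

For this reason it is safest to declare stat fields final and to create the stat instances when the application object is constructed.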
...
Any @AppStat annotated field in the main application class will be discovered by the Talon server. If additional classes in your application contain user defined stats, they can be exposed to the server using the AppStatContainersAccessor annotation:
Code Block | ||||
---|---|---|---|---|
| ||||
@AppHAPolicy(HAPolicy.EventSourcing) public static class MyApp { MyOtherClass someOtherClass = new MyOtherClass(); /* adds every object whose @AppStat fields should also be discovered */ @AppStatContainersAccessor public void getStatContainers(Set<Object> containers) { containers.add(someOtherClass); } } private static class MyOtherClass { @AppStat Counter numHeartbeats = StatsFactory.createCounterStat("Heartbeats Received"); MyOtherClass() { } } |
...