The Talon Manual

...

The platform allows users to define their own application-specific stats. These user-defined app stats can be registered with the AepEngine, which allows them to be traced along with Aep Engine Stats and included in server heartbeats. Applications can register stats with the AepEngine programmatically or, when running in a Talon server, have them discovered via annotations.

...

 

Note: Important considerations regarding gauge collection

Gauge values are collected on one or more stats collection threads separate from the business logic thread. Consequently:

  • Gauge field values must be declared as volatile to ensure changes to them are visible to collection threads.

Additionally, for method gauges:

  • Be sure that the computation cost is not so high that it skews statistics collection. Consider using a background thread for computing gauge values that are computationally expensive.
  • It is possible that more than one stats collection thread will be collecting and reporting on stats concurrently, so method gauges should be threadsafe.
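
The background-thread suggestion above can be sketched in plain Java as follows. This is a minimal illustration, not part of the Talon API: `CachedGauge`, its constructor parameters and the refresher thread name are hypothetical. The expensive computation runs on its own thread and publishes its result through a volatile field, so the accessor that a collection thread calls is cheap and thread-safe.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper: caches an expensive gauge value so reads stay cheap. */
final class CachedGauge implements AutoCloseable {
    private final LongSupplier expensiveComputation;
    private final ScheduledExecutorService refresher;
    // volatile: written by the refresher thread, read by stats collection threads
    private volatile long cachedValue;

    CachedGauge(LongSupplier expensiveComputation, long refreshMillis) {
        this.expensiveComputation = expensiveComputation;
        this.cachedValue = expensiveComputation.getAsLong(); // prime the cache
        this.refresher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "gauge-refresher");
            t.setDaemon(true); // do not keep the JVM alive for stats
            return t;
        });
        refresher.scheduleAtFixedRate(this::refresh, refreshMillis, refreshMillis, TimeUnit.MILLISECONDS);
    }

    /** Recomputes the value; normally invoked only by the refresher thread. */
    void refresh() {
        cachedValue = expensiveComputation.getAsLong();
    }

    /** Cheap, thread-safe read that a method gauge could delegate to. */
    long getValue() {
        return cachedValue;
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

A method gauge would then simply return `getValue()`, which keeps the cost of each collection constant regardless of how expensive the underlying computation is.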

 

...

 
Code Block
languagejava
import com.neeve.stats.*;
 
public class MyApp {

  // volatile so that updates are visible to the stats collection thread(s)
  @AppStat(name="Last Order ID Processed")
  private volatile int lastOrderNumber = -1;

  @EventHandler
  public void onNewOrder(NewOrderMessage message) {
    lastOrderNumber = message.getOrderId();
  }
}

Method Gauges

When running in a Talon Server, it is possible to annotate a method as a gauge accessor:

Code Block
languagejava
import com.neeve.stats.*;
 
public class MyApp {
  // volatile so that updates are visible to the stats collection thread(s)
  private volatile int numInvalidOrders;

  @EventHandler
  public void onNewOrder(NewOrderMessage message) {
    if (message.getQuantity() < 0) {
      numInvalidOrders++;
    }
  }

  // Invoked by the platform to read the gauge value
  @AppStat(name = "Invalid Order Flag")
  public boolean getHasOrderErrors() {
    return numInvalidOrders > 0;
  }
}
Note

Method accessor gauges that return a primitive type are not Zero Garbage. This is because the platform invokes getHasOrderErrors via reflection, which generates autoboxing garbage. A better approach, if your application is sensitive to garbage, is to use a Gauge subclass, which can return the primitive type directly.
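
The boxing cost of a reflective call can be seen in plain Java, independent of Talon. In the sketch below, the class `ReflectiveGaugeRead` and its numeric accessor are hypothetical stand-ins for an application class with a method gauge. `Method.invoke` is declared to return `Object`, so a primitive result has to be wrapped before it is handed back, whereas a direct call returns the primitive itself.

```java
import java.lang.reflect.Method;

/** Hypothetical application class with a numeric accessor, used to contrast the two call paths. */
final class ReflectiveGaugeRead {
    private volatile long numInvalidOrders = 1_000L;

    public long getNumInvalidOrders() {
        return numInvalidOrders;
    }

    /** Reflective read: the long result comes back wrapped in a java.lang.Long. */
    static Object readViaReflection(ReflectiveGaugeRead app) {
        try {
            Method accessor = ReflectiveGaugeRead.class.getMethod("getNumInvalidOrders");
            return accessor.invoke(app);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ReflectiveGaugeRead app = new ReflectiveGaugeRead();
        Object boxed = readViaReflection(app);    // wrapper object
        long direct = app.getNumInvalidOrders();  // primitive, no wrapper
        System.out.println(boxed.getClass().getName() + " vs primitive " + direct);
    }
}
```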

Gauge Subclass Field

You can subclass one of the XXXGauge implementations to avoid the garbage associated with an annotated method. This is useful if your Gauge needs to be calculated, or if you are not running in a Talon Server and need to programmatically register a Gauge instance with the AepEngine.

Code Block
languagejava
import com.neeve.stats.*;
 
public class MyApp {
  // volatile so that updates are visible to the stats collection thread(s)
  private volatile int numInvalidOrders;

  @AppStat
  private final Gauge orderErrorsGauge = new BooleanGauge("Invalid Order Flag") {
    public boolean getBooleanValue() {
      return numInvalidOrders > 0;
    }
  };

  @EventHandler
  public void onNewOrder(NewOrderMessage message) {
    if (message.getQuantity() < 0) {
      numInvalidOrders++;
    }
  }
}

Note that in the above case the 'name' attribute is omitted on the AppStat annotation because it is provided directly when creating the Gauge. 

Gauges on Server Heartbeats

Gauges can be read programmatically on server heartbeats:

...

Note that in the above examples, gauge fields are declared as volatile. This is because gauge values are collected by the statistics thread that is emitting server heartbeats, not the application's business logic thread.

...

A Counter is useful for recording a monotonically increasing value over time. Sampled periodically, it can be used to derive a rate. For example, a counter could be used to record the number of messages received. By sampling it over time, it can be used to derive a received message rate.
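
Deriving a rate from a sampled counter is plain arithmetic: the change in count divided by the elapsed time between two samples. The sketch below shows this in standalone Java; `CounterRate` and its parameter names are illustrative and not part of the platform API.

```java
/** Illustrative helper: derives a per-second rate from two samples of a counter. */
final class CounterRate {
    private static final double NANOS_PER_SECOND = 1_000_000_000.0;

    /**
     * @return events per second between the two samples, or 0 if no time has elapsed
     */
    static double ratePerSecond(long prevCount, long prevNanos, long currCount, long currNanos) {
        long elapsedNanos = currNanos - prevNanos;
        if (elapsedNanos <= 0) {
            return 0.0;
        }
        return (currCount - prevCount) * NANOS_PER_SECOND / elapsedNanos;
    }

    public static void main(String[] args) {
        // 50 messages received over a 10 second sampling interval
        System.out.println(ratePerSecond(100, 0L, 150, 10_000_000_000L));
    }
}
```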

...

If Aep engine stats tracing is enabled, the above stat will be printed along with the rest of the engine stats in the format:

<overallCount> <lastIntervalCount> (<overallRate> <lastIntervalRate>)

No Format
[User Counter Stats]
...Invalid Orders: 9 1 (1.01 1) 

From the above, we can see that there were 9 invalid orders in the lifetime of the app, 1 invalid order in the last interval, and that the app is receiving a little over 1 invalid order / sec.

...

A common use case for a Series statistic is collecting latency timing data, where one would like to observe the median, min, max and 99.99th percentile of message processing times to ensure that SLAs are being met. However, non-lossy collection and reporting of histographical latency statistics is a challenging problem in low latency systems due to the number of data points that need to be retained, computed and serialized. For example, imagine an application that is recording latency statistics for messages coming in at a rate of 10k/sec. To accurately compute and report percentiles with a collection period of 10 seconds, the application needs to retain at least 100,000 data points per statistic to perform histographical analysis for just one interval! Assuming that the values are double or long values, one would be looking at ~800KB per statistic collected. Collecting and computing on such data is hard on processor memory caches and can have a disruptive impact on application processing times. Furthermore, to perform longer term histographical analysis (across multiple collection periods) without losing any data, each set of interval results needs to be stored so that computation can be performed. Persisting such data to disk or emitting it in server heartbeats to achieve this is also problematic because it leads to a large volume of data, which puts a strain on disk space and bandwidth or, in the case of heartbeats, network bandwidth when emitted over the messaging fabric.
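
The retention figures in the example above follow from simple multiplication. The sketch below reproduces them in Java; the class and method names are illustrative only, and the 8 bytes per value assumes long or double data points as stated in the text.

```java
/** Illustrative arithmetic for the cost of retaining every latency data point. */
final class LatencyRetentionCost {
    /** Data points that must be retained for one collection interval. */
    static long dataPointsPerInterval(long messagesPerSecond, long intervalSeconds) {
        return messagesPerSecond * intervalSeconds;
    }

    /** Raw storage for those data points, ignoring any container overhead. */
    static long bytesPerInterval(long dataPoints, int bytesPerValue) {
        return dataPoints * bytesPerValue;
    }

    public static void main(String[] args) {
        long points = dataPointsPerInterval(10_000, 10);   // 10k msgs/sec, 10 second interval
        long bytes = bytesPerInterval(points, Long.BYTES); // long and double are both 8 bytes
        System.out.println(points + " data points, " + bytes + " bytes per statistic");
    }
}
```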

...

An HDRHistogram compromises on precision of the captured latencies in favor of cheaper computation and storage of results while still maintaining a predictable precision. The documentation on HDR histogram provides details on the level of precision that is achieved. Practically speaking, however, for latency data points in the 100s of microseconds the precision that is guaranteed for collected percentiles is in the order of +/- 1us, which is acceptable for most applications (for tail values, say in the range of 1 minute, the value is guaranteed to be correct within +/- 60ms).
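
The two precision figures quoted above are consistent with a relative error of roughly 0.1% of the recorded value. The sketch below makes that relationship explicit. The 0.1% figure is an assumption inferred from the numbers in the paragraph, not taken from the HDR histogram documentation, and `RelativePrecision` is an illustrative name.

```java
/** Illustrative arithmetic: absolute error implied by a fixed relative precision. */
final class RelativePrecision {
    // Assumption: about 0.1% relative error, inferred from "+/- 60ms at 1 minute"
    static final double RELATIVE_ERROR = 0.001;

    /** Worst-case absolute error, in the same unit as the recorded value. */
    static double maxError(long value) {
        return value * RELATIVE_ERROR;
    }

    public static void main(String[] args) {
        long oneMinuteMicros = 60_000_000L;
        System.out.println(maxError(oneMinuteMicros) + " us error at 1 minute");
        System.out.println(maxError(900L) + " us error at 900 us");
    }
}
```

Under this assumption, a 1 minute tail value is off by at most 60,000 microseconds (60ms), while values in the hundreds of microseconds are off by less than a microsecond.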

Creating a Series Stat

Code Block
languagejava
import com.neeve.stats.IStats.Series;
import com.neeve.stats.StatsFactory;
 
public class MyApp {

  @AppStat
  private final Series newCustomerAge = StatsFactory.createSeriesStat("New Customer Age");

  @EventHandler
  public void onNewCustomer(NewCustomerCreation message) {
    // Records the new customer's age (assumes the message exposes a getAge() accessor)
    newCustomerAge.add(message.getAge());
  }
}

...

In the above we can see that in the last interval, one new customer registered and their age was 21. Over the last 8 intervals, the average new customer age is 23 with the oldest being 29 and the youngest being 21.

...

Series data for user stats are exposed in the Server Monitoring Heartbeat using the SrvMonUserSeriesStat object:

...

| Field Name | Type | Description |
| --- | --- | --- |
| name | String | The name of the statistic. |
| seriesType | SrvMonSeriesType | The type of the series data. Note: Currently only integer data series are supported. The types BYTE, SHORT, LONG, FLOAT and DOUBLE are reserved for future use. Processors of heartbeats should ensure that they check the data type here for future proofing. |
| intSeries | SrvMonIntSeries | The collected int series data for an INT series. This field should only be set when the series type is set to SrvMonSeriesType.INT. |

...

| Field Name | Type | Description |
| --- | --- | --- |
| dataPoints | int[] | When the server is configured to include the captured data points for the statistic, the returned array will include the values collected during this interval. This allows monitoring tools to perform non-lossy calculation of percentiles, provided no data points were skipped due to under sampling or a missed heartbeat. The number of valid values in the returned array is dictated by numDataPoints; if the values array is longer than numDataPoints, subsequent values in the array should be ignored. |
| lastSequenceNumber | long | Sequence numbers for collected data points start at 1; a value of 0 indicates that no data points have been collected. The sequence number always indicates the number of data points that have been collected since the statistic was created or last reset. If the statistic is reset, this value resets to 0. |
| numDataPoints | int | Indicates the number of data points collected in this interval. If no data points were collected, numDataPoints will be 0. Because sequence numbers start at 1, the sequence number of the first value collected in this interval can be determined by subtracting numDataPoints from lastSequenceNumber and adding 1. This can be used to determine whether data points were skipped between two consecutive heartbeats due to under sampling or a missing heartbeat. |
| skippedDataPoints | long | The runtime only holds on to a fixed number of data points for any particular Latency statistic. If the sampling interval is too high, then some data points may be skipped. For example, let's say Latency stats are configured to hold on to a sample size of 1000 data points. If the number of data points being captured per second is 2000, and the stats collection interval is 1 second, then on each collection 1000 data points will be missed, which will skew results. The skipped data points counter thus indicates how many data points have been missed in the reported runningStats. If the count grows between two successive heartbeats, this indicates that the values in intervalStats don't reflect all the activity since the last interval. The skipped data points counter is a running counter: it tracks the total number of data points that have been skipped since the underlying statistic was last reset. |
| intervalStats (new in 3.1) | SrvMonIntHistogram | Holds computed results for the data points captured for this heartbeat (i.e. for the numDataPoints captured). This field may not be set if numDataPoints is 0 or if interval computations are not done on the server. |
| runningStats (new in 3.1) | SrvMonIntHistogram | Holds computed results for the data points over the lifetime of this statistic (i.e. since seqNo 1). If the underlying statistic is reset, then the running stats are correspondingly reset. |
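
A heartbeat consumer could apply the rules for numDataPoints and lastSequenceNumber roughly as follows. This is a sketch under stated assumptions: it works on plain values rather than on the heartbeat objects themselves (whose accessor methods are not shown here), `SeriesGapCheck` is a hypothetical helper, and it takes sequence numbers as starting at 1, so the first data point of an interval is lastSequenceNumber - numDataPoints + 1.

```java
import java.util.Arrays;

/** Hypothetical helper for interpreting series fields reported in a heartbeat. */
final class SeriesGapCheck {
    /** Only the first numDataPoints entries of the array are valid. */
    static int[] validValues(int[] dataPoints, int numDataPoints) {
        return Arrays.copyOf(dataPoints, Math.min(numDataPoints, dataPoints.length));
    }

    /** Sequence number of the first data point in an interval. */
    static long firstSequenceNumber(long lastSequenceNumber, int numDataPoints) {
        return lastSequenceNumber - numDataPoints + 1;
    }

    /**
     * Data points missing between two consecutive heartbeats; 0 means contiguous.
     * A negative result suggests the statistic was reset in between.
     */
    static long missingBetween(long prevLastSequenceNumber, long currLastSequenceNumber, int currNumDataPoints) {
        return firstSequenceNumber(currLastSequenceNumber, currNumDataPoints) - prevLastSequenceNumber - 1;
    }

    public static void main(String[] args) {
        int[] reported = {5, 6, 7, 0, 0}; // array is longer than numDataPoints
        System.out.println(Arrays.toString(validValues(reported, 3)));
        System.out.println(missingBetween(5, 10, 3)); // previous heartbeat ended at sequence number 5
    }
}
```

The same comparison across heartbeats applies to skippedDataPoints: because it is a running counter, only an increase since the previous heartbeat indicates newly skipped data.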

...

| Field Name | Type | Description |
| --- | --- | --- |
| sampleSize | long | The number of data points over which results were calculated (possibly 0 if no data points were collected). |
| minimum | int | The minimum value recorded in the sample set. The value is not set if the sample size is 0. |
| maximum | int | The maximum value recorded in the sample set. The value is not set if the sample size is 0. |
| mean | int | The mean for the values recorded in the sample set. The value is not set if the sample size is 0. |
| median | int | The median for the values recorded in the sample set. The value is not set if the sample size is 0. |
| pct75 | int | The 75th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
| pct90 | int | The 90th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
| pct99 | int | The 99th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
| pct999 | int | The 99.9th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
| pct9999 | int | The 99.99th percentile for the values recorded in the sample set. The value is not set if the sample size is 0. |
| samplesOverMax | long | The number of samples that exceeded the maximum recordable value for the histogram. When computing latency percentiles using an HDRHistogram, it is possible that a recorded value will exceed the maximum value allowable. In this case, the data point is downsampled to the maximum recordable value, which skews the percentile calculations lower. samplesOverMax allows detection of how frequently this is occurring. |
| samplesUnderMin | long | The number of samples captured that were below the minimum recordable value for the histogram. When computing latency percentiles using an HDRHistogram, it is possible that a recorded value will be below 0 in cases where clock skew is possible. In such cases, the value will be upsampled to 0, which can skew the histogram results. samplesUnderMin allows detection of how frequently this is happening. |
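
The clamping behavior behind samplesOverMax and samplesUnderMin can be illustrated with a small standalone recorder. This is a sketch of the idea only, not the platform's or the HDRHistogram library's implementation; `ClampingRecorder` and its methods are hypothetical.

```java
/** Hypothetical recorder that clamps out-of-range values and counts how often it does so. */
final class ClampingRecorder {
    private final long maxRecordable;
    private long samplesOverMax;
    private long samplesUnderMin;

    ClampingRecorder(long maxRecordable) {
        this.maxRecordable = maxRecordable;
    }

    /** Returns the value actually recorded after clamping into [0, maxRecordable]. */
    long record(long value) {
        if (value > maxRecordable) {
            samplesOverMax++;      // downsampled: skews percentiles lower
            return maxRecordable;
        }
        if (value < 0) {
            samplesUnderMin++;     // upsampled: e.g. negative latency caused by clock skew
            return 0;
        }
        return value;
    }

    long samplesOverMax() {
        return samplesOverMax;
    }

    long samplesUnderMin() {
        return samplesUnderMin;
    }
}
```

A monitoring tool that sees either counter growing can treat the reported percentiles as suspect for that period.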

...

The AppStat annotation can be used to annotate user-defined statistics in the application to allow those statistics to be discovered by a Talon Server. The Talon server will register each statistic it finds with the application's AepEngine. AppStat annotations are only introspected once: just after the application's AepEngine is injected. If the application changes the instance after application initialization, the new stat instance won't be discovered by the server.
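
Why a replaced instance goes unnoticed can be shown with a generic reflection scan. The sketch below is not Talon's discovery code: the `@Stat` annotation, `Counter` and `App` classes are hypothetical stand-ins. A one-time scan captures the object each annotated field references at that moment, so assigning a new object to the field later does not change what was captured.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class StatDiscoverySketch {
    /** Hypothetical stand-in for an annotation such as AppStat. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Stat {}

    static final class Counter {
        long count;
    }

    static final class App {
        @Stat Counter orders = new Counter();
        Counter notAStat = new Counter();
    }

    /** One-time scan: captures the object each annotated field references right now. */
    static List<Object> discover(Object app) {
        List<Object> found = new ArrayList<>();
        for (Field field : app.getClass().getDeclaredFields()) {
            if (field.isAnnotationPresent(Stat.class)) {
                try {
                    field.setAccessible(true);
                    found.add(field.get(app));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        App app = new App();
        List<Object> registered = discover(app);
        app.orders = new Counter(); // replaced after the scan
        System.out.println(registered.get(0) == app.orders); // the new instance was never captured
    }
}
```

For this reason, stat fields are best declared final or otherwise left unchanged after initialization.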

...

Any @AppStat annotated field in the main application class will be discovered by the Talon server. If additional classes in your application contain user-defined stats, they can be exposed to the server using the AppStatContainersAccessor annotation:

Code Block
languagejava
@AppHAPolicy(HAPolicy.EventSourcing)
public static class MyApp {
    MyOtherClass someOtherClass = new MyOtherClass();

    // Exposes additional objects whose @AppStat fields should be discovered
    @AppStatContainersAccessor
    public void getStatContainers(Set<Object> containers) {
        containers.add(someOtherClass);
    }
}

private static class MyOtherClass {
    @AppStat
    Counter numHeartbeats = StatsFactory.createCounterStat("Heartbeats Received");

    MyOtherClass() {
    }
}

...