Overview
Talon's threading model is key to its ability to achieve extreme performance levels. This section covers the key concepts related to threading and high-performance computing, along with the configuration options for getting the most performance out of Talon. Before jumping into this section, be sure to read through The Talon Application Flow. The key architectural pieces to grasp when thinking about threading in Talon micro apps are:
- Single thread writes to application state.
- Pipelined execution of application transactions.
This section discusses these concepts and contains a listing of the platform threads in play in Talon applications.
The Single Writer Principle
Key to performance in Talon is the mandate that write access to application state is single threaded. The single writer principle posits that "when trying to build a highly scalable system the single biggest limitation on scalability is having multiple writers contend for any item of data or resource". It is the main motivator behind architectural patterns such as the actor model and micro-services. One of the major performance advantages of a micro-app architecture is that by making all state private to the application it reduces write contention. By bringing all application state into memory, Talon further reduces the cost of updating data by keeping it as close to the business logic that operates on it as possible. But even with all state in main memory, there are significant costs to multiple threads operating on the same piece of data at the processor cache level.
Every Talon application is backed by an AepEngine with a single input multiplexer thread that consumes events and messages coming in from message buses and dispatches them to the application on a single thread that serves as the single writer for an application's state. Application authors thus do not need to concern themselves with synchronization or locking.
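The pattern is easy to picture in plain Java. The sketch below is hypothetical (not Talon API; the class and field names are illustrative): other threads enqueue work, and exactly one thread drains the queue and mutates state, so the state itself needs no locks or synchronization.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch of the single writer principle (not Talon API):
// all mutation of 'positions' happens on one dispatcher thread.
public class SingleWriterLoop {
    private final BlockingQueue<Runnable> inputQueue = new ArrayBlockingQueue<>(1024);
    private final Map<String, Long> positions = new HashMap<>(); // application state

    // The single writer: drains the queue and applies events to state.
    public void run() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            inputQueue.take().run();
        }
    }

    // Other threads (e.g. bus readers) enqueue work instead of touching state.
    public void onMessage(String symbol, long qty) throws InterruptedException {
        inputQueue.put(() -> positions.merge(symbol, qty, Long::sum));
    }
}
```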
As with most architectures, horizontal scalability in Talon can be achieved by partitioning state across instances (whether within the same JVM, on the same machine, or across multiple machines), but it is usually desirable to reduce the number of shards or eliminate the need for sharding altogether in order to reduce cost and complexity. A single writer architecture is a key component of reducing hardware inefficiency as it avoids wasting the processor and memory resources associated with managing inter-thread contention.
Understanding Detached Threads
As discussed above, the Talon application programming model is single threaded. It is therefore desirable to keep the application's single business logic thread busy performing application logic as much as possible (as opposed to spending cycles on infrastructural concerns such as replication or persistence). To that end, the platform provides the ability to do much of this non-functional heavy lifting in background threads that are detached from the business logic.
Work that can be configured to be done in detached threads includes:
- Replication (Detached Store Sender, Detached Store Dispatcher)
- Persistence (Detached Persister)
- Intercluster Replication (Detached ICR Sender)
- Message Logging (Detached Inbound / Outbound Message Loggers)
The Listing of Threads section below describes these threads in more detail. For optimal latency and throughput, these threads can also be affinitized to particular CPU cores to reduce the jitter and throughput hits caused by thread context switching (see Tuning Thread Affinitization and NUMA).
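Conceptually, each detached thread follows the same handoff pattern: the critical-path thread enqueues work onto a bounded queue and moves on, while the detached thread absorbs the I/O cost. A minimal sketch, assuming a simple blocking-queue handoff (hypothetical names, not Talon's internal implementation):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch (not Talon API): the business thread hands serialized
// entries to a detached persister thread instead of blocking on disk I/O.
public class DetachedPersister implements Runnable {
    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(4096);

    // Called from the single business-logic thread; returns quickly unless
    // the queue is full (i.e. the persister has fallen behind).
    public void persist(byte[] entry) throws InterruptedException {
        queue.put(entry);
    }

    // Runs on the detached thread; absorbs disk I/O spikes without stalling
    // the business thread.
    @Override
    public void run() {
        try {
            while (true) {
                byte[] entry = queue.take();
                writeToDisk(entry); // actual I/O happens off the critical path
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void writeToDisk(byte[] entry) { /* I/O elided in this sketch */ }
}
```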
Understanding Disruptors
Effective pipelining between threads is key to Talon's performance, which means that inter-thread communication must be optimal. Talon uses LMAX disruptors to pass data between critical threads in the processing pipeline. Throughout X configuration you will see settings used to configure these disruptors via the following knobs (a code sketch follows the table):
Knob | Description |
---|---|
queueDepth | The size of the ring buffer. It is best to choose a power of 2 for the ring buffer size. The buffer should be sized large enough to absorb spikes in application traffic without blocking the offering thread, but otherwise should generally be kept small enough that the amount of active data in the pipeline doesn't tax CPU caches. The default size for most disruptors is 1024. |
queueWaitStrategy | Controls how the thread draining events from the ring buffer waits for more events. One of BusySpin|Yielding|Sleeping|Blocking. For applications that want the lowest latency possible, BusySpin causes the draining thread to spin without signaling to the OS that it should be context switched, which avoids jitter. This policy is most appropriate when the number of cores available in the machine is adequate for each reader to occupy its own core. Otherwise, a Yielding wait strategy can be used. Both BusySpin and Yielding are CPU intensive and are most appropriate for performance-critical applications that run on hardware dedicated to the application. |
queueDrainerCpuAffinityMask | Controls the CPU to which to affinitize the draining thread. For BusySpin or Yielding policies, affinitizing threads can further reduce jitter. (see Tuning Thread Affinitization and NUMA). |
queueOfferStrategy | Overrides the offer strategy used to manage concurrency when offering elements to the ring buffer. |
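These knobs map onto standard concepts in the open-source LMAX Disruptor library. The sketch below uses com.lmax.disruptor directly to show what queueDepth and queueWaitStrategy control; it illustrates the concepts, not how Talon wires its internal disruptors:

```java
import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import java.util.concurrent.Executors;

public class DisruptorKnobs {
    static final class Event { long value; }

    public static void main(String[] args) {
        // queueDepth: ring buffer size, a power of 2 (Talon's default is 1024).
        int queueDepth = 1024;

        Disruptor<Event> disruptor = new Disruptor<>(
                Event::new,
                queueDepth,
                Executors.defaultThreadFactory(),
                ProducerType.SINGLE,          // single-writer offer side
                new BusySpinWaitStrategy());  // queueWaitStrategy: BusySpin
                                              // (YieldingWaitStrategy etc. are the alternatives)

        // The draining thread: analogous to a detached worker in Talon.
        disruptor.handleEventsWith((event, sequence, endOfBatch) ->
                System.out.println("drained " + event.value));
        disruptor.start();

        // Offer side: claim a slot, fill it, publish it.
        RingBuffer<Event> ring = disruptor.getRingBuffer();
        long seq = ring.next();
        try {
            ring.get(seq).value = 42;
        } finally {
            ring.publish(seq);
        }

        disruptor.shutdown(); // waits for in-flight events, then halts
    }
}
```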
Auto Tuning of Disruptor Wait Strategies.
When the nv.optimizefor environment property is set to latency or throughput, disruptors in the critical path are automatically set to BusySpin or Yielding respectively, unless explicitly configured otherwise.
You can set the environment property in your XVM's environment configuration or as a JVM system property (e.g. -Dnv.optimizefor=latency).
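For example, assuming environment properties are honored when supplied as JVM system properties (the bootstrap class here is hypothetical), the property could be set programmatically before the platform initializes:

```java
public class Bootstrap {
    public static void main(String[] args) {
        // Assumption: nv.optimizefor takes effect when set as a system
        // property before any platform classes are loaded.
        System.setProperty("nv.optimizefor", "latency"); // or "throughput"
        // ... start the Talon application / XVM after this point ...
    }
}
```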
Listing of Threads
Talon Application Threads
Thread | Name | Critical Path | Description |
---|---|---|---|
AEP Engine Input Multiplexer | X-STEMux-<appName>-<instanceid> | Yes | The engine thread that dequeues and dispatches application messages and events. This is the main application thread on which application events are dispatched. Thread names are suffixed with a global counter to allow differentiating between stats emitted by multiple instances of the same application running in the same JVM. The detached threads described below can offload work from this thread, which can improve throughput and latencies in your application. |
AEP Engine Event Multiplexer Wakeup | X-EventMultiplexer-Wakeup-<appName> | No | A timer thread used to wake up and dispatch events scheduled via the engine's input queue. |
Detached Inbound Message Logger | X-ODS-StoreLog-<appName>.in | No | When the application is configured with a detached inbound message logger, this thread offloads the work of writing to disk from the engine's input multiplexer, which can serve as a buffer against disk I/O spikes. As inbound message loggers aren't used for HA purposes, they are not on the critical path. |
Detached Outbound Message Logger | X-ODS-StoreLog-<appName>.out | No | When the application is configured with a detached outbound message logger, this thread offloads the work of writing to disk from the engine's input multiplexer, which can serve as a buffer against disk I/O spikes. |
Per Transaction Stats Logger | X-ODS-StoreLog-<appName>.txnstats | No | When the application is configured with a detached per-transaction stats logger, this thread offloads the work of writing to disk from the engine's input multiplexer, which can serve as a buffer against disk I/O spikes. |
Bus Threads | Each bus configured for your application is managed by a Bus Manager internal to the AEP engine. Bus binding instances will generally create additional threads specific to the binding type. | | |
Detached Bus Send Thread | X-AEP-BusManager-IO-<appName>.<busName> | Yes | When the bus is configured for detached send, this thread offloads the work of serializing and writing outbound messages from the engine's input multiplexer, which serves as a buffer against spikes caused by message bus flow control. |
Bus Binding Opener | X-AEP-BusManager-BindingOpener-<appName>.<busName> | No | Handles establishment of bus connections and reconnects. |
Store Threads | Each store instance will minimally create a replication reader thread which handles reading from peers. | | |
Store Reader Thread | X-ODS-StoreReplicatorLinkReader-<storeName>-<memberName> | Yes | The IO thread for the store which is used to read replication traffic from cluster peers. |
Detached Store Persister | X-ODS-StoreLog-<storeName>-<instanceid> | No | When the store is configured for detached persistence, this thread offloads the work of writing recovery data to disk from the engine's input multiplexer which can serve as a buffer against disk I/O spikes. |
Detached ICR Sender | X-ODS-Store-ICR-Sender-<storeName>-<instanceid> | No | When the store is configured for detached inter-cluster replication, this thread offloads the work of writing recovery data to the ICR receiver from the engine's input multiplexer, which can serve as a buffer against network I/O spikes. |
Detached Store Send Thread | X-ODS-StoreReplicatorSender-<storeName>-<memberName> | Yes | When the store is configured for detached send, this thread offloads the work of writing recovery data to the network for backup instances from the engine's input multiplexer, which can serve as a buffer against network I/O spikes. |
Detached Store Dispatch Thread | X-ODS-StoreReplicatorDispatcher-<storeName>-<memberName> | Yes | When the store is configured for detached dispatch, this thread allows the store reader thread to offload the work of dispatching deserialized replication traffic to the engine for processing. This is useful in cases where the cost of deserializing replication traffic is high. |
Store Acceptor Thread | X-ODS-StoreLinkAcceptor-<instanceid> | No | Each store configured for clustering will create a thread that will listen for connection requests from other store members. Once the connection is established it is handed off to the store reader thread for processing. |
Miscellaneous | |||
Stats Printer Threads | X-Stats-Printer [<statName>-<instanceid>.stats] | No | Several components can be configured to trace stats independently of XVM-collected stats. When such stats threads are enabled, a thread is created to periodically print stats. This is typically useful if an app is run outside of an XVM. |
Scheduler | X-Scheduler-<instance-count> | No | A timer thread used for scheduling events. An AepEngine uses this to perform periodic engine health checks, for example. |
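When diagnosing thread placement or CPU usage, it can help to enumerate these threads at runtime. A small JDK-only sketch that lists live threads by the X- name prefix documented in the tables above:

```java
// Diagnostic sketch: list live platform threads by their "X-" name prefix
// using only standard JDK APIs.
public class ListPlatformThreads {
    public static void main(String[] args) {
        Thread.getAllStackTraces().keySet().stream()
              .filter(t -> t.getName().startsWith("X-"))
              .forEach(t -> System.out.println(
                      t.getName() + " (daemon=" + t.isDaemon() + ")"));
    }
}
```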
Discovery Threads
Thread | Name | Critical Path | Description |
---|---|---|---|
Discovery Timer | X-EDP-Timer | No | Each discovery provider that is opened will create a timer thread that will periodically wake up to perform discovery broadcasts. Each discovery provider typically will create additional threads specific to the discovery provider type. For example, when using an SMA-based discovery provider, message bus binding threads will be created. |
XVM Threads
Thread | Name | Critical Path | Description |
---|---|---|---|
XVM main thread | X-Server-<xvmName>-Main | No | This thread creates and starts applications at startup and, upon completion, drives server acceptors that accept admin and direct connections to the XVM. |
XVM stats collector | X-Server-<xvmName>-StatsRunner | No | Collects stats for applications, and populates them into heartbeats that can be traced, logged, dispatched, and emitted. |
XVM dedicated IO thread | X-Server-<xvmName>-IOThread-<threadNumber> | Yes | When the server is configured for multithreading, additional IO threads beyond the XVM main thread are created; each services the connections affinitized to it. |