Overview
This page describes the AepEngine's AepStuckAlertEvent, which is emitted when the engine determines that it is in a state that could prevent it from processing further messages. The event reports two such cases:
Hung Transaction Pipeline
Engine transactions complete asynchronously in the background after successful return from a message handler, so a commit that has been started may never complete if the pipeline hangs. To detect such a case, the engine statistics for the number of commits started and the number of commits completed are examined periodically to see whether the number completed is less than the number started and hasn't changed for some time. Possible causes include replication to a backup being flow controlled, an outbound message bus being flow controlled, or an outbound message bus dropping acknowledgements.
Hung Message/Event Handler
If a message handler (or other event handler) has hung, this will also cause the engine to stop processing messages. To detect this case, the engine's statistics for the number of events received and the number of events processed are examined periodically to see whether the number processed is less than the number received and hasn't changed for some time. Possible causes include a message handler erroneously sleeping or being deadlocked.
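Both checks above follow the same pattern: compare a "started" counter with a "completed" counter and see whether the completed counter has stopped moving. The following is a minimal sketch of that pattern, not the engine's implementation. The class `ProgressWatch`, its method, and the use of millisecond timestamps are assumptions made for illustration only.

```java
// Hypothetical illustration only; ProgressWatch is not part of the AepEngine API.
final class ProgressWatch {
    private final long thresholdMillis;
    private long lastCompleted = -1;   // completed count seen at the previous check
    private long lastChangeMillis;     // time at which the completed count last changed

    ProgressWatch(long thresholdMillis, long nowMillis) {
        this.thresholdMillis = thresholdMillis;
        this.lastChangeMillis = nowMillis;
    }

    /**
     * Returns true when work is outstanding (completed < started) and the
     * completed count has not changed for at least the threshold.
     */
    boolean isStalled(long started, long completed, long nowMillis) {
        if (completed != lastCompleted) {
            lastCompleted = completed;
            lastChangeMillis = nowMillis;
        }
        return completed < started && nowMillis - lastChangeMillis >= thresholdMillis;
    }
}
```

For the transaction pipeline the two counters would be commits started and commits completed; for the handler check they would be events received and events processed. A check this simple can misfire when new work arrives just after a long idle period, because the completed count has legitimately been unchanged for longer than the threshold.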
Configuration and Usage
To enable the stuck alert check, set a threshold via AepEngineDescriptor.setStuckAlertEventThreshold.
Setting The Threshold
Defines the threshold, in seconds, after which an AepStuckAlertEvent is dispatched to the application's IAepAsynchronousEventHandler.
An AepStuckAlertEvent is intended to alert that the engine's transaction pipeline is "stuck", i.e. there are one or more transaction commits in the pipeline and/or the engine's event multiplexer thread is not processing any events. For example, the event multiplexer thread could be flow controlled on the replication connection due to an issue in the backup, or it could be spinning in a loop due to a bug in a business logic handler. This event is triggered by the AEP engine under either of the following conditions:
- There are > 0 commits in the engine's commit pipeline and the number of completed commits in the pipeline has not changed for a configurable period of time.
- A message or event handler is hung and is preventing the engine's multiplexer thread from processing further events.

Stuck alert events are only emitted if the stuck condition has persisted through two stuck alert threshold checks. This prevents a 'false positive' alert in the case where an engine has been idle for more than the stuck alert threshold and a new transaction is started just before a stuck engine check is made.

Each stuck engine check reports either HungTransactionPipeline or HungEventHandler, not both, and HungTransactionPipeline takes precedence. Stuck alerts are meant to raise a warning that an administrator should investigate engine health and make a deeper determination of possible root causes. When an AepStuckAlertEvent has been raised, thread dumps should be taken, and stats and trace logs should be examined to determine whether there is a serious issue.
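The rules above (one reason per check, HungTransactionPipeline first, and confirmation across two checks) can be sketched as a small state machine. This is an illustrative model under stated assumptions, not the engine's code: the class `StuckCheck`, its `check` method, and the idea of passing the four counters in as arguments are all hypothetical. Suppression of repeated alerts is deliberately left out.

```java
// Hypothetical illustration only; StuckCheck is not part of the AepEngine API.
final class StuckCheck {
    enum Reason { HungTransactionPipeline, HungEventHandler }

    private long prevCommitsCompleted = -1;
    private long prevEventsProcessed = -1;
    private Reason pending;   // condition seen at the previous check, awaiting confirmation

    /**
     * Intended to be called once per threshold interval. Returns a reason only
     * when the same condition was also seen at the previous check, else null.
     */
    Reason check(long commitsStarted, long commitsCompleted,
                 long eventsReceived, long eventsProcessed) {
        Reason seen = null;
        if (commitsCompleted < commitsStarted && commitsCompleted == prevCommitsCompleted) {
            seen = Reason.HungTransactionPipeline;   // takes precedence over the handler check
        } else if (eventsProcessed < eventsReceived && eventsProcessed == prevEventsProcessed) {
            seen = Reason.HungEventHandler;
        }
        prevCommitsCompleted = commitsCompleted;
        prevEventsProcessed = eventsProcessed;

        Reason confirmed = (seen != null && seen == pending) ? seen : null;
        pending = seen;
        return confirmed;
    }
}
```

In this model, a transaction that starts just before a check after a long idle period is only marked as pending. It is reported at the next check only if its commit still has not completed, which mirrors the false-positive protection described above.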
AsynchronousEventHandler
AepStuckAlertEvents aren't dispatched via the engine's regular event handler because in this case the handler itself may be stuck. To receive AepStuckAlertEvents, an application must register an IAepAsynchronousEventHandler with the engine via the setAsynchronousEventHandler method.
AepStuckAlertEvent
The following accessors are available on an AepStuckAlertEvent:
Field | Description |
---|---|
Reason | Returns the reason for this alert. Possible values are: HungTransactionPipeline: AepEngine transactions complete asynchronously in a pipelined fashion. The engine's pipeline is considered hung when there are outstanding commits that have not completed. HungEventHandler: Indicates that the AepEngine's event processor is hung. The event processor thread handles dispatching of application messages to their handlers along with other internal events. |
Message | An optional descriptive message intended for display in an alert message. |
Engine | The engine that is stuck. |
LastEventProcessedTimestamp | Gets the timestamp, in milliseconds, of the last event fully processed by the engine's event handler or the time at which the first event was received, whichever is greater. |
LastCommitCompletionTimestamp | Gets the timestamp, in milliseconds, of the last completed engine transaction or the time at which the first event was received, whichever is greater. |
IncompleteCommitCount | Gets the number of commits that have not been completed. |
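As one way of using these values, an alert handler might turn the two timestamps into "time since last progress" figures for display. The sketch below uses a hypothetical stand-in class, `StuckAlertSnapshot`, that simply holds values copied from the accessors in the table. It is not the platform's event class, and the field types shown are assumptions.

```java
// Hypothetical stand-in for values read from an AepStuckAlertEvent; not the platform's class.
final class StuckAlertSnapshot {
    final String reason;                        // e.g. "HungTransactionPipeline"
    final long lastEventProcessedTimestamp;     // milliseconds
    final long lastCommitCompletionTimestamp;   // milliseconds
    final long incompleteCommitCount;           // type assumed for illustration

    StuckAlertSnapshot(String reason, long lastEventProcessedTimestamp,
                       long lastCommitCompletionTimestamp, long incompleteCommitCount) {
        this.reason = reason;
        this.lastEventProcessedTimestamp = lastEventProcessedTimestamp;
        this.lastCommitCompletionTimestamp = lastCommitCompletionTimestamp;
        this.incompleteCommitCount = incompleteCommitCount;
    }

    /** Builds a one-line summary showing how long ago the engine last made progress. */
    String summarize(long nowMillis) {
        long sinceCommit = nowMillis - lastCommitCompletionTimestamp;
        long sinceEvent = nowMillis - lastEventProcessedTimestamp;
        return reason + ": " + incompleteCommitCount + " incomplete commit(s), "
                + sinceCommit + " ms since last commit completion, "
                + sinceEvent + " ms since last event processed";
    }
}
```

A summary like this gives an administrator a first indication of how long the condition has lasted before thread dumps and trace logs are examined.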
Abatement and Repeated Alerts
There is currently no corresponding abatement event for stuck alerts. Abatement of a stuck alert condition can be determined by looking at the numCommitsCompleted engine stat: if the number of commits completed is increasing, the stuck condition has abated.
AepStuckAlertEvents are not repeatedly dispatched for a given stuck condition. For a HungTransactionPipeline another stuck alert won't be generated until at least one commit has completed since the last alert. For a HungEventHandler, another stuck alert event won't be triggered until at least one event has been successfully processed since the last alert.
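The abatement and repeat rules for HungTransactionPipeline both reduce to remembering the completed-commit count at the time of the last alert. A minimal sketch, assuming a hypothetical `AlertGate` helper that is fed values of the numCommitsCompleted stat (how the stat is read is outside the scope of this sketch):

```java
// Hypothetical illustration only; AlertGate is not part of the AepEngine API.
final class AlertGate {
    private long completedAtLastAlert = -1;   // -1 means no alert has been raised yet

    /** Returns true if a stuck sighting at this completed count should raise a new alert. */
    boolean shouldAlert(long numCommitsCompleted) {
        if (completedAtLastAlert >= 0 && numCommitsCompleted == completedAtLastAlert) {
            return false;   // no commit has completed since the last alert; do not repeat it
        }
        completedAtLastAlert = numCommitsCompleted;
        return true;
    }

    /** Returns true once the completed count has advanced past its value at the last alert. */
    boolean hasAbated(long numCommitsCompleted) {
        return completedAtLastAlert >= 0 && numCommitsCompleted > completedAtLastAlert;
    }
}
```

The same gate could be applied to the events-processed count for the HungEventHandler case.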