Akka
Lightbend Telemetry is capable of capturing data for the following Akka related features.
All time-related metrics use nanoseconds unless specified otherwise.
Cinnamon Akka module dependency
After adding the Cinnamon Agent as described in the setup, make sure that you add the Cinnamon Akka module dependency to your build file:
- sbt
-
libraryDependencies += Cinnamon.library.cinnamonAkka
- Maven
-
<dependency> <groupId>com.lightbend.cinnamon</groupId> <artifactId>cinnamon-akka_2.13</artifactId> <version>2.20.4</version> </dependency>
- Gradle
-
dependencies { implementation group: 'com.lightbend.cinnamon', name: 'cinnamon-akka_2.13', version: '2.20.4' }
Actor metrics
The following metrics are recorded for instrumented actors, type of metric in parenthesis:
-
Running actors (counter) — the number of running actors (of an actor class or group).
-
Mailbox size (recorder) — statistics for actor mailbox sizes.
-
Stash size (recorder) — statistics for actor stash sizes.
Mailbox size and Stash size are recorders to give the statistical distribution of sizes for aggregated actor metrics (reported by actor class or group) and to provide visibility into what happened over the past time window. Note that the mailbox size and stash size are sampled to avoid skew in the distributions — the sampling period can be configured.
-
Mailbox time (recorder) — statistics for the time that messages are in the mailbox.
-
Dropped messages (rate) — statistics for the number of messages dropped from bounded mailboxes per actor.
-
Processed messages (rate)* — the number of messages that actors have processed in the selected time frame.
-
Processing time (recorder)* — statistics for the processing time of actors.
-
Sent messages (rate)* — statistics for the number of sent messages per actor.
The metrics marked with *
support an automatic message-type dimension. It’s disabled by default because it creates an extra set of metrics for each additional type of message. See enabling message type dimensions for details.
Router metrics
The following router metrics are available:
-
Processed messages (rate)* — the number of messages that routers have processed in the selected time frame.
-
Processing time (recorder)* — statistics for the processing time of the router logic.
Router metrics are only available for router actors, i.e. not available when routers are used directly.
Use the setting routers = off
to disable router metrics from being created, see router settings.
The metrics marked with *
support an automatic message-type dimension. It’s disabled by default because it creates an extra set of metrics for each additional type of message. See enabling message type dimensions for details.
Actor remote metrics
The following remote metrics are recorded for instrumented actors, type of metric in parenthesis:
-
Sent messages (rate)* — statistics for the number sent remote messages.
-
Sent message size (bytes)* — statistics for remote sent message sizes.
-
Serialization time (recorder)* — statistics for the time that serialization takes.
-
Received messages (rate)* — statistics for the number received remote messages.
-
Received message size (bytes)* — statistics for remote received message sizes.
-
Deserialization time (recorder)* — statistics for the time that deserialization takes.
-
Node quarantine (event) — node quarantine event information.
-
Phi accrual value (gauge) — statistics for the Phi accrual failure detector. A Phi value represents the connection between two nodes; self and remote. A self node can have a connection to any number of remote nodes and each connection will have its own Phi value. Note that internally in Akka the Phi accrual value can become Double.Infinity. If this happens Cinnamon will convert this value to 1024*1024. The reason for this is that most visualizers cannot handle infinity. If you therefore see the value 1048576 (1024*1024) this means that the Phi value has reached infinity.
-
Phi accrual threshold value (gauge) — the configured Phi accrual threshold value.
The metrics marked with *
support an automatic message-type dimension. It’s disabled by default because it creates an extra set of metrics for each additional type of message. See enabling message type dimensions for details.
Timing of serialization/deserialization is turned off by default. To enable it, you need to add this setting to your configuration:
cinnamon.akka.remote.serialization-timing = on
Phi accrual metrics and node quarantine events are turned off by default. To enable them, you need to add this setting to your configuration:
cinnamon.akka.remote.failure-detector-metrics = on
Actor selection
Actor configuration supports selecting and grouping actors for instrumentation by actor class, package, subtree, or instance, so that telemetry and metric aggregation can be tailored to the application. Details on how to configure actor telemetry can be found under actor configuration.
Actor events
Out of the ordinary events are automatically recorded for instrumented actors.
-
Actor failure — when an actor fails, i.e. throws an exception. Event information:
actor-ref
— the actor failingcause
— the exception being thrown -
Unhandled message — when an actor does not handle a message sent to it. Event information:
actor-ref
— the actor not handling the messagemessage
— the message not being handledsender
— the sender of the message -
Dead letter — when a message is sent to an actor that no longer exists. Event information:
recipient
— the intended recipient of the messagemessage
— the message being sentactor-ref
— the actor sending the message -
Log warning — when an actor logs a warning. Event information:
actor-ref
— the actor logging the warningwarning
— the warning being logged -
Log error — when an actor logs an error. Event information:
actor-ref
— the actor logging the errorerror
— the error being logged
Cluster events
These are the types of Akka clustering events that Cinnamon observes:
-
Current cluster state event — a one time event, per cluster node, containing information about the state of the cluster.
-
Domain events — cluster domain events like leader changed, role leader changed or cluster shutting down. Event information:
event
— type of domain event, e.g.LeaderChanged
role
— therole
for the domain event if any -
Member events — cluster member events like member up, unreachable, reachable, exited or removed. Event information:
event
— type of domain event, e.g.MemberUp
version
— node versionmember-status
— member status, e.g.Joining
previous-member-status
— previous member status, e.g.Up
-
Singleton events — cluster singleton events with information about node, actor singleton class and name. Event information:
event
— type of singleton event, e.g.started
class
— singleton actor classactor
— singleton actor name -
Shard region events — cluster shard region started/stopped event with information about actor, node, type name and type of shard region (normal/proxy). Event information:
event
— type of shard region event, e.g.STARTED
shard-region-actor
— the actor controlling the shard regiontype-name
— the entity type name of the shard region -
Node unable to join event — cluster node unable to join event with information about seed nodes and number of attempts to join (only available for Akka version >= 2.4.17.)
-
Periodic cluster node status event — if
cinnamon.akka.cluster.node-status
is enabled one event per cluster member with the node status as seen from the emitting node. The events are emitted once on joining the cluster and then periodically. Interval for emitting is configured usingcinnamon.akka.cluster.node-status-interval
, default value is an interval of 30 seconds.
Cluster events are turned off by default. To enable them, you need to add these settings to your configuration:
cinnamon.akka.cluster.domain-events = on
cinnamon.akka.cluster.member-events = on
cinnamon.akka.cluster.node-status = on
cinnamon.akka.cluster.singleton-events = on
cinnamon.akka.cluster.shard-region-info = on
Cluster metrics
The following cluster metrics are recorded, type of metric in parenthesis:
-
Shard region delivered messages (rate) — statistics for the number of messages that have been delivered by the shard region actor (regardless of where the shard resides).
-
Shards regions per node (gauge) — number of shards regions per node.
-
Shards per region (gauge) — number of shards per shard region.
-
Shard entities per shard (gauge) — number of shard entities per shard.
-
Reachable nodes (counter) — number of reachable nodes in the cluster at any point in time (see definition.)
-
Unreachable nodes (counter) — number of unreachable nodes in the cluster at any point in time (see definition.).
Cluster related metrics is turned off by default. To enable it, you need to add these settings to your configuration:
cinnamon.akka.cluster.shard-region-info = on
cinnamon.akka.cluster.node-metrics = on
Sharded actors are created under the /system
guardian actor, rather than the /user
guardian actor. You can select sharded actors by class or by package, or by using a path selection such as /system/sharding/*
. You can also group actors and provide a custom name.
Definition of reachable and unreachable nodes
Reachable and unreachable nodes are based on observations from the failure detector and is orthogonal to the cluster member states. In other words, a member in, for example, status Up
can be either reachable
or unreachable
.
Split Brain Resolver events
Running Split Brain Resolver in your cluster will ensure better resilience. If you have SBR running, Lightbend Telemetry will automatically keep track of any activity therein. If there is a split in your cluster, events will be generated.
Make sure that you add the Cinnamon Akka Cluster module dependency to your build file:
- sbt
-
// Use Akka Cluster instrumentation libraryDependencies += Cinnamon.library.cinnamonAkkaCluster
- Maven
-
<!-- Use Akka Cluster instrumentation --> <dependency> <groupId>com.lightbend.cinnamon</groupId> <artifactId>cinnamon-akka-cluster_2.13</artifactId> <version>2.20.4</version> </dependency>
- Gradle
-
// Use Akka Cluster instrumentation dependencies { implementation group: 'com.lightbend.cinnamon', name: 'cinnamon-akka-cluster_2.13', version: '2.20.4' }
- Split brain resolver events — Split brain resolver decision taken event with information about the decision, reachable and unreachable nodes. Event information:
decision
— type of SBR decision, e.g.DownUnreachable
nodes
— the nodes that this node finds reachableunreachable-nodes
— the nodes that this node finds unreachable
Split brain resolver events are turned off by default. To enable them, you need to add these settings to your configuration:
cinnamon.akka.cluster.split-brain-resolver-events = on
Threshold events
Thresholds can be specified for some of the metrics. If the threshold is exceeded then an event is fired. Thresholds are supported for:
-
Mailbox size — mailbox queue grows too large. Event information:
actor-ref
— the actor whose mailbox size has exceeded the limitmessage
— the message being enqueued in the mailboxsize
— the mailbox sizelimit
— the mailbox size limit -
Stash size — stash queue grows too large. Event information:
actor-ref
— the actor whose stash size has exceeded the limitmessage
— the message being stashedsize
— the stash sizelimit
— the stash size limit -
Mailbox time — message has been in the mailbox for too long. Event information:
actor-ref
— the actor whose mailbox time has exceeded the limitmessage
— the message being dequeued from the mailboxnanos
— the mailbox timethreshold-nanos
— the mailbox time limit -
Processing time — message processing takes too long. Event information:
actor-ref
— the actor whose processing time has exceeded the limitmessage
— the message that was just processednanos
— the processing timethreshold-nanos
— the processing time limit -
Remote large message sent — a message larger than the threshold has been sent Event information:
actor-ref
— the actor who is sending the large messagemessage-class
— the message class of the large messagesize
— the size in bytes of the large messagerecipient
— the recipient of the large message -
Remote large message received — a message larger than the threshold has been received Event information:
actor-ref
— the actor who is receiving the large messagemessage-class
— the message class of the large messagesize
— the size in bytes of the large messagesender
— the sender of the large message
For more information see metric thresholds configuration.
Stopwatch
Stopwatch provides a timer that follows asynchronous flows. A Stopwatch can be started in one actor and then flow through to others via message sends. You can use it to gather time metrics for “hot paths” within message flows that cross multiple actors. Intervals are marked programmatically with start and stop points within the application using an Akka extension Stopwatch API. Time metrics are recorded for Stopwatches and threshold events can be configured. For more details see the Stopwatch extension.
- Stopwatch events — the stopwatch time limit was breached. Event information:
current-nanos
— the current stopwatch time in nanosecondsthreshold-nanos
— the stopwatch threshold time in nanoseconds
Dispatcher metrics
The following metrics can be recorded for instrumented dispatchers, type of metric in parenthesis:
Basic metrics
These are metrics that are built into the standard ForkJoinPool
and ThreadPool
ExecutorService
implementations in Java and Scala. They are polled periodically by the instrumentation.
ForkJoinPool
-
Parallelism — the parallelism setting
-
Pool size (counter) — the current size of the thread pool
-
Active threads (counter) — an estimate of the number of threads running or stealing tasks
-
Running threads (counter) — an estimate of the number of threads not blocked in managed synchronization
-
Queued tasks (counter) — an estimate of the total number of tasks currently in queues
ThreadPool
-
Core pool size (counter) — the minimum size of the thread pool
-
Max pool size (counter) — the maximum size of the thread pool
-
Pool size (counter) — the current size of the thread pool
-
Active threads (counter) — an estimate of the number of threads running tasks
-
Processed tasks (counter) — an estimate of the number of processed tasks
Time metrics
Additional detailed time metrics for dispatchers.
-
Queue size (counter) — the number of tasks waiting to be processed
-
Queue time (recorder) — statistics for how long tasks are in the queue
-
Processing (counter) — how many tasks are being processed right now
-
Processing time (recorder) — statistics for how long the processing takes
All time related metrics use nanoseconds unless specified otherwise.
Dispatcher selection
Dispatcher configuration supports selecting which dispatchers should be instrumented, and what type of instrumentation should be performed for them, so that telemetry can be tailored to the application. Details on how to configure dispatcher telemetry can be found under dispatcher configuration.
Detailed information
For specific information of how to configure actors and dispatchers see:
Ask Pattern metrics
Available since Cinnamon 2.14.0
Akka Classic and Typed provide two ways to interact with actors:
-
tell - send a message asynchronously and return immediately (“fire-and-forget”).
-
ask - send a message asynchronously and receive a future which will complete when the response has been received, or fail by timeout.
The following metrics can be recorded for instrumented actors or entities:
-
Asked requests (rate)* — the number of messages that actors have been asked in the selected time frame.
-
Expired requests (rate)* — the number of messages that haven’t been answered before a specified timeout in the selected time frame.
-
Expired requests by-timeout (rate) — the number of requests that timed out in the selected time frame grouped by specified timeout values.
-
Success response time (recorder)* — the time from the moment the message was sent till the moment when the answer was received and before the specified timeout, counting only successful responses.
-
Late response time (recorder)* — the time from the moment the message was sent till the moment when the answer was received, counting only responses received after timeout.
The metrics marked with *
support an automatic message-type dimension. It’s disabled by default because it requires additional metrics for each additional type of message. See enabling message type dimensions for details.
Ask Pattern events
Available since Cinnamon 2.14.0
-
Request failed — when the actor responses with
akka.actor.Status.Failure
(only Akka Classic) Event information:actor-ref
— the actor being asked the messagemessage
— the message being sentcause
— the exception returned with a failure response -
Request expired — when request expires before the response arrived Event information:
actor-ref
— the actor being asked the messagemessage
— the message being senttimeout
— the given timeout
Ask Pattern selection
Actors and entities are not instrumented for ask-pattern metrics automatically. They must have instrumentation explicitly enabled - see sections below for how to.
Ask Pattern metrics and events will only be collected for the selected classic or typed actors, sharded or persistent entities. Details on how to configure ask pattern telemetry can be found under ask pattern configuration.
Circuit breaker metrics
Available since Cinnamon 2.20.0
It is important to have an understanding of the circuit breakers in Akka in order to interpret the data that is gathered. Among the Akka tools only persistence uses a circuit breaker, to protect the database. Any other usage will be application specific.
The following metrics are recorded for named circuit breakers:
-
State of circuit breaker — 1 open (something is wrong), 2 half-open (things are going from bad to good), 3 closed (everything is fine)
-
Latency (recorder)* — the time from the moment the message was sent till the moment when the answer was received and before the specified timeout. Successful, failed and timed out calls are collected.
-
Throughput (rate)* — the number of calls requested to the breaker in the selected time frame.
-
Success (rate)* — the number of calls that have succeeded within the timeout of the circuit breaker in the selected time frame.
-
Failure (rate) — the number of requests that timed out, failed or were fast-failed by the circuit breaker being open or half-open, in the selected time frame.