Rolling Updates
There are a few instances when a full cluster restart is required versus being able to do a rolling update, see When Shutdown Startup Is Required below.
A rolling update is the process of replacing one version of the system with another without downtime. The changes can be new code, changed dependencies such as new Akka version, or modified configuration.
In Akka, rolling updates are typically used for a stateful Akka Cluster where you can’t run two separate clusters in parallel during the update, as in, for example, blue-green deployments.
For rolling updates related to Akka dependency version upgrades and the migration guides, please see Rolling Updates and Akka versions.
Serialization Compatibility
There are two parts of Akka that need careful consideration when performing a rolling update.
- Compatibility of remote message protocols. Old nodes may send messages to new nodes and vice versa.
- Serialization format of persisted events and snapshots. New nodes must be able to read old data, and during the update old nodes must be able to read data stored by new nodes.
There are many more application-specific aspects of serialization changes to consider during rolling updates, for example, based on the use case and requirements:
- Whether to allow dropped messages or to tear down the TCP connection when the manifest is unknown
- Whether some message loss during a rolling update is acceptable versus a full shutdown and restart, assuming the application recovers afterwards
- If a java.io.NotSerializableException is thrown in fromBinary it is treated as a transient problem: the issue is logged and the message is dropped
- If other exceptions are thrown it can be an indication of corrupt bytes from the underlying transport, and the connection is broken
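As a sketch of the last two points, a custom SerializerWithStringManifest can signal an unknown manifest with NotSerializableException so that the message is only logged and dropped during a rolling update. The Ping message, manifest name, and serializer identifier below are hypothetical placeholders, not part of any Akka API.

import java.io.NotSerializableException
import java.nio.charset.StandardCharsets.UTF_8
import akka.serialization.SerializerWithStringManifest

final case class Ping(id: String) // hypothetical application message

class PingSerializer extends SerializerWithStringManifest {
  // assumed identifier, must be unique among serializers in the ActorSystem
  override def identifier: Int = 41001

  private val PingManifest = "ping-v1"

  override def manifest(obj: AnyRef): String = obj match {
    case _: Ping => PingManifest
    case other   => throw new IllegalArgumentException(s"Cannot serialize [${other.getClass.getName}]")
  }

  override def toBinary(obj: AnyRef): Array[Byte] = obj match {
    case Ping(id) => id.getBytes(UTF_8)
    case other    => throw new IllegalArgumentException(s"Cannot serialize [${other.getClass.getName}]")
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = manifest match {
    case PingManifest => Ping(new String(bytes, UTF_8))
    case unknown =>
      // An unknown manifest (e.g. sent by a newer node) is treated as transient:
      // the problem is logged and the message dropped instead of breaking the connection.
      throw new NotSerializableException(s"Unknown manifest [$unknown]")
  }
}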
For zero-impact rolling updates, it is important to consider a serialization strategy that can be evolved. One approach to retiring a serializer without downtime is described in two rolling update steps to switch to the new serializer. Additionally, the advice in Persistence - Schema Evolution also applies to remote messages when deploying with rolling updates.
Cluster Sharding
During a rolling update, sharded entities receiving traffic may be moved, based on the pluggable allocation strategy and its settings. When an old node is stopped, the shards that were running on it are moved to one of the remaining nodes in the cluster when messages are sent to those shards.
To make rolling updates as smooth as possible there is a configuration property that defines the version of the application. This is used by rolling update features to distinguish between old and new nodes. For example, the default LeastShardAllocationStrategy avoids allocating shards to old nodes during a rolling update. The LeastShardAllocationStrategy sees that a rolling update is in progress when there are members with different configured app-version.
To make use of this feature you need to define the app-version and increase it for each rolling update:
akka.cluster.app-version = 1.2.3
To understand which nodes are old and which are new, the version numbers are compared using normal conventions; see Version for more details.
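As a small sketch of those conventions (assuming the akka.util.Version ordering used for app-version), a higher version number marks the new nodes:

import akka.util.Version

// 1.2.4 compares higher than 1.2.3, so nodes configured with
// akka.cluster.app-version = 1.2.4 are treated as the new nodes.
val previous = Version("1.2.3")
val updated = Version("1.2.4")
assert(previous.compareTo(updated) < 0)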
When using Kubernetes Deployments with the RollingUpdate strategy you should enable the app-version from Deployment feature of Akka Management to automatically define the app-version from the Kubernetes deployment.kubernetes.io/revision annotation.
Rebalance is also disabled during rolling updates, since shards from stopped nodes are expected to be started on new nodes anyway. Messages to shards that were stopped on the old nodes will allocate the corresponding shards on the new nodes, without waiting for rebalance actions.
You should also enable the health check for Cluster Sharding if you use Akka Management. The readiness check will delay incoming traffic to the node until Sharding has been initialized and can accept messages.
The ShardCoordinator is itself a cluster singleton. To minimize downtime of the shard coordinator, see the strategies for Cluster Singleton rolling updates below.
A few specific changes to sharding configuration require a full cluster restart.
Cluster Singleton
Cluster singletons always run on the oldest node. To avoid moving cluster singletons more than necessary during a rolling update, it is recommended to upgrade the oldest node last. This way cluster singletons are only moved once during a full rolling update. Otherwise, in the worst case, cluster singletons may be migrated from node to node, incurring the coordination and initialization overhead several times.
When using Kubernetes Deployments with the RollingUpdate strategy you should enable the Kubernetes Rolling Updates feature from Akka Management to delete pods in the preferred order.
Cluster Shutdown
Graceful shutdown
For rolling updates it is best to leave the Cluster gracefully via Coordinated Shutdown, which runs automatically on SIGTERM and when the Cluster node sees itself as Exiting. Environments such as Kubernetes send a SIGTERM; however, if the JVM is wrapped with a script, ensure that it forwards the signal. Graceful shutdown of Cluster Singletons and Cluster Sharding similarly happens automatically.
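As a minimal sketch (assuming a classic ActorSystem named MyCluster and a purely illustrative task name), custom cleanup can be hooked into one of the Coordinated Shutdown phases and will run when SIGTERM triggers the shutdown:

import scala.concurrent.Future
import akka.Done
import akka.actor.{ ActorSystem, CoordinatedShutdown }

val system = ActorSystem("MyCluster")

// Runs as part of Coordinated Shutdown, e.g. when Kubernetes sends SIGTERM during
// a rolling update; the returned Future must complete for the phase to finish.
CoordinatedShutdown(system).addTask(
  CoordinatedShutdown.PhaseBeforeServiceUnbind, "log-graceful-leave") { () =>
  system.log.info("Leaving the cluster gracefully before shutdown")
  Future.successful(Done)
}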
Ungraceful shutdown
In case of network failures it may still be necessary to set the node’s status to Down in order to complete the removal. Cluster Downing describes downing nodes and downing providers. The Split Brain Resolver can be used to ensure the cluster continues to function during network partitions and node failures; for example, if there is an unreachability problem, the Split Brain Resolver makes a decision based on the configured downing strategy.
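For reference, a hedged configuration sketch enabling the Split Brain Resolver with the keep-majority strategy (key names as in Akka 2.6+; verify against the reference configuration of your version), wrapped in ConfigFactory.parseString for illustration:

import com.typesafe.config.ConfigFactory

// Assumed strategy choice: keep-majority; other strategies such as static-quorum
// or keep-oldest are configured the same way.
val downingConfig = ConfigFactory.parseString("""
  akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  akka.cluster.split-brain-resolver.active-strategy = keep-majority
""")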
Configuration Compatibility Checks
During rolling updates the configuration from existing nodes should pass the Cluster configuration compatibility checks. For example, it is possible to migrate Cluster Sharding from Classic to Typed Actors in a rolling update using a two-step approach as of Akka version 2.5.23:
- Deploy with the new nodes set to akka.cluster.configuration-compatibility-check.enforce-on-join = off and ensure all nodes are in this state
- Deploy again with the new nodes set to akka.cluster.configuration-compatibility-check.enforce-on-join = on
Full documentation about enforcing these checks on joining nodes and optionally adding custom checks can be found in Akka Cluster configuration compatibility checks.
Rolling Updates and Migrating Akka
From Java serialization to Jackson
If you are migrating from Akka 2.5 to 2.6 and use Java serialization, you can replace it with, for example, the new Serialization with Jackson and still be able to perform a rolling update without bringing down the entire cluster.
The procedure for changing from Java serialization to Jackson would look like:
- Rolling update from 2.5.24 (or later) to 2.6.0
  - Use config akka.actor.allow-java-serialization=on.
  - Roll out the change.
  - Java serialization will be used as before.
  - This step is optional and you could combine it with the next step if you like, but it can be good to make one change at a time.
- Rolling update to support deserialization but not enable serialization
  - Change message classes by adding the marker interface and possibly needed annotations as described in Serialization with Jackson.
  - Test the system with the new serialization in a new test cluster (no rolling update).
  - Remove the binding for the marker interface in akka.actor.serialization-bindings, so that Jackson is not used for serialization (toBinary) yet.
  - Configure akka.serialization.jackson.allowed-class-prefix=["com.myapp"]
    - This is needed for Jackson deserialization when the serialization-bindings isn’t defined.
    - Replace com.myapp with the name of the root package of your application to trust all classes.
  - Roll out the change.
  - Java serialization is still used, but this version is prepared for the next roll out.
- Rolling update to enable serialization with Jackson (see the configuration sketch after this list).
  - Add the binding for the marker interface in akka.actor.serialization-bindings to the Jackson serializer.
  - Remove akka.serialization.jackson.allowed-class-prefix.
  - Roll out the change.
  - Old nodes will still send messages with Java serialization, and those can still be deserialized by new nodes.
  - New nodes will send messages with Jackson serialization, and old nodes can deserialize those because they were prepared in the previous roll out.
- Rolling update to disable Java serialization
  - Remove the allow-java-serialization config, to use the default allow-java-serialization=off.
  - Remove the warn-about-java-serializer-usage config if you had changed that, to use the default warn-about-java-serializer-usage=on.
  - Roll out the change.
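As a hedged sketch of what the configuration might look like after the third roll out, assuming a hypothetical marker trait com.myapp.MySerializable, wrapped in ConfigFactory.parseString for illustration:

import com.typesafe.config.ConfigFactory

// Jackson now serializes everything bound to the (hypothetical) marker trait,
// while Java serialization stays allowed until the final roll out.
val step3Config = ConfigFactory.parseString("""
  akka.actor.serialization-bindings {
    "com.myapp.MySerializable" = jackson-json
  }
  akka.actor.allow-java-serialization = on
""")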
A similar approach can be used when changing between other serializers, for example between Jackson and Protobuf.
Akka Typed with Receptionist or Cluster Receptionist
If you are migrating from Akka 2.5 to 2.6 and use the Receptionist or Cluster Receptionist with Akka Typed, information will not be disseminated between 2.5 and 2.6 nodes during a rolling update. However, once all old nodes have been phased out during the rolling update it will work properly again.
When Shutdown Startup Is Required
There are a few instances when a full shutdown and startup is required versus being able to do a rolling update.
Cluster Sharding configuration change
If you need to change any of the following aspects of sharding it will require a full cluster restart versus a rolling update:
- The extractShardId function
- The role that the shard regions run on
- The persistence mode. It’s important to use the same mode on all nodes in the cluster.
- The number-of-shards (see the sketch after this list). Note: changing the number of nodes in the cluster does not require changing the number of shards.
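As an illustrative sketch (not the Akka API itself) of why these settings cannot change under a rolling update: the shard id is typically a function of the entity id and number-of-shards, so changing either the hashing scheme or the shard count would route existing entities to different shards.

// Hypothetical extractor: both the hashing scheme and numberOfShards must stay
// identical across all nodes, otherwise the same entity id maps to different shards.
val numberOfShards = 100
def extractShardId(entityId: String): String =
  math.abs(entityId.hashCode % numberOfShards).toString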
Cluster configuration change
- A full restart is required if you change the SBR strategy
Migrating from PersistentFSM to EventSourcedBehavior
If you’ve migrated from PersistentFSM to EventSourcedBehavior (see the Akka 2.8 migration guide) and are using PersistentFSM with Cluster Sharding, a full shutdown is required, as shards can move between new and old nodes.
Migrating from classic remoting to Artery
If you’ve migrated from classic remoting to Artery, which has a completely different protocol, a rolling update is not supported. For more details on this migration see the migration guide.
Changing remoting transport
Rolling update is not supported when changing the remoting transport.
Migrating from Classic Sharding to Typed Sharding
If you have been using classic sharding, it is possible to do a rolling update to typed sharding using a three-step procedure. The steps, along with example commits, are detailed in this sample PR.