Akka Multi-DC Persistence

This chapter describes how Akka Persistence can be used across multiple data centers (DC), availability zones or regions.

Warning

This module has been replaced in open source Akka with Replicated Event Sourcing.

Warning

This module is currently marked as May Change, meaning that the API might change based on feedback from initial usage. However, the module is ready for production use and we will not break the serialization format of messages or stored data.

Note

This feature is included in a subscription to Lightbend Platform, which includes other technology enhancements, monitoring and telemetry, and one-to-one support from the expert engineers behind Akka.

Akka Persistence basics

The reference documentation describes all details of Akka Persistence but here is a short summary in case you are not familiar with the concepts.

Akka Persistence enables stateful actors to persist their internal state so that it can be recovered when an actor is started, restarted after a JVM crash or by a supervisor, or migrated in a cluster. The key concept behind Akka Persistence is that only changes to an actor’s internal state are persisted, never its current state directly (except for optional snapshots). Such stateful actors are recovered by replaying the stored changes, from which they rebuild their internal state.

This design of capturing all changes as domain events, which are immutable facts of things that have happened, is known as event sourcing.
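The replay idea can be illustrated without any Akka APIs. The following is a minimal sketch in plain Java; the `EventSourcingSketch` class, the `BalanceChanged` event, and the in-memory journal are hypothetical names chosen for illustration and are not part of Akka Persistence. It shows that only events are stored and that the current state is rebuilt by folding an event handler over them.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of event sourcing; none of these types are Akka APIs.
final class EventSourcingSketch {

  // Hypothetical domain event: an immutable fact describing a change to a balance.
  static final class BalanceChanged {
    final long delta;

    BalanceChanged(long delta) {
      this.delta = delta;
    }
  }

  // Event handler: computes the next state from the current state and one event.
  static long applyEvent(long balance, BalanceChanged event) {
    return balance + event.delta;
  }

  // Recovery: replay every stored event in order, starting from the initial state.
  static long recover(List<BalanceChanged> journal) {
    long balance = 0L;
    for (BalanceChanged event : journal) {
      balance = applyEvent(balance, event);
    }
    return balance;
  }

  public static void main(String[] args) {
    // The journal holds only the changes, never the current balance itself.
    List<BalanceChanged> journal = new ArrayList<>();
    journal.add(new BalanceChanged(100));
    journal.add(new BalanceChanged(-30));
    journal.add(new BalanceChanged(5));
    System.out.println(recover(journal)); // prints 75
  }
}
```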

Akka Persistence supports event sourcing with the PersistentActor trait (Scala) or the AbstractPersistentActor abstract class (Java). An actor that extends this trait or class uses the persist method to persist and handle events. The behavior of a PersistentActor is defined by implementing receiveRecover and receiveCommand; the behavior of an AbstractPersistentActor is defined by implementing createReceiveRecover and createReceive. More details and examples can be found in the Akka documentation.

Another excellent article about “thinking in Events” is Events As First-Class Citizens by Randy Shoup. It is a short and recommended read if you are starting to develop event-based applications.

Motivation

There can be many reasons for using more than one data center, such as:

Redundancy to tolerate failures in one location and still be operational.
Serve requests from a location near the user to provide better responsiveness.
Balance the load over many servers.

Akka Persistence uses event sourcing based on the single writer principle, which means that there can only be one active instance of a PersistentActor with a given persistenceId. Otherwise, multiple instances would store interleaving events based on different states, and when these events were later replayed it would not be possible to reconstruct the correct state.

This restriction means that the single persistent actor can only live in one data center and would not be available during network partitions between the data centers. It is difficult to safely fail over the persistent actor from one data center to the other because:

The underlying data store might not have replicated all data when the network partition occurred, meaning that some updates would be lost if the persistent actor were started in the other data center. It would be even more problematic if the data is later replicated when the network partition heals, resulting in similar problems as with multiple active persistent actors.
To avoid the above problem with lost or delayed data one could write all data with QUORUM consistency level across all data centers, but that would be very slow.
Detecting the problem and failing over to another data center takes a rather long time if it should be done with high confidence. Using ordinary Cluster Sharding and Split Brain Resolver would mean downing all nodes in a data center, which is likely not desired. Instead, one would typically like to wait until the network partition heals and accept that communication between the data centers is not possible in the meantime.

Approach

What if we could relax the single writer principle and allow persistent actors to be used in an active-active mode? The consistency boundary that we get from the ordinary persistent actor is nice and we would like to keep that within a data center, but network partitions across different data centers should not reduce availability. In other words, we would like one persistent actor instance in each data center and the persisted events should be replicated across the data centers with eventual consistency. Eventually, all events will be consumed by replicas in other data centers.

This new type of persistent replicated actor is called ReplicatedEntity.

When there are no network partitions and no concurrent writes, the events stored by a ReplicatedEntity in one data center can be replicated and consumed by another (corresponding) instance in another data center without any concerns. Such replicated events can simply be applied to the local state.

images/replicated-events1.png

The interesting part begins when there are concurrent writes by ReplicatedEntity instances in different data centers. That is more likely to happen when there is a network partition, but it can also happen when there are no network issues. They simply write at the “same time” before the events from the other side have been replicated and consumed.

images/replicated-events2.png

The ReplicatedEntity has support for resolving such conflicts, but in the end the logic for applying events to the state of the entity must be aware that such concurrent updates can occur, and it must be modeled to handle such conflicts. This means that it should typically have the same characteristics as a Conflict Free Replicated Data Type (CRDT). With a CRDT there are by definition no conflicts and the events can just be applied. The library provides some general purpose CRDTs, but the logic of how to apply events can also be defined by an application specific function.
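As an illustration of what "the events can just be applied" means, here is a minimal sketch of a grow-only set in plain Java. It does not use the CRDTs shipped with the library; the `GrowOnlySetSketch` class and the `ItemAdded` event are hypothetical names invented for this example. Because adding to a set is commutative and idempotent, two replicas that receive the same events in different orders end up with the same state.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java illustration of a CRDT-style event handler; not the library's CRDT API.
final class GrowOnlySetSketch {

  // Hypothetical event: an item was added to the set.
  static final class ItemAdded {
    final String item;

    ItemAdded(String item) {
      this.item = item;
    }
  }

  // Event handler: adding an item commutes with any other add, so order does not matter.
  static Set<String> applyEvent(Set<String> state, ItemAdded event) {
    Set<String> next = new HashSet<>(state);
    next.add(event.item);
    return Collections.unmodifiableSet(next);
  }

  // Applies the events in the given order, starting from the empty set.
  static Set<String> applyAll(List<ItemAdded> events) {
    Set<String> state = Collections.emptySet();
    for (ItemAdded event : events) {
      state = applyEvent(state, event);
    }
    return state;
  }

  public static void main(String[] args) {
    ItemAdded fromDcA = new ItemAdded("apple");  // written concurrently in one data center
    ItemAdded fromDcB = new ItemAdded("banana"); // written concurrently in another

    // Each replica sees its own event first and the replicated event second.
    Set<String> replicaA = applyAll(Arrays.asList(fromDcA, fromDcB));
    Set<String> replicaB = applyAll(Arrays.asList(fromDcB, fromDcA));

    System.out.println(replicaA.equals(replicaB)); // prints true
  }
}
```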

For example, sometimes it’s enough to use application specific timestamps to decide which update should win.
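A timestamp-based rule of this kind could look roughly like the following plain-Java sketch. It is not the library's API: the `LastWriterWinsSketch` class, the `TitleChanged` event, and the use of a data center name as tie-breaker are assumptions made for illustration. The event handler keeps whichever write has the highest timestamp, so replicas converge regardless of the order in which events arrive.

```java
// Plain-Java illustration of last-writer-wins; all names here are hypothetical.
final class LastWriterWinsSketch {

  // Event carrying an application-specific timestamp and the writing data center.
  static final class TitleChanged {
    final String title;
    final long timestampMillis;
    final String dataCenter;

    TitleChanged(String title, long timestampMillis, String dataCenter) {
      this.title = title;
      this.timestampMillis = timestampMillis;
      this.dataCenter = dataCenter;
    }
  }

  // State remembers the winning write so later events can be compared against it.
  static final class State {
    static final State EMPTY = new State("", Long.MIN_VALUE, "");

    final String title;
    final long timestampMillis;
    final String dataCenter;

    State(String title, long timestampMillis, String dataCenter) {
      this.title = title;
      this.timestampMillis = timestampMillis;
      this.dataCenter = dataCenter;
    }
  }

  // Event handler: the event wins only if it is newer; equal timestamps are
  // resolved deterministically by comparing data center names.
  static State applyEvent(State state, TitleChanged event) {
    boolean eventWins =
        event.timestampMillis > state.timestampMillis
            || (event.timestampMillis == state.timestampMillis
                && event.dataCenter.compareTo(state.dataCenter) > 0);
    return eventWins
        ? new State(event.title, event.timestampMillis, event.dataCenter)
        : state;
  }

  public static void main(String[] args) {
    TitleChanged fromDcA = new TitleChanged("Draft", 1000L, "dc-a");
    TitleChanged fromDcB = new TitleChanged("Final", 2000L, "dc-b");

    // The two replicas consume the same events in opposite orders.
    State replicaA = applyEvent(applyEvent(State.EMPTY, fromDcA), fromDcB);
    State replicaB = applyEvent(applyEvent(State.EMPTY, fromDcB), fromDcA);

    System.out.println(replicaA.title + " " + replicaB.title); // prints "Final Final"
  }
}
```

Note that a rule like this discards the losing update entirely and depends on the clocks that produce the timestamps being reasonably synchronized.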

Strategies for resolving conflicts are described in detail later in this documentation.

To be able to support these things the ReplicatedEntity has a different API than the PersistentActor in Akka Persistence. The concepts should be familiar and migrating between the APIs should not be difficult. Events stored by a PersistentActor can be read by a ReplicatedEntity, meaning that it’s possible to migrate an existing application to use this feature. There are also migration paths back to PersistentActor if that would be needed. The API is similar to Lagom’s PersistentEntity, but it has the full power of an Actor if needed.

The solution uses the existing infrastructure for persistent actors and Akka Persistence plugins, meaning that much of it has been battle tested.

Cassandra is currently the only supported data store, but the solution is designed to allow for other future implementations.

The replication mechanism for the events takes advantage of the multi data center support that exists in Cassandra, i.e. the data is replicated by Cassandra.

When to not use it

Akka Multi-DC Persistence is not suitable in the following cases:

When all you need is simple CRUD with last-writer-wins or optimistic locking semantics. Event sourcing and Multi-DC event sourcing are then overkill for the problem you are trying to solve and will increase the complexity of the solution.
When you need to ensure global constraints at all times, for example that an inventory balance is never negative even if it is updated from several data centers. For that you need a fully consistent system, whereas Multi-DC Persistence favors availability.
When read-modify-write transactions across several data centers are needed.
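The inventory case can be made concrete with a small plain-Java sketch. The `InventorySketch` class and its methods are hypothetical and do not use any Akka API; they only simulate two data centers that each validate a reservation against their own local view of the stock before the other side's event has been replicated.

```java
// Plain-Java illustration of a global constraint being violated by concurrent
// writes; all names are hypothetical and no Akka APIs are used.
final class InventorySketch {

  // Command validation, performed locally against the stock known in that data center.
  static boolean canReserve(int localStock, int quantity) {
    return localStock >= quantity;
  }

  // Event handler for a reservation that has already been accepted and persisted.
  static int applyReserved(int stock, int quantity) {
    return stock - quantity;
  }

  // Simulates two data centers accepting a reservation concurrently, then
  // returns the stock after both events have been replicated and applied.
  static int stockAfterConcurrentReservations(int initialStock, int quantity) {
    // Neither side has seen the other's event yet, so both validate against the same value.
    boolean acceptedInDcA = canReserve(initialStock, quantity);
    boolean acceptedInDcB = canReserve(initialStock, quantity);

    int stock = initialStock;
    if (acceptedInDcA) {
      stock = applyReserved(stock, quantity);
    }
    if (acceptedInDcB) {
      stock = applyReserved(stock, quantity);
    }
    return stock;
  }

  public static void main(String[] args) {
    // One item in stock, one reservation accepted on each side.
    System.out.println(stockAfterConcurrentReservations(1, 1)); // prints -1
  }
}
```

Each local check was valid on its own, yet the combined result breaks the "never negative" rule, which is why such constraints need a fully consistent system.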

Dependency

To use the multi data center persistence feature a dependency on the akka-persistence-multi-dc artifact must be added.

sbt

// Add Lightbend Platform to your build as documented at https://developer.lightbend.com/docs/lightbend-platform/introduction/getting-started/subscription-and-credentials.html
"com.lightbend.akka" %% "akka-persistence-multi-dc" % "1.1.16"

Gradle

// Add Lightbend Platform to your build as documented at https://developer.lightbend.com/docs/lightbend-platform/introduction/getting-started/subscription-and-credentials.html
dependencies {
  compile group: 'com.lightbend.akka', name: 'akka-persistence-multi-dc_2.11', version: '1.1.16'
}

Maven

<!-- Add Lightbend Platform to your build as documented at https://developer.lightbend.com/docs/lightbend-platform/introduction/getting-started/subscription-and-credentials.html -->
<dependency>
  <groupId>com.lightbend.akka</groupId>
  <artifactId>akka-persistence-multi-dc_2.11</artifactId>
  <version>1.1.16</version>
</dependency>

Before you can access this library, you’ll need to configure the Lightbend repository and credentials in your build.

To use it together with Akka 2.6 you have to override the following Akka dependencies by defining them explicitly in your build, setting the Akka version to the one that you are using.

sbt

libraryDependencies ++= Seq(
  // %% appends the Scala binary version to each artifact name
  "com.typesafe.akka" %% "akka-persistence-query" % "2.6.4",
  "com.typesafe.akka" %% "akka-persistence" % "2.6.4",
  "com.typesafe.akka" %% "akka-cluster-sharding" % "2.6.4",
  "com.typesafe.akka" %% "akka-cluster-tools" % "2.6.4"
)

Maven

<dependency>
  <groupId>com.typesafe.akka</groupId>
  <artifactId>akka-persistence-query</artifactId>
  <version>2.6.4</version>
</dependency>
<dependency>
  <groupId>com.typesafe.akka</groupId>
  <artifactId>akka-persistence</artifactId>
  <version>2.6.4</version>
</dependency>
<dependency>
  <groupId>com.typesafe.akka</groupId>
  <artifactId>akka-cluster-sharding</artifactId>
  <version>2.6.4</version>
</dependency>
<dependency>
  <groupId>com.typesafe.akka</groupId>
  <artifactId>akka-cluster-tools</artifactId>
  <version>2.6.4</version>
</dependency>

Gradle

dependencies {
  compile group: 'com.typesafe.akka', name: 'akka-persistence-query', version: '2.6.4'
  compile group: 'com.typesafe.akka', name: 'akka-persistence', version: '2.6.4'
  compile group: 'com.typesafe.akka', name: 'akka-cluster-sharding', version: '2.6.4'
  compile group: 'com.typesafe.akka', name: 'akka-cluster-tools', version: '2.6.4'
}

Getting started

A template project is available as a Get Started download for Java or for Scala. Its README file contains instructions on how to run it.

ReplicatedEntity stub

This is what a ReplicatedEntity class looks like before filling in the implementation details:
