Persistence - Building a storage backend
Storage backends for journals and snapshot stores are pluggable in the Akka persistence extension. A directory of persistence journal and snapshot store plugins is available at the Akka Community Projects page, see Community plugins. This documentation describes how to build a new storage backend.
Applications can provide their own plugins by implementing a plugin API and activating them by configuration. Plugin development requires the following imports:
import akka.dispatch.Futures;
import akka.persistence.*;
import akka.persistence.journal.japi.*;
import akka.persistence.snapshot.japi.*;
Journal plugin API
A journal plugin extends AsyncWriteJournal. AsyncWriteJournal is an actor and the methods to be implemented are:
/**
 * Java API, Plugin API: asynchronously writes a batch (`Iterable`) of persistent messages to the
 * journal.
 *
 * <p>The batch is only for performance reasons, i.e. all messages don't have to be written
 * atomically. Higher throughput can typically be achieved by using batch inserts of many records
 * compared to inserting records one-by-one, but this aspect depends on the underlying data store
 * and a journal implementation can implement it as efficiently as possible. Journals should aim to
 * persist events in-order for a given `persistenceId` as otherwise in case of a failure, the
 * persistent state may end up being inconsistent.
 *
 * <p>Each `AtomicWrite` message contains the single `PersistentRepr` that corresponds to the
 * event that was passed to the `persist` method of the `PersistentActor`, or it contains several
 * `PersistentRepr` that correspond to the events that were passed to the `persistAll` method of
 * the `PersistentActor`. All `PersistentRepr` of the `AtomicWrite` must be written to the data
 * store atomically, i.e. all or none must be stored. If the journal (data store) cannot support
 * atomic writes of multiple events it should reject such writes with an `Optional` with an
 * `UnsupportedOperationException` describing the issue. This limitation should also be documented
 * by the journal plugin.
 *
 * <p>If there are failures when storing any of the messages in the batch the returned `Future`
 * must be completed with failure. The `Future` must only be completed with success when all
 * messages in the batch have been confirmed to be stored successfully, i.e. they will be
 * readable, and visible, in a subsequent replay. If there is uncertainty about if the messages
 * were stored or not the `Future` must be completed with failure.
 *
 * <p>Data store connection problems must be signaled by completing the `Future` with failure.
 *
 * <p>The journal can also signal that it rejects individual messages (`AtomicWrite`) by the
 * returned `Iterable<Optional<Exception>>`. The returned `Iterable` must have as many
 * elements as the input `messages` `Iterable`. Each `Optional` element signals if the
 * corresponding `AtomicWrite` is rejected or not, with an exception describing the problem.
 * Rejecting a message means it was not stored, i.e. it must not be included in a later replay.
 * Rejecting a message is typically done before attempting to store it, e.g. because of
 * serialization error.
 *
 * <p>Data store connection problems must not be signaled as rejections.
 *
 * <p>Note that it is possible to reduce the number of allocations by caching some result
 * `Iterable` for the happy path, i.e. when no messages are rejected.
 *
 * <p>Calls to this method are serialized by the enclosing journal actor. If you spawn work in
 * asynchronous tasks it is alright that they complete the futures in any order, but the actual
 * writes for a specific persistenceId should be serialized to avoid issues such as events of a
 * later write being visible to consumers (query side, or replay) before the events of an earlier
 * write are visible. This can also be done with consistent hashing if it is too fine grained to
 * do it on the persistenceId level. Normally a `PersistentActor` will only have one outstanding
 * write request to the journal but it may emit several write requests when `persistAsync` is used
 * and the max batch size is reached.
 *
 * <p>This call is protected with a circuit-breaker.
 */
Future<Iterable<Optional<Exception>>> doAsyncWriteMessages(Iterable<AtomicWrite> messages);

/**
 * Java API, Plugin API: asynchronously deletes all persistent messages up to `toSequenceNr`.
 *
 * <p>This call is protected with a circuit-breaker.
 *
 * @see AsyncRecoveryPlugin
 */
Future<Void> doAsyncDeleteMessagesTo(String persistenceId, long toSequenceNr);
If the storage backend API only supports synchronous, blocking writes, the methods should be implemented as:
@Override
public Future<Iterable<Optional<Exception>>> doAsyncWriteMessages(
    Iterable<AtomicWrite> messages) {
  try {
    Iterable<Optional<Exception>> result = new ArrayList<Optional<Exception>>();
    // blocking call here...
    // result.add(..)
    return Futures.successful(result);
  } catch (Exception e) {
    return Futures.failed(e);
  }
}
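The contract above separates two outcomes: rejecting an individual write and failing the whole batch. The following is a minimal sketch of that result shape only, written against plain JDK types so it runs without Akka. `CompletableFuture` stands in for the Scala `Future`, and `Write`, `Store`, and `writeBatch` are hypothetical names, not part of the plugin API.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;

// Sketch only: JDK types stand in for the Akka ones.
class WriteResultSketch {

  /** Hypothetical stand-in for an AtomicWrite carrying a single event. */
  static final class Write {
    final String persistenceId;
    final long sequenceNr;
    final Object payload;

    Write(String persistenceId, long sequenceNr, Object payload) {
      this.persistenceId = persistenceId;
      this.sequenceNr = sequenceNr;
      this.payload = payload;
    }
  }

  /** Hypothetical blocking data store; throws while `available` is false. */
  static final class Store {
    final List<Write> rows = new ArrayList<>();
    boolean available = true;

    void insert(Write w) {
      if (!available) {
        throw new IllegalStateException("data store unavailable");
      }
      rows.add(w);
    }
  }

  static CompletableFuture<List<Optional<Exception>>> writeBatch(Store store, List<Write> batch) {
    List<Optional<Exception>> results = new ArrayList<>();
    try {
      for (Write w : batch) {
        if (!(w.payload instanceof Serializable)) {
          // Rejection: decided before storing, reported for this element only.
          results.add(Optional.of(
              new IllegalArgumentException("not serializable, sequenceNr " + w.sequenceNr)));
        } else {
          store.insert(w); // blocking call
          results.add(Optional.empty());
        }
      }
      // One result element per input write, in the same order.
      return CompletableFuture.completedFuture(results);
    } catch (Exception e) {
      // Data store problem: fail the future instead of reporting a rejection.
      return CompletableFuture.failedFuture(e);
    }
  }

  public static void main(String[] args) {
    Store store = new Store();
    List<Write> batch = List.of(
        new Write("p-1", 1L, "event-a"),
        new Write("p-1", 2L, new Object())); // not Serializable, will be rejected
    System.out.println(writeBatch(store, batch).join());
  }
}
```

A rejected write is never handed to the store, so it cannot show up in a later replay, while a store failure surfaces as a failed future for the whole batch.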
A journal plugin must also implement the methods defined in AsyncRecovery for replays and sequence number recovery:
/**
 * Java API, Plugin API: asynchronously replays persistent messages. Implementations replay a
 * message by calling `replayCallback`. The returned future must be completed when all messages
 * (matching the sequence number bounds) have been replayed. The future must be completed with a
 * failure if any of the persistent messages could not be replayed.
 *
 * <p>The `replayCallback` must also be called with messages that have been marked as deleted. In
 * this case a replayed message's `deleted` method must return `true`.
 *
 * <p>The `toSequenceNr` is the lowest of what was returned by {@link
 * #doAsyncReadHighestSequenceNr} and what the user specified as recovery {@link
 * akka.persistence.Recovery} parameter.
 *
 * @param persistenceId id of the persistent actor.
 * @param fromSequenceNr sequence number where replay should start (inclusive).
 * @param toSequenceNr sequence number where replay should end (inclusive).
 * @param max maximum number of messages to be replayed.
 * @param replayCallback called to replay a single message. Can be called from any thread.
 */
Future<Void> doAsyncReplayMessages(
    String persistenceId,
    long fromSequenceNr,
    long toSequenceNr,
    long max,
    Consumer<PersistentRepr> replayCallback);

/**
 * Java API, Plugin API: asynchronously reads the highest stored sequence number for the given
 * `persistenceId`. The persistent actor will use the highest sequence number after recovery as
 * the starting point when persisting new events. This sequence number is also used as
 * `toSequenceNr` in subsequent call to {@link #doAsyncReplayMessages} unless the user has
 * specified a lower `toSequenceNr`.
 *
 * @param persistenceId id of the persistent actor.
 * @param fromSequenceNr hint where to start searching for the highest sequence number.
 */
Future<Long> doAsyncReadHighestSequenceNr(String persistenceId, long fromSequenceNr);
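The replay bounds described above can be illustrated with a small in-memory sketch. It is synchronous and uses only JDK types; `Event`, `replay`, and `highestSequenceNr` are hypothetical names, and returning 0 for an empty log is an assumption of this sketch rather than something stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch only: an in-memory log for a single persistenceId.
class ReplaySketch {

  /** Hypothetical stand-in for a PersistentRepr. */
  static final class Event {
    final long sequenceNr;
    final String payload;

    Event(long sequenceNr, String payload) {
      this.sequenceNr = sequenceNr;
      this.payload = payload;
    }
  }

  /**
   * Replays events with fromSequenceNr <= sequenceNr <= toSequenceNr (both inclusive),
   * stopping after `max` events. Assumes `events` is sorted by ascending sequenceNr.
   */
  static void replay(
      List<Event> events,
      long fromSequenceNr,
      long toSequenceNr,
      long max,
      Consumer<Event> replayCallback) {
    long replayed = 0;
    for (Event e : events) {
      if (replayed >= max) {
        break;
      }
      if (e.sequenceNr >= fromSequenceNr && e.sequenceNr <= toSequenceNr) {
        replayCallback.accept(e);
        replayed++;
      }
    }
  }

  /** Highest stored sequence number, or 0 when nothing is stored (assumption). */
  static long highestSequenceNr(List<Event> events) {
    long highest = 0L;
    for (Event e : events) {
      highest = Math.max(highest, e.sequenceNr);
    }
    return highest;
  }

  public static void main(String[] args) {
    List<Event> log = new ArrayList<>();
    for (long n = 1; n <= 5; n++) {
      log.add(new Event(n, "event-" + n));
    }
    // Replay sequence numbers 2 to 4 with no practical limit on the count.
    replay(log, 2L, 4L, Long.MAX_VALUE, e -> System.out.println(e.sequenceNr + " " + e.payload));
    System.out.println("highest = " + highestSequenceNr(log));
  }
}
```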
A journal plugin can be activated with the following minimal configuration:
# Path to the journal plugin to be used
akka.persistence.journal.plugin = "my-journal"
# My custom journal plugin
my-journal {
# Class name of the plugin.
class = "docs.persistence.MyJournal"
# Dispatcher for the plugin actor.
plugin-dispatcher = "akka.actor.default-dispatcher"
}
The journal plugin instance is an actor so the methods corresponding to requests from persistent actors are executed sequentially. It may delegate to asynchronous libraries, spawn futures, or delegate to other actors to achieve parallelism.
The journal plugin class must have a constructor with one of these signatures:
- constructor with one com.typesafe.config.Config parameter and a String parameter for the config path
- constructor with one com.typesafe.config.Config parameter
- constructor without parameters
The plugin section of the actor system’s config will be passed in the config constructor parameter. The config path of the plugin is passed in the String parameter.
The plugin-dispatcher is the dispatcher used for the plugin actor. If not specified, it defaults to akka.actor.default-dispatcher.
Don’t run journal tasks/futures on the system default dispatcher, since that might starve other tasks.
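One way to follow this advice is to give the plugin its own dispatcher. The fragment below is a sketch: the dispatcher name `my-journal-dispatcher` and the pool size are hypothetical and should be tuned for the storage backend in question.

```hocon
my-journal {
  class = "docs.persistence.MyJournal"
  # Hypothetical dedicated dispatcher for the plugin actor.
  plugin-dispatcher = "my-journal-dispatcher"
}

# Hypothetical dispatcher definition; the pool size is an illustrative value.
my-journal-dispatcher {
  type = Dispatcher
  executor = "thread-pool-executor"
  thread-pool-executor {
    fixed-pool-size = 8
  }
  throughput = 1
}
```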
Snapshot store plugin API
A snapshot store plugin must extend the SnapshotStore actor and implement the following methods:
/**
 * Java API, Plugin API: asynchronously loads a snapshot.
 *
 * @param persistenceId id of the persistent actor.
 * @param criteria selection criteria for loading.
 */
Future<Optional<SelectedSnapshot>> doLoadAsync(
    String persistenceId, SnapshotSelectionCriteria criteria);

/**
 * Java API, Plugin API: asynchronously saves a snapshot.
 *
 * @param metadata snapshot metadata.
 * @param snapshot snapshot.
 */
Future<Void> doSaveAsync(SnapshotMetadata metadata, Object snapshot);

/**
 * Java API, Plugin API: deletes the snapshot identified by `metadata`.
 *
 * @param metadata snapshot metadata.
 */
Future<Void> doDeleteAsync(SnapshotMetadata metadata);

/**
 * Java API, Plugin API: deletes all snapshots matching `criteria`.
 *
 * @param persistenceId id of the persistent actor.
 * @param criteria selection criteria for deleting.
 */
Future<Void> doDeleteAsync(String persistenceId, SnapshotSelectionCriteria criteria);
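To make the role of the selection criteria concrete, here is a minimal in-memory sketch of choosing a snapshot to load. It assumes the criteria act as upper bounds on sequence number and timestamp and that the matching snapshot with the highest sequence number wins. `Meta`, `Criteria`, and `select` are hypothetical stand-ins for illustration, not the Akka types.

```java
import java.util.List;
import java.util.Optional;

// Sketch only: plain JDK types stand in for SnapshotMetadata and SnapshotSelectionCriteria.
class SnapshotSelectionSketch {

  /** Hypothetical stand-in for snapshot metadata. */
  static final class Meta {
    final String persistenceId;
    final long sequenceNr;
    final long timestamp;

    Meta(String persistenceId, long sequenceNr, long timestamp) {
      this.persistenceId = persistenceId;
      this.sequenceNr = sequenceNr;
      this.timestamp = timestamp;
    }
  }

  /** Hypothetical stand-in for selection criteria: inclusive upper bounds (assumption). */
  static final class Criteria {
    final long maxSequenceNr;
    final long maxTimestamp;

    Criteria(long maxSequenceNr, long maxTimestamp) {
      this.maxSequenceNr = maxSequenceNr;
      this.maxTimestamp = maxTimestamp;
    }
  }

  /** Returns the matching snapshot with the highest sequence number, if any. */
  static Optional<Meta> select(List<Meta> stored, String persistenceId, Criteria criteria) {
    Meta best = null;
    for (Meta m : stored) {
      boolean matches =
          m.persistenceId.equals(persistenceId)
              && m.sequenceNr <= criteria.maxSequenceNr
              && m.timestamp <= criteria.maxTimestamp;
      if (matches && (best == null || m.sequenceNr > best.sequenceNr)) {
        best = m;
      }
    }
    return Optional.ofNullable(best);
  }

  public static void main(String[] args) {
    List<Meta> stored = List.of(
        new Meta("p-1", 5L, 100L),
        new Meta("p-1", 10L, 200L),
        new Meta("p-1", 15L, 300L));
    Optional<Meta> selected = select(stored, "p-1", new Criteria(12L, Long.MAX_VALUE));
    System.out.println(selected.map(m -> m.sequenceNr).orElse(-1L));
  }
}
```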
A snapshot store plugin can be activated with the following minimal configuration:
# Path to the snapshot store plugin to be used
akka.persistence.snapshot-store.plugin = "my-snapshot-store"
# My custom snapshot store plugin
my-snapshot-store {
# Class name of the plugin.
class = "docs.persistence.MySnapshotStore"
# Dispatcher for the plugin actor.
plugin-dispatcher = "akka.actor.default-dispatcher"
}
The snapshot store instance is an actor so the methods corresponding to requests from persistent actors are executed sequentially. It may delegate to asynchronous libraries, spawn futures, or delegate to other actors to achieve parallelism.
The snapshot store plugin class must have a constructor with one of these signatures:
- constructor with one com.typesafe.config.Config parameter and a String parameter for the config path
- constructor with one com.typesafe.config.Config parameter
- constructor without parameters
The plugin section of the actor system’s config will be passed in the config constructor parameter. The config path of the plugin is passed in the String parameter.
The plugin-dispatcher is the dispatcher used for the plugin actor. If not specified, it defaults to akka.actor.default-dispatcher.
Don’t run snapshot store tasks/futures on the system default dispatcher, since that might starve other tasks.
Plugin TCK
In order to help developers build correct and high quality storage plugins, we provide a Technology Compatibility Kit (TCK for short).
The TCK is usable from Java as well as Scala projects. To test your implementation (independently of language) you need to include the akka-persistence-tck dependency:
- sbt
val AkkaVersion = "2.7.1"
libraryDependencies += "com.typesafe.akka" %% "akka-persistence-tck" % AkkaVersion
To include the Journal TCK tests in your test suite, extend the provided JavaJournalSpec:
@RunWith(JUnitRunner.class)
class MyJournalSpecTest extends JavaJournalSpec {

  public MyJournalSpecTest() {
    super(
        ConfigFactory.parseString(
            "akka.persistence.journal.plugin = "
                + "\"akka.persistence.journal.leveldb-shared\""));
  }

  @Override
  public CapabilityFlag supportsRejectingNonSerializableObjects() {
    return CapabilityFlag.off();
  }
}
Please note that some of the tests are optional, and by overriding the supports... methods you give the TCK the needed information about which tests to run. You can implement these methods using the provided CapabilityFlag.on / CapabilityFlag.off values.
We also provide a simple benchmarking class, JavaJournalPerfSpec, which includes all the tests that JavaJournalSpec has and also performs some longer operations on the journal while printing its performance stats. It is not meant to be a proper benchmarking environment, but it can give you a rough feel for your journal’s performance in the most typical scenarios.
To include the SnapshotStore TCK tests in your test suite, extend the JavaSnapshotStoreSpec:
@RunWith(JUnitRunner.class)
class MySnapshotStoreTest extends JavaSnapshotStoreSpec {

  public MySnapshotStoreTest() {
    super(
        ConfigFactory.parseString(
            "akka.persistence.snapshot-store.plugin = "
                + "\"akka.persistence.snapshot-store.local\""));
  }
}
If your plugin requires some setup (starting a mock database, removing temporary files, etc.), you can override the beforeAll and afterAll methods to hook into the test lifecycle:
@RunWith(JUnitRunner.class)
class MyJournalSpecTest extends JavaJournalSpec {

  List<File> storageLocations = new ArrayList<File>();

  public MyJournalSpecTest() {
    super(
        ConfigFactory.parseString(
            "akka.persistence.journal.plugin = "
                + "\"akka.persistence.journal.leveldb-shared\""));

    Config config = system().settings().config();
    storageLocations.add(
        new File(config.getString("akka.persistence.journal.leveldb.dir")));
    storageLocations.add(
        new File(config.getString("akka.persistence.snapshot-store.local.dir")));
  }

  @Override
  public CapabilityFlag supportsRejectingNonSerializableObjects() {
    return CapabilityFlag.on();
  }

  @Override
  public void beforeAll() {
    // Start from a clean slate before the TCK creates its actor system state.
    for (File storageLocation : storageLocations) {
      FileUtils.deleteRecursively(storageLocation);
    }
    super.beforeAll();
  }

  @Override
  public void afterAll() {
    super.afterAll();
    // Remove the storage directories once the tests have finished.
    for (File storageLocation : storageLocations) {
      FileUtils.deleteRecursively(storageLocation);
    }
  }
}
We highly recommend including these specifications in your test suite, as they cover a broad range of cases you might have otherwise forgotten to test for when writing a plugin from scratch.
Corrupt event logs
If a journal can’t prevent users from running persistent actors with the same persistenceId concurrently it is likely that an event log will be corrupted by having events with the same sequence number.
It is recommended that journals should still deliver these events during recovery so that a replay-filter can be used to decide what to do about it in a journal agnostic way.
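As an illustration of what such corruption looks like on the consuming side, the sketch below scans replayed sequence numbers and reports every one that does not increase over the highest value seen so far. This is not the replay-filter implementation; `findSuspectSequenceNrs` is a hypothetical helper that only shows why delivering the duplicates matters: a downstream check can see them only if the journal hands them over.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: detects duplicate or out-of-order sequence numbers in a replayed stream.
class DuplicateSeqNrSketch {

  /**
   * Returns the sequence numbers that are not strictly greater than the highest
   * sequence number seen before them, in the order they were replayed.
   */
  static List<Long> findSuspectSequenceNrs(List<Long> replayedSequenceNrs) {
    List<Long> suspects = new ArrayList<>();
    long highestSeen = 0L;
    for (long sequenceNr : replayedSequenceNrs) {
      if (sequenceNr <= highestSeen) {
        // Same or lower number than before: likely written by a second actor instance.
        suspects.add(sequenceNr);
      } else {
        highestSeen = sequenceNr;
      }
    }
    return suspects;
  }

  public static void main(String[] args) {
    // Two writers with the same persistenceId both wrote sequence number 3.
    List<Long> replayed = List.of(1L, 2L, 3L, 3L, 4L);
    System.out.println(findSuspectSequenceNrs(replayed));
  }
}
```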