Journal plugin

The journal plugin enables storing and loading events for event sourced persistent actors.

Tables

The journal plugin requires an event journal table to be created in DynamoDB. The default table name is event_journal; it can be configured (see the reference configuration for all settings). The table should be created with the following attributes and key schema:

Attribute name    Attribute type    Key type
pid               S (String)        HASH
seq_nr            N (Number)        RANGE

Read capacity units should be based on expected entity recoveries. Write capacity units should be based on expected rates for persisting events.

An example aws CLI command for creating the event journal table:

aws dynamodb create-table \
  --table-name event_journal \
  --attribute-definitions \
      AttributeName=pid,AttributeType=S \
      AttributeName=seq_nr,AttributeType=N \
  --key-schema \
      AttributeName=pid,KeyType=HASH \
      AttributeName=seq_nr,KeyType=RANGE \
  --provisioned-throughput \
      ReadCapacityUnits=5,WriteCapacityUnits=5

Indexes

If queries or projections are being used, then a global secondary index needs to be added to the event journal table, to index events by slice. The default name (derived from the configured table name) for the secondary index is event_journal_slice_idx and may be explicitly set (see the reference configuration). The following attribute definitions should be added to the event journal table, with key schema for the event journal slice index:

Attribute name       Attribute type    Key type
entity_type_slice    S (String)        HASH
ts                   N (Number)        RANGE

Write capacity units for the index should be aligned with the event journal. Read capacity units should be based on expected queries.

An example aws CLI command for creating the event journal table and slice index:

aws dynamodb create-table \
  --table-name event_journal \
  --attribute-definitions \
      AttributeName=pid,AttributeType=S \
      AttributeName=seq_nr,AttributeType=N \
      AttributeName=entity_type_slice,AttributeType=S \
      AttributeName=ts,AttributeType=N \
  --key-schema \
      AttributeName=pid,KeyType=HASH \
      AttributeName=seq_nr,KeyType=RANGE \
  --provisioned-throughput \
      ReadCapacityUnits=5,WriteCapacityUnits=5 \
  --global-secondary-indexes \
    '[
       {
         "IndexName": "event_journal_slice_idx",
         "KeySchema": [
           {"AttributeName": "entity_type_slice", "KeyType": "HASH"},
           {"AttributeName": "ts", "KeyType": "RANGE"}
         ],
         "Projection": {
           "ProjectionType": "ALL"
         },
         "ProvisionedThroughput": {
           "ReadCapacityUnits": 5,
           "WriteCapacityUnits": 5
         }
      }
    ]'

Creating tables locally

For creating tables with DynamoDB local for testing, see the CreateTables utility.
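As an alternative, the same aws CLI commands shown above can target a locally running instance by pointing `--endpoint-url` at it. This sketch assumes DynamoDB local is listening on the default port 8000; DynamoDB local requires credentials to be set but does not validate them:

```shell
# Assumes DynamoDB local is running on localhost:8000.
# Credentials must be present but are not validated by DynamoDB local.
AWS_ACCESS_KEY_ID=dummy AWS_SECRET_ACCESS_KEY=dummy \
aws dynamodb create-table \
  --endpoint-url http://localhost:8000 \
  --table-name event_journal \
  --attribute-definitions \
      AttributeName=pid,AttributeType=S \
      AttributeName=seq_nr,AttributeType=N \
  --key-schema \
      AttributeName=pid,KeyType=HASH \
      AttributeName=seq_nr,KeyType=RANGE \
  --provisioned-throughput \
      ReadCapacityUnits=5,WriteCapacityUnits=5
```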

Configuration

To enable the journal plugin to be used by default, add the following line to your Akka application.conf:

akka.persistence.journal.plugin = "akka.persistence.dynamodb.journal"

It can also be enabled for a specific EventSourcedBehavior with journalPluginId, and multiple plugin configurations are supported.
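For example, a second journal configuration can reuse the default plugin settings while writing to a different table. The `second-journal` config path and table name below are illustrative:

```hocon
# Hypothetical additional journal configuration, selected for a specific
# entity with EventSourcedBehavior(...).withJournalPluginId("second-journal").
second-journal = ${akka.persistence.dynamodb.journal}
second-journal {
  table = "event_journal_other"
}
```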

Reference configuration

The following can be overridden in your application.conf for the journal specific settings:

akka.persistence.dynamodb {
  journal {
    class = "akka.persistence.dynamodb.journal.DynamoDBJournal"

    # name of the table to use for events
    table = "event_journal"

    # Name of global secondary index to support queries and/or projections.
    # "" is the default and denotes an index named "${table}_slice_idx"
    # (viz. when table (see above) is "event_journal", the GSI will be
    # "event_journal_slice_idx").  If for some reason an alternative GSI name
    # is required, set that GSI name explicitly here; if set explicitly, this
    # name will be used unmodified
    by-slice-idx = ""

    # Set this to off to disable publishing of events as Akka messages to running
    # eventsBySlices queries.
    # The tradeoff is that more CPU and network resources are used. The events
    # must still be retrieved from the database, but at a lower polling frequency,
    # because delivery of published messages is not guaranteed.
    # When this feature is enabled it will measure the throughput and automatically
    # disable/enable if the throughput exceeds the configured threshold. See
    # publish-events-dynamic configuration.
    publish-events = on

    # When publish-events is enabled it will measure the throughput and automatically
    # disable/enable if the throughput exceeds the configured threshold.
    # This configuration cannot be defined per journal, but is global for the ActorSystem.
    publish-events-dynamic {
      # If exponentially weighted moving average of measured throughput exceeds this
      # threshold publishing of events is disabled. It is enabled again when lower than
      # the threshold.
      throughput-threshold = 400
      # The interval of the throughput measurements.
      throughput-collect-interval = 10 seconds
    }

    # Group the slices for an entity type into this number of topics. Most efficient is to use
    # the same number as number of projection instances. If configured to less than the number of
    # projection instances the overhead is that events will be sent more than once and discarded
    # on the destination side. If configured to more than the number of projection instances
    # the events will only be sent once but there is a risk of exceeding the limits of number
    # of topics that PubSub can handle (e.g. OversizedPayloadException).
    # Must be between 1 and 1024 and a whole number divisor of 1024 (number of slices).
    # This configuration can be changed in a rolling update, but there might be some events
    # that are not delivered via the pub-sub path and instead delivered later by the queries.
    # This configuration cannot be defined per journal, but is global for the ActorSystem.
    publish-events-number-of-topics = 128

    # replay filter not needed for this plugin
    replay-filter.mode = off

  }
}

Event serialization

The events are serialized with Akka Serialization and the binary representation is stored in the event_payload column, together with information about which serializer was used in the event_ser_id and event_ser_manifest columns.
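Events are serialized like any other Akka message, so a serializer can be bound to a marker trait implemented by all persisted events. This sketch uses the predefined Akka Jackson CBOR binding; the `com.example.MySerializable` trait name is illustrative:

```hocon
akka.actor {
  serialization-bindings {
    # Hypothetical marker trait implemented by all persisted events
    "com.example.MySerializable" = jackson-cbor
  }
}
```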

Retryable errors

When persisting events, any DynamoDB errors that are considered retryable, such as when provisioned throughput capacity is exceeded, will cause events to be rejected rather than marked as a journal failure. A supervision strategy for EventRejectedException failures can then be added to EventSourcedBehaviors, so that entities can be resumed on these retryable errors rather than stopped or restarted.
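A minimal sketch in Scala of wrapping a behavior with such a supervision strategy, assuming `entityBehavior` stands in for a behavior created with `EventSourcedBehavior(...)`:

```scala
import akka.actor.typed.{ Behavior, SupervisorStrategy }
import akka.actor.typed.scaladsl.Behaviors
import akka.persistence.typed.EventRejectedException

// Resume the entity on retryable persist errors (surfaced as
// EventRejectedException) instead of stopping or restarting it.
def supervised[C](entityBehavior: Behavior[C]): Behavior[C] =
  Behaviors
    .supervise(entityBehavior)
    .onFailure[EventRejectedException](SupervisorStrategy.resume)
```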

Deletes

The journal supports hard deletes, meaning that journal entries are actually deleted from the database. There is no materialized view with a copy of the event, so make sure not to delete events too early if they are still used by projections or queries. A projection can also start or continue from a snapshot, in which case events from before the snapshot can be deleted.

For each persistence id, a tombstone record is kept in the event journal when all events of that persistence id have been deleted. The tombstone record keeps track of the latest sequence number, so that subsequent events don’t reuse sequence numbers of deleted events.

See the EventSourcedCleanup tool for more information about how to delete events, snapshots, and tombstone records.
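A sketch of programmatic cleanup, assuming the plugin's cleanup tool follows the same shape as the corresponding Akka Persistence R2DBC tool; the class location, method name, and `resetSequenceNumber` parameter shown here are assumptions to verify against the EventSourcedCleanup documentation:

```scala
// Assumed API, mirroring the r2dbc EventSourcedCleanup tool.
val cleanup = new EventSourcedCleanup(system)

// Delete all events for a persistence id, keeping the tombstone record
// (resetSequenceNumber = false) so sequence numbers are not reused.
cleanup.deleteAllEvents("some-persistence-id", resetSequenceNumber = false)
```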

Time to Live (TTL)

Rather than deleting items immediately, an expiration timestamp can be set on events or snapshots. DynamoDB’s Time to Live (TTL) feature can then be enabled, to automatically delete items after they have expired.

The TTL attribute to use for the journal or snapshot tables is named expiry.
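DynamoDB's TTL feature is enabled per table via the AWS API, for example with the aws CLI, using the table name configured above and the expiry attribute:

```shell
aws dynamodb update-time-to-live \
  --table-name event_journal \
  --time-to-live-specification "Enabled=true, AttributeName=expiry"
```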

Time-to-live settings are configured per entity type. The entity type can also be matched by prefix by using a * at the end of the key.

If events are deleted on snapshot, the journal can instead be configured to set an expiry time on the deleted events, using a configured time-to-live duration. For example, deleted events for a particular entity type can be configured to expire in 7 days, rather than being deleted immediately:

akka.persistence.dynamodb.time-to-live {
  event-sourced-entities {
    "some-entity-type" {
      use-time-to-live-for-deletes = 7 days
    }
  }
}

While it is recommended to keep all events in an event sourced system, so that new projections can be re-built, it is possible to set a time-to-live expiry on events or snapshots when they are created and stored. For example, events can be configured to expire in 3 days and snapshots in 5 days, for all entity type names that start with a particular prefix:

akka.persistence.dynamodb.time-to-live {
  event-sourced-entities {
    "entity-type-*" {
      event-time-to-live = 3 days
      snapshot-time-to-live = 5 days
    }
  }
}

The EventSourcedCleanup tool can also be used to set an expiration timestamp on events or snapshots.

An expiry marker is kept in the event journal when all events for a persistence id have been marked for expiration, in the same way that a tombstone record is used for hard deletes. This expiry marker keeps track of the latest sequence number so that subsequent events don’t reuse the same sequence numbers for events that have expired.

In case persistence ids will be reused with possibly expired events or snapshots, there is a check-expiry feature enabled by default, where expired events or snapshots are treated as already deleted when replaying from the journal. This enforces expiration before DynamoDB Time to Live may have actually deleted the data, and protects against partially deleted data.

Time to Live reference configuration

The following can be overridden in your application.conf for the time-to-live specific settings:

akka.persistence.dynamodb {
  # Time to Live (TTL) settings
  time-to-live {
    event-sourced-defaults {
      # Whether to check the expiry of events or snapshots and treat as already deleted when replaying.
      # This enforces expiration before DynamoDB Time to Live may have actually deleted the data.
      check-expiry = on

      # Set a time-to-live duration in place of deletes when events or snapshots are deleted by an entity
      # (such as when events are deleted on snapshot). Set to a duration to expire items after this time
      # following the triggered deletion. Disabled when set to `off` or `none`.
      use-time-to-live-for-deletes = off

      # Set a time-to-live duration on all events when they are originally created and stored.
      # Disabled when set to `off` or `none`.
      event-time-to-live = off

      # Set a time-to-live duration on all snapshots when they are originally created and stored.
      # Disabled when set to `off` or `none`.
      snapshot-time-to-live = off
    }

    # Time-to-live settings per entity type for event sourced entities.
    # See `event-sourced-defaults` for possible settings and default values.
    # Prefix matching is supported by using * at the end of an entity type key.
    event-sourced-entities {
      # Example configuration:
      # "some-entity-type" {
      #   use-time-to-live-for-deletes = 7 days
      # }
      # "entity-type-*" {
      #   event-time-to-live = 3 days
      #   snapshot-time-to-live = 5 days
      # }
    }
  }
}