Fault Tolerance

When an actor throws an unexpected exception, a failure, while processing a message or during initialization, the actor will by default be stopped.

Note

An important difference between Typed and Untyped actors is that Typed actors are by default stopped if an exception is thrown and no supervision strategy is defined while in Untyped they are restarted.

Note that there is an important distinction between failures and validation errors:

A validation error means that the data of a command sent to an actor is not valid, this should rather be modelled as a part of the actor protocol than make the actor throw exceptions.

A failure is instead something unexpected or outside the control of the actor itself, for example a database connection that broke. Opposite to validation errors, it is seldom useful to model such as parts of the protocol as a sending actor very seldom can do anything useful about it.

For failures it is useful to apply the “let it crash” philosophy: instead of mixing fine grained recovery and correction of internal state that may have become partially invalid because of the failure with the business logic we move that responsibility somewhere else. For many cases the resolution can then be to “crash” the actor, and start a new one, with a fresh state that we know is valid.

Supervision

In Akka Typed this “somewhere else” is called supervision. Supervision allows you to declaratively describe what should happen when a certain type of exceptions are thrown inside an actor. To use supervision the actual Actor behavior is wrapped using Behaviors.supervise, for example to restart on IllegalStateExceptions:

Scala

sourceBehaviors.supervise(behavior).onFailure[IllegalStateException](SupervisorStrategy.restart)

Java

sourceBehaviors.supervise(behavior)
    .onFailure(IllegalStateException.class, SupervisorStrategy.restart());

Or to resume, ignore the failure and process the next message, instead:

Scala

sourceBehaviors.supervise(behavior).onFailure[IllegalStateException](SupervisorStrategy.resume)

Java

sourceBehaviors.supervise(behavior)
    .onFailure(IllegalStateException.class, SupervisorStrategy.resume());

More complicated restart strategies can be used e.g. to restart no more than 10 times in a 10 second period:

Scala

sourceBehaviors
  .supervise(behavior)
  .onFailure[IllegalStateException](
    SupervisorStrategy.restart.withLimit(maxNrOfRetries = 10, withinTimeRange = 10.seconds))

Java

sourceBehaviors.supervise(behavior)
    .onFailure(
        IllegalStateException.class,
        SupervisorStrategy.restart().withLimit(10, FiniteDuration.apply(10, TimeUnit.SECONDS)));

To handle different exceptions with different strategies calls to supervise can be nested:

Scala

sourceBehaviors
  .supervise(Behaviors.supervise(behavior).onFailure[IllegalStateException](SupervisorStrategy.restart))
  .onFailure[IllegalArgumentException](SupervisorStrategy.stop)

Java

sourceBehaviors.supervise(
        Behaviors.supervise(behavior)
            .onFailure(IllegalStateException.class, SupervisorStrategy.restart()))
    .onFailure(IllegalArgumentException.class, SupervisorStrategy.stop());

For a full list of strategies see the public methods on SupervisorStrategy

Wrapping behaviors

It is very common to store state by changing behavior e.g.

Scala

sourcesealed trait Command
case class Increment(nr: Int) extends Command
case class GetCount(replyTo: ActorRef[Int]) extends Command

def counter(count: Int): Behavior[Command] = Behaviors.receiveMessage[Command] {
  case Increment(nr: Int) =>
    counter(count + nr)
  case GetCount(replyTo) =>
    replyTo ! count
    Behaviors.same
}

Java

sourceinterface CounterMessage {}

public static final class Increase implements CounterMessage {}

public static final class Get implements CounterMessage {
  final ActorRef<Got> sender;

  public Get(ActorRef<Got> sender) {
    this.sender = sender;
  }
}

public static final class Got {
  final int n;

  public Got(int n) {
    this.n = n;
  }
}

public static Behavior<CounterMessage> counter(int currentValue) {
  return Behaviors.receive(CounterMessage.class)
      .onMessage(
          Increase.class,
          (context, o) -> {
            return counter(currentValue + 1);
          })
      .onMessage(
          Get.class,
          (context, o) -> {
            o.sender.tell(new Got(currentValue));
            return Behaviors.same();
          })
      .build();
}

When doing this supervision only needs to be added to the top level:

Scala

sourceBehaviors.supervise(counter(1))

Java

sourceBehaviors.supervise(counter(1));

Each returned behavior will be re-wrapped automatically with the supervisor.

Child actors are stopped when parent is restarting

Child actors are often started in a setup block that is run again when the parent actor is restarted. The child actors are stopped to avoid resource leaks of creating new child actors each time the parent is restarted.

Scala

sourcedef child(size: Long): Behavior[String] =
  Behaviors.receiveMessage(msg => child(size + msg.length))

def parent: Behavior[String] = {
  Behaviors
    .supervise[String] {
      Behaviors.setup { ctx =>
        val child1 = ctx.spawn(child(0), "child1")
        val child2 = ctx.spawn(child(0), "child2")

        Behaviors.receiveMessage[String] { msg =>
          // there might be bugs here...
          val parts = msg.split(" ")
          child1 ! parts(0)
          child2 ! parts(1)
          Behaviors.same
        }
      }
    }
    .onFailure(SupervisorStrategy.restart)
}

Java

sourcestatic Behavior<String> child(long size) {
  return Behaviors.receiveMessage(msg -> child(size + msg.length()));
}

static Behavior<String> parent() {
  return Behaviors.<String>supervise(
          Behaviors.setup(
              ctx -> {
                final ActorRef<String> child1 = ctx.spawn(child(0), "child1");
                final ActorRef<String> child2 = ctx.spawn(child(0), "child2");

                return Behaviors.receiveMessage(
                    msg -> {
                      // there might be bugs here...
                      String[] parts = msg.split(" ");
                      child1.tell(parts[0]);
                      child2.tell(parts[1]);
                      return Behaviors.same();
                    });
              }))
      .onFailure(SupervisorStrategy.restart());
}

It is possible to override this so that child actors are not influenced when the parent actor is restarted. The restarted parent instance will then have the same children as before the failure.

If child actors are created from setup like in the previous example and they should remain intact (not stopped) when parent is restarted the supervise should be placed inside the setup and using SupervisorStrategy.restart.withStopChildren(false)SupervisorStrategy.restart().withStopChildren(false) like this:

Scala

sourcedef parent2: Behavior[String] = {
  Behaviors.setup { ctx =>
    val child1 = ctx.spawn(child(0), "child1")
    val child2 = ctx.spawn(child(0), "child2")

    // supervision strategy inside the setup to not recreate children on restart
    Behaviors
      .supervise {
        Behaviors.receiveMessage[String] { msg =>
          // there might be bugs here...
          val parts = msg.split(" ")
          child1 ! parts(0)
          child2 ! parts(1)
          Behaviors.same
        }
      }
      .onFailure(SupervisorStrategy.restart.withStopChildren(false))
  }
}

Java

sourcestatic Behavior<String> parent2() {
  return Behaviors.setup(
      ctx -> {
        final ActorRef<String> child1 = ctx.spawn(child(0), "child1");
        final ActorRef<String> child2 = ctx.spawn(child(0), "child2");

        // supervision strategy inside the setup to not recreate children on restart
        return Behaviors.<String>supervise(
                Behaviors.receiveMessage(
                    msg -> {
                      // there might be bugs here...
                      String[] parts = msg.split(" ");
                      child1.tell(parts[0]);
                      child2.tell(parts[1]);
                      return Behaviors.same();
                    }))
            .onFailure(SupervisorStrategy.restart().withStopChildren(false));
      });
}

That means that the setup block will only be run when the parent actor is first started, and not when it is restarted.

Bubble failures up through the hierarchy

In some scenarios it may be useful to push the decision about what to do on a failure upwards in the Actor hierarchy and let the parent actor handle what should happen on failures (in untyped Akka Actors this is how it works by default).

For a parent to be notified when a child is terminated it has to watch the child. If the child was stopped because of a failure the ChildFailed signal will be received which will contain the cause. ChildFailed extends Terminated so if your use case does not need to distinguish between stopping and failing you can handle both cases with the Terminated signal.

If the parent in turn does not handle the Terminated message it will itself fail with an akka.actor.typed.DeathPactException.

This means that a hierarchy of actors can have a child failure bubble up making each actor on the way stop but informing the top-most parent that there was a failure and how to deal with it, however, the original exception that caused the failure will only be available to the immediate parent out of the box (this is most often a good thing, not leaking implementation details).

There might be cases when you want the original exception to bubble up the hierarchy, this can be done by handling the Terminated signal, and rethrowing the exception in each actor.

Scala

sourcesealed trait Message
case class Fail(text: String) extends Message

val worker = Behaviors.receive[Message] { (context, message) =>
  message match {
    case Fail(text) => throw new RuntimeException(text)
  }
}

val middleManagementBehavior = Behaviors.setup[Message] { context =>
  context.log.info("Middle management starting up")
  val child = context.spawn(worker, "child")
  context.watch(child)

  // here we don't handle Terminated at all which means that
  // when the child fails or stops gracefully this actor will
  // fail with a DeathWatchException
  Behaviors.receive[Message] { (context, message) =>
    child ! message
    Behaviors.same
  }
}

val bossBehavior = Behaviors
  .supervise(Behaviors.setup[Message] { context =>
    context.log.info("Boss starting up")
    val middleManagement = context.spawn(middleManagementBehavior, "middle-management")
    context.watch(middleManagement)

    // here we don't handle Terminated at all which means that
    // when middle management fails with a DeathWatchException
    // this actor will also fail
    Behaviors.receiveMessage[Message] { message =>
      middleManagement ! message
      Behaviors.same
    }
  })
  .onFailure[DeathPactException](SupervisorStrategy.restart)

// (spawn comes from the testkit)
val boss = spawn(bossBehavior, "upper-management")
boss ! Fail("ping")

Java

sourcepublic class BubblingSample {
  interface Message {}

  public static class Fail implements Message {
    public final String text;

    public Fail(String text) {
      this.text = text;
    }
  }

  public static Behavior<Message> failingChildBehavior =
      Behaviors.receive(Message.class)
          .onMessage(
              Fail.class,
              (context, message) -> {
                throw new RuntimeException(message.text);
              })
          .build();

  public static Behavior<Message> middleManagementBehavior =
      Behaviors.setup(
          (context) -> {
            context.getLog().info("Middle management starting up");
            final ActorRef<Message> child = context.spawn(failingChildBehavior, "child");
            // we want to know when the child terminates, but since we do not handle
            // the Terminated signal, we will in turn fail on child termination
            context.watch(child);

            // here we don't handle Terminated at all which means that
            // when the child fails or stops gracefully this actor will
            // fail with a DeathWatchException
            return Behaviors.receive(Message.class)
                .onMessage(
                    Message.class,
                    (innerCtx, message) -> {
                      // just pass messages on to the child
                      child.tell(message);
                      return Behaviors.same();
                    })
                .build();
          });

  public static Behavior<Message> bossBehavior =
      Behaviors.setup(
          (context) -> {
            context.getLog().info("Boss starting up");
            final ActorRef<Message> middleManagement =
                context.spawn(middleManagementBehavior, "middle-management");
            context.watch(middleManagement);

            // here we don't handle Terminated at all which means that
            // when middle management fails with a DeathWatchException
            // this actor will also fail
            return Behaviors.receive(Message.class)
                .onMessage(
                    Message.class,
                    (innerCtx, message) -> {
                      // just pass messages on to the child
                      middleManagement.tell(message);
                      return Behaviors.same();
                    })
                .build();
          });

  public static void main(String[] args) {
    final ActorSystem<Message> system = ActorSystem.create(bossBehavior, "boss");

    system.tell(new Fail("boom"));
    // this will now bubble up all the way to the boss and as that is the user guardian it means
    // the entire actor system will stop
  }
}