Encoding and decoding binary data

Note

Previously Akka offered a specialized Iteratee implementation in the akka.actor.IO object which is now deprecated in favor of the pipeline mechanism described here. The documentation for Iteratees can be found here.

Warning

The IO implementation is marked as “experimental” as of its introduction in Akka 2.2.0. We will continue to improve this API based on our users’ feedback, which implies that, while we try to keep incompatible changes to a minimum, the binary compatibility guarantee for maintenance releases does not apply to the contents of the akka.io package.

Akka adopted and adapted the implementation of data processing pipelines found in the spray-io module. The idea is that encoding and decoding often go hand in hand, and keeping the code pertaining to one protocol layer together is deemed more important than writing down the complete read side in one go (say, in the iteratee style); pipelines encourage packaging the stages in a form which lends itself better to reuse in a protocol stack. Another reason for choosing this abstraction is that it is at times necessary to change the behavior of encoding and decoding within a stage based on a message stream’s state, and pipeline stages allow communication between the read and write halves quite naturally.

The actual byte-fiddling can be done within pipeline stages, for example using the rich API of ByteIterator and ByteStringBuilder as shown below. All these activities are synchronous transformations which benefit greatly from CPU affinity, because data that stays on one core also stays in that core’s caches. Therefore the pipeline infrastructure is designed to be completely synchronous: every stage’s handler code can only directly return the events and/or commands resulting from an input; there are no callbacks. Exceptions thrown within a pipeline stage will abort processing of the whole pipeline, under the assumption that recoverable error conditions are signaled in-band to the next stage instead of being raised as exceptions.

An overall “logical” pipeline can span multiple execution contexts, for example starting with the low-level protocol layers directly within an actor handling the reads and writes to a TCP connection and then being passed to a number of higher-level actors which do the costly application-level processing. This is supported by feeding the generated events into a sink which sends them to another actor; that actor then feeds them into its own pipeline upon reception.

Introducing the Sample Protocol

The process of implementing a protocol stack using pipelines is demonstrated below on the following simple protocol:

frameLen: Int
persons: Int
persons times {
  first: String
  last: String
}
points: Int
points times Double

mapping to the following data types:

case class Person(first: String, last: String)
case class HappinessCurve(points: IndexedSeq[Double])
case class Message(persons: Seq[Person], stats: HappinessCurve)

We will split the handling of this protocol into two parts: the frame-length encoding handles the buffering necessary on the read side and the actual encoding of the frame contents is done in a separate stage.
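As a quick sanity check of the layout, the sketch below computes how many bytes a Message occupies on the wire. It assumes four-byte Ints, eight-byte Doubles and strings written as a four-byte length followed by their UTF-8 bytes, which matches the putString helper of the MessageStage further down. The case classes are repeated so that the snippet compiles on its own; SampleLayout and its methods are hypothetical helpers, not part of Akka.

```scala
import java.nio.charset.StandardCharsets

case class Person(first: String, last: String)
case class HappinessCurve(points: IndexedSeq[Double])
case class Message(persons: Seq[Person], stats: HappinessCurve)

object SampleLayout {
  val IntSize = 4    // persons, points and every string length are Ints
  val DoubleSize = 8 // each curve point is a Double

  // a string costs its length prefix plus its UTF-8 bytes
  def stringSize(s: String): Int =
    IntSize + s.getBytes(StandardCharsets.UTF_8).length

  // size of the frame contents, i.e. everything after frameLen
  def payloadSize(msg: Message): Int =
    IntSize + msg.persons.map(p => stringSize(p.first) + stringSize(p.last)).sum +
      IntSize + msg.stats.points.length * DoubleSize

  // total frame size when frameLen also counts its own four bytes
  def frameSize(msg: Message): Int = IntSize + payloadSize(msg)
}
```

For the two-person, three-point message used later in this section this gives 4 + (9 + 11) + (7 + 12) + 4 + 3 * 8 = 71 payload bytes and a 75-byte frame.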

Building a Pipeline Stage

As a common example, which is also included in the akka-actor package, let us look at a framing protocol which works by prepending a length field to each message.

import java.nio.ByteOrder
import scala.annotation.tailrec
import akka.io.{ PipelineContext, SymmetricPipePair, SymmetricPipelineStage }
import akka.util.ByteString

/**
 * Pipeline stage for length-field encoded framing. It will prepend a length
 * header to the message; with the default parameters the header is four bytes
 * long, big-endian, and contains the length of the resulting frame including
 * the header itself.
 *
 * The `maxSize` argument protects the sanity of the communication channel:
 * larger frames will not be sent (they are silently dropped) or received (in
 * which case stream decoding would be broken, hence an IllegalArgumentException
 * is thrown).
 */
class LengthFieldFrame(maxSize: Int,
                       byteOrder: ByteOrder = ByteOrder.BIG_ENDIAN,
                       headerSize: Int = 4,
                       lengthIncludesHeader: Boolean = true)
  extends SymmetricPipelineStage[PipelineContext, ByteString, ByteString] {

  // range checks omitted ...

  override def apply(ctx: PipelineContext) =
    new SymmetricPipePair[ByteString, ByteString] {
      var buffer = None: Option[ByteString]
      implicit val byteOrder = LengthFieldFrame.this.byteOrder

      /**
       * Extract as many complete frames as possible from the given ByteString
       * and return the remainder together with the extracted frames in reverse
       * order.
       */
      @tailrec
      def extractFrames(bs: ByteString,
                        acc: List[ByteString]): (Option[ByteString], Seq[ByteString]) = {
        if (bs.isEmpty) {
          (None, acc)
        } else if (bs.length < headerSize) {
          (Some(bs.compact), acc)
        } else {
          val length = bs.iterator.getLongPart(headerSize).toInt
          if (length < 0 || length > maxSize)
            throw new IllegalArgumentException(
              s"received too large frame of size $length (max = $maxSize)")
          val total = if (lengthIncludesHeader) length else length + headerSize
          if (bs.length >= total) {
            extractFrames(bs drop total, bs.slice(headerSize, total) :: acc)
          } else {
            (Some(bs.compact), acc)
          }
        }
      }

      /*
       * This is how commands (writes) are transformed: calculate length
       * including header, write that to a ByteStringBuilder and append the
       * payload data. The result is a single command (i.e. `Right(...)`).
       */
      override def commandPipeline =
        { bs: ByteString =>
          val length =
            if (lengthIncludesHeader) bs.length + headerSize else bs.length
          if (length > maxSize) Seq()
          else {
            val bb = ByteString.newBuilder
            bb.putLongPart(length, headerSize)
            bb ++= bs
            ctx.singleCommand(bb.result)
          }
        }

      /*
       * This is how events (reads) are transformed: append the received
       * ByteString to the buffer (if any) and extract the frames from the
       * result. In the end store the new buffer contents and return the
       * list of events (i.e. `Left(...)`).
       */
      override def eventPipeline =
        { bs: ByteString =>
          val data = if (buffer.isEmpty) bs else buffer.get ++ bs
          val (nb, frames) = extractFrames(data, Nil)
          buffer = nb
          /*
           * please note the specialized (optimized) facility for emitting
           * just a single event
           */
          frames match {
            case Nil         => Nil
            case one :: Nil  => ctx.singleEvent(one)
            case many        => many reverseMap (Left(_))
          }
        }
    }
}

In the end a pipeline stage is nothing more than a set of three functions: one transforming commands arriving from above, one transforming events arriving from below and the third transforming incoming management commands (not shown here, see below for more information). The result of the transformation can in either case be a sequence of commands flowing downwards or events flowing upwards (or a combination thereof).
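Stripped of the Akka types, this idea can be modelled in a few lines. The sketch below is not the akka.io API: MiniPipePair, its fields and the Result alias are hypothetical names chosen for illustration. It only mirrors the convention visible in the code above, where a Left is an event travelling upwards and a Right is a command travelling downwards.

```scala
object MiniPipeline {
  // Left = event flowing upwards, Right = command flowing downwards
  type Result[Evt, Cmd] = Seq[Either[Evt, Cmd]]

  // hypothetical stand-in for a PipePair: three plain, synchronous functions
  final case class MiniPipePair[CmdAbove, CmdBelow, EvtAbove, EvtBelow](
      commandPipeline: CmdAbove => Result[EvtAbove, CmdBelow],
      eventPipeline: EvtBelow => Result[EvtAbove, CmdBelow],
      managementPort: PartialFunction[Any, Result[EvtAbove, CmdBelow]])

  // toy stage: commands are upper-cased on their way down, events are
  // lower-cased on their way up, and a "PING" event additionally answers
  // with a "PONG" command (one input, outputs in both directions)
  val pingStage = MiniPipePair[String, String, String, String](
    commandPipeline = cmd => Seq(Right(cmd.toUpperCase)),
    eventPipeline = evt =>
      if (evt == "PING") Seq(Left("ping"), Right("PONG"))
      else Seq(Left(evt.toLowerCase)),
    managementPort = PartialFunction.empty)
}
```

Because every function simply returns its outputs, there is nothing to wait for and no callback to register, which is what keeps a stage synchronous.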

In the case above the data types for commands and events are equal, as both functions operate only on ByteString, and the transformation does not change that type because it only adds or removes the length header (four octets by default) at the front.
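The same framing logic can be reproduced with java.nio.ByteBuffer alone, which may help when testing a peer that does not use Akka. The sketch below is a simplified stand-in for LengthFieldFrame, not its actual code: Framing is a hypothetical name, and it assumes the default settings (a four-byte big-endian header whose value includes the header) and no maxSize check.

```scala
import java.nio.ByteBuffer
import scala.annotation.tailrec

object Framing {
  val HeaderSize = 4

  // prepend a four-byte big-endian length which counts the header as well
  def frame(payload: Array[Byte]): Array[Byte] = {
    val total = HeaderSize + payload.length
    ByteBuffer.allocate(total).putInt(total).put(payload).array()
  }

  // split off as many complete frames as possible (in arrival order);
  // the leftover bytes must be kept and prepended to the next chunk
  @tailrec
  def extractFrames(
      data: Array[Byte],
      acc: Vector[Array[Byte]] = Vector.empty): (Array[Byte], Vector[Array[Byte]]) =
    if (data.length < HeaderSize) (data, acc)
    else {
      // ByteBuffer reads big-endian unless told otherwise
      val total = ByteBuffer.wrap(data, 0, HeaderSize).getInt
      require(total >= HeaderSize, s"invalid frame length $total")
      if (data.length < total) (data, acc)
      else extractFrames(data.drop(total), acc :+ data.slice(HeaderSize, total))
    }
}
```

Feeding a stream that ends in the middle of a frame returns the complete frames plus the partial remainder, which plays the role of the buffer field in the stage above.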

The pair of command and event transformation functions is represented by an object of type PipePair, or in this case a SymmetricPipePair. This object could benefit from knowledge about the context it is running in, for example an Actor, and this context is introduced by making a PipelineStage a factory for producing a PipePair. The factory method is called apply (in good Scala tradition) and receives the context object as its argument. The implementation of this factory method can then make use of the context in whatever way it sees fit; you will see an example further down.
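One practical consequence of the factory design is that per-pipeline state, such as the buffer in LengthFieldFrame, lives in the pair and not in the stage. The plain-Scala sketch below illustrates this under stated assumptions: Ctx, Pair and CountingStage are hypothetical names, not Akka types, and the "state" is just a counter.

```scala
object FactorySketch {
  // hypothetical context offered to every stage of one pipeline
  trait Ctx { def name: String }

  // hypothetical pipe pair with a single event function
  trait Pair { def eventPipeline(evt: String): Seq[String] }

  // the stage object itself holds no state; apply builds a fresh pair
  class CountingStage {
    def apply(ctx: Ctx): Pair = new Pair {
      private var seen = 0 // mutable state owned by this pair alone

      def eventPipeline(evt: String): Seq[String] = {
        seen += 1
        Seq(s"${ctx.name}#$seen: $evt")
      }
    }
  }
}
```

Two pipelines built from the same CountingStage instance therefore count independently, each seeing its own context.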

Manipulating ByteStrings

The second stage of our sample protocol stack illustrates in more depth what the pipeline stage built above only touched upon: constructing and deconstructing byte strings. Let us first take a look at the encoder:

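import akka.io.{ PipelineContext, SymmetricPipePair, SymmetricPipelineStage }
import akka.util.{ ByteString, ByteStringBuilder }

/**
 * This trait is used to formulate a requirement for the pipeline context.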
 * In this example it is used to configure the byte order to be used.
 */
trait HasByteOrder extends PipelineContext {
  def byteOrder: java.nio.ByteOrder
}

class MessageStage extends SymmetricPipelineStage[HasByteOrder, Message, ByteString] {

  override def apply(ctx: HasByteOrder) = new SymmetricPipePair[Message, ByteString] {

    implicit val byteOrder = ctx.byteOrder

    /**
     * Append a length-prefixed UTF-8 encoded string to the ByteStringBuilder.
     */
    def putString(builder: ByteStringBuilder, str: String): Unit = {
      val bs = ByteString(str, "UTF-8")
      builder putInt bs.length
      builder ++= bs
    }

    override val commandPipeline = { msg: Message =>
      val bs = ByteString.newBuilder

      // first store the persons
      bs putInt msg.persons.size
      msg.persons foreach { p =>
        putString(bs, p.first)
        putString(bs, p.last)
      }

      // then store the doubles
      bs putInt msg.stats.points.length
      bs putDoubles (msg.stats.points.toArray)

      // and return the result as a command
      ctx.singleCommand(bs.result)
    }

    // decoding omitted ...
  }
}

Note how the byte order used by this stage is fixed in exactly one place, making it impossible to get wrong between commands and events; the way the byte order is passed into the stage demonstrates one possible use for the stage’s context parameter.
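What goes wrong without a single source of truth for the byte order is easy to show with java.nio.ByteBuffer: the same Int produces different bytes under the two orders, and reading with the wrong one silently yields a different number. The sketch uses only the JDK; ByteOrderDemo is a hypothetical name.

```scala
import java.nio.{ ByteBuffer, ByteOrder }

object ByteOrderDemo {
  // encode a single Int using the given byte order
  def encode(x: Int, order: ByteOrder): Array[Byte] =
    ByteBuffer.allocate(4).order(order).putInt(x).array()

  // decode four bytes back into an Int using the given byte order
  def decode(bytes: Array[Byte], order: ByteOrder): Int =
    ByteBuffer.wrap(bytes).order(order).getInt
}
```

The value 1 written big-endian is the byte sequence 0, 0, 0, 1; read back little-endian it becomes 16777216, with no error raised. Taking the order from the context for both directions rules out this mismatch within one stage.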

The basic tool for constructing a ByteString is a ByteStringBuilder, which can be obtained by calling ByteString.newBuilder since byte strings implement the IndexedSeq[Byte] interface of the standard Scala collections. This builder knows a few extra tricks, though, for appending byte representations of primitive data types like Int and Double, or arrays thereof. Encoding a String requires a bit more work because not only the sequence of bytes needs to be encoded but also its length; otherwise the decoding stage would not know where the String ends. When all values making up the Message have been appended to the builder, we simply pass the resulting ByteString on to the next stage as a command using the optimized singleCommand facility.
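The role of the length prefix can be demonstrated without Akka. The sketch below mirrors putString and its decoding counterpart, but uses java.io.ByteArrayOutputStream and java.nio.ByteBuffer in place of ByteStringBuilder and ByteIterator (an assumption made so that it runs on a plain JDK); LengthPrefixedStrings is a hypothetical name.

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object LengthPrefixedStrings {
  // write a four-byte big-endian length followed by the UTF-8 bytes
  def putString(out: ByteArrayOutputStream, str: String): Unit = {
    val bytes = str.getBytes(UTF_8)
    out.write(ByteBuffer.allocate(4).putInt(bytes.length).array())
    out.write(bytes)
  }

  // read the length first, then exactly that many bytes
  def getString(in: ByteBuffer): String = {
    val bytes = new Array[Byte](in.getInt)
    in.get(bytes)
    new String(bytes, UTF_8)
  }
}
```

Two strings written back to back, such as "Alice" and "Gibbons", can only be separated again because each carries its length. Note that the prefix counts bytes, not characters: a non-ASCII character like 'ë' takes two bytes in UTF-8.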

Warning

The singleCommand and singleEvent methods provide a way to generate responses which transfer exactly one result from one pipeline stage to the next without suffering the overhead of object allocations. This means that the returned collection object will not work for anything else (you will get ClassCastExceptions!) and this facility can only be used EXACTLY ONCE during the processing of one input (command or event).

Now let us look at the decoder side:

def getString(iter: ByteIterator): String = {
  val length = iter.getInt
  val bytes = new Array[Byte](length)
  iter getBytes bytes
  ByteString(bytes).utf8String
}

override val eventPipeline = { bs: ByteString =>
  val iter = bs.iterator

  val personLength = iter.getInt
  val persons =
    (1 to personLength) map (_ => Person(getString(iter), getString(iter)))

  val curveLength = iter.getInt
  val curve = new Array[Double](curveLength)
  iter getDoubles curve

  // verify that this was all; could be left out to allow future extensions
  assert(iter.isEmpty)

  ctx.singleEvent(Message(persons, HappinessCurve(curve)))
}

The decoding side does the same things as the encoder, in the same order; it just uses a ByteIterator to retrieve primitive data types, or arrays of those, from the underlying ByteString. In the end it hands the assembled Message as an event to the next stage using the optimized singleEvent facility (see warning above).
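Putting encoder and decoder side by side makes the symmetry visible. The sketch below re-implements the two MessageStage transformations on top of java.nio.ByteBuffer, which is big-endian by default and so matches the context used later. MessageCodec is a hypothetical name, and this is a plain-JDK illustration, not the Akka stage itself.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

case class Person(first: String, last: String)
case class HappinessCurve(points: IndexedSeq[Double])
case class Message(persons: Seq[Person], stats: HappinessCurve)

object MessageCodec {
  private def utf8(s: String): Array[Byte] = s.getBytes(UTF_8)

  def encode(msg: Message): Array[Byte] = {
    val names = msg.persons.flatMap(p => Seq(utf8(p.first), utf8(p.last)))
    // persons count, length-prefixed names, points count, the doubles
    val size = 4 + names.map(b => 4 + b.length).sum + 4 + 8 * msg.stats.points.length
    val buf = ByteBuffer.allocate(size)
    buf.putInt(msg.persons.size)
    names.foreach { b => buf.putInt(b.length); buf.put(b) }
    buf.putInt(msg.stats.points.length)
    msg.stats.points.foreach(d => buf.putDouble(d))
    buf.array()
  }

  def decode(bytes: Array[Byte]): Message = {
    val buf = ByteBuffer.wrap(bytes)
    def getString(): String = {
      val b = new Array[Byte](buf.getInt)
      buf.get(b)
      new String(b, UTF_8)
    }
    // arguments are evaluated left to right, matching the wire order
    val persons = Vector.fill(buf.getInt)(Person(getString(), getString()))
    val curve = Vector.fill(buf.getInt)(buf.getDouble)
    // same check as the assert in the stage above
    require(!buf.hasRemaining, "trailing bytes after message")
    Message(persons, HappinessCurve(curve))
  }
}
```

Every read in decode corresponds to a write in encode at the same position, so changing one side without the other breaks the round trip immediately.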

Building a Pipeline

Given the two pipeline stages introduced in the sections above, we can now put them to some use. First we define a message to be encoded:

val msg =
  Message(
    Seq(
      Person("Alice", "Gibbons"),
      Person("Bob", "Sparsely")),
    HappinessCurve(Array(1.0, 3.0, 5.0)))

Then we need to create a pipeline context which satisfies our declared needs:

val ctx = new HasByteOrder {
  def byteOrder = java.nio.ByteOrder.BIG_ENDIAN
}

Building the pipeline and encoding this message then is quite simple:

val stages =
  new MessageStage >>
    new LengthFieldFrame(10000)

// using the extractor for the returned case class here
val PipelinePorts(cmd, evt, mgmt) =
  PipelineFactory.buildFunctionTriple(ctx, stages)

val encoded: (Iterable[Message], Iterable[ByteString]) = cmd(msg)

The PipelinePorts object returned from buildFunctionTriple contains one function for injecting commands, one for events and a third for injecting management commands (see below). In this case we demonstrate how a single message msg is encoded by passing it into the cmd function. The return value is a pair of sequences, one for the resulting events and the other for the resulting commands. For the sample pipeline this will contain exactly one command: one ByteString. Decoding works in the same way, only with the evt function (which can again also result in commands being generated, although that is not demonstrated in this sample).
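On the command side, stacking stages with >> behaves roughly like function composition, where each output of the upper stage is fed into the lower one. The sketch below models only that part: ComposeSketch, stack, encodeText and frame are hypothetical names, the upper stage is a simple text encoder rather than MessageStage, and events produced along the way are ignored, which a real pipeline does not do.

```scala
import java.nio.ByteBuffer
import java.nio.charset.StandardCharsets.UTF_8

object ComposeSketch {
  // a command-side stage: one input may produce any number of outputs
  type CommandFn[A, B] = A => Seq[B]

  // hypothetical analogue of >> for the command side only: every output of
  // the upper stage is fed, in order, into the lower stage
  def stack[A, B, C](upper: CommandFn[A, B], lower: CommandFn[B, C]): CommandFn[A, C] =
    a => upper(a).flatMap(lower)

  // upper stage: encode text as UTF-8 bytes
  val encodeText: CommandFn[String, Array[Byte]] =
    s => Seq(s.getBytes(UTF_8))

  // lower stage: prepend a four-byte big-endian length that includes the header
  val frame: CommandFn[Array[Byte], Array[Byte]] = payload => {
    val total = 4 + payload.length
    Seq(ByteBuffer.allocate(total).putInt(total).put(payload).array())
  }

  // the composite command function, comparable to the cmd port above
  val cmd: CommandFn[String, Array[Byte]] = stack(encodeText, frame)
}
```

Calling ComposeSketch.cmd("hi") yields a single six-byte frame, just as the sample pipeline yields exactly one ByteString per Message.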

Besides the more functional style there is also an explicitly side-effecting one:

val stages =
  new MessageStage >>
    new LengthFieldFrame(10000)

val injector = PipelineFactory.buildWithSinkFunctions(ctx, stages)(
  commandHandler ! _, // will receive messages of type Try[ByteString]
  eventHandler ! _ // will receive messages of type Try[Message]
  )

injector.injectCommand(msg)

The functions passed into the buildWithSinkFunctions factory method describe what shall happen to the commands and events as they fall out of the pipeline. In this case we just send those to some actors, since that is usually quite a good strategy for distributing the work represented by the messages.

The types of commands or events fed into the provided sink functions are wrapped within Try so that failures can also be encoded and acted upon. This means that injecting into a pipeline using a PipelineInjector will catch exceptions resulting from processing the input, in which case the exception (there can only be one per injection) is passed into the respective sink.
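The effect of the Try wrapping can be pictured with a small Akka-free model. SinkInjector below is hypothetical and is not the PipelineInjector implementation: it runs a synchronous transformation, delivers each result to the sink as a Success and, if the transformation throws a non-fatal exception, delivers that single exception as a Failure instead of letting it escape.

```scala
import scala.util.{ Failure, Success, Try }

// hypothetical, Akka-free model of an injector feeding a single sink;
// assumes `transform` returns a strict collection (not a lazy view)
final class SinkInjector[In, Out](transform: In => Seq[Out], sink: Try[Out] => Unit) {
  def inject(in: In): Unit =
    // Try captures non-fatal exceptions thrown by the transformation
    Try(transform(in)) match {
      // every result reaches the sink wrapped in Success
      case Success(outs) => outs.foreach(out => sink(Success(out)))
      // the one exception of this injection reaches the sink as Failure
      case Failure(e) => sink(Failure(e))
    }
}
```

Computing the whole result before delivering anything is a design choice of this sketch: a failing input produces exactly one Failure and no partial output, so a sink such as `evt => evts ! evt.get` fails loudly on the first bad input.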

Using the Pipeline’s Context

Up to this point there was always a parameter ctx which was used when constructing a pipeline, but it has not yet been explained in full. The context is a piece of information which is made available to all stages of a pipeline. The context may also carry behavior, provide infrastructure or helper methods, etc. It should be noted that the context is bound to the pipeline and as such must not be accessed concurrently from different threads unless care is taken to properly synchronize such access. Since the context will in many cases be provided by an actor, it is not recommended to share this context with code executing outside of the actor’s message handling.

Warning

A PipelineContext instance MUST NOT be used by two different pipelines since it contains mutable fields which are used during message processing.

Using Management Commands

Since pipeline stages do not have any reference to the pipeline or even to their neighbors, they cannot directly effect the injection of commands or events outside of their normal processing. But sometimes things need to happen driven by a timer, for example. In this case the timer would need to send tick messages to the whole pipeline, and those stages which are interested in them would act upon them. In order to keep the type signatures for events and commands useful, such external triggers are sent out-of-band, via a different channel: the management port. One example which makes use of this facility is the TickGenerator which comes included with akka-actor:

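import akka.actor.ActorContext
import akka.io.{ PipePair, PipelineContext, PipelineStage }
import scala.beans.BeanProperty
import scala.concurrent.duration.{ Deadline, FiniteDuration }

/**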
 * This trait expresses that the pipeline’s context needs to live within an
 * actor and provide its ActorContext.
 */
trait HasActorContext extends PipelineContext {
  /**
   * Retrieve the [[akka.actor.ActorContext]] for this pipeline’s context.
   */
  def getContext: ActorContext
}

object TickGenerator {
  /**
   * This message type is used by the TickGenerator to trigger
   * the rescheduling of the next Tick. The actor hosting the pipeline
   * which includes a TickGenerator must arrange for messages of this
   * type to be injected into the management port of the pipeline.
   */
  trait Trigger

  /**
   * This message type is emitted by the TickGenerator to the whole
   * pipeline, informing all stages about the time at which this Tick
   * was emitted (relative to some arbitrary epoch).
   */
  case class Tick(@BeanProperty timestamp: FiniteDuration) extends Trigger
}

/**
 * This pipeline stage does not alter the events or commands
 */
class TickGenerator[Cmd <: AnyRef, Evt <: AnyRef](interval: FiniteDuration)
  extends PipelineStage[HasActorContext, Cmd, Cmd, Evt, Evt] {
  import TickGenerator._

  override def apply(ctx: HasActorContext) =
    new PipePair[Cmd, Cmd, Evt, Evt] {

      // use unique object to avoid double-activation on actor restart
      private val trigger: Trigger = {
        val path = ctx.getContext.self.path

        new Trigger {
          override def toString = s"Tick[$path]"
        }
      }

      private def schedule() =
        ctx.getContext.system.scheduler.scheduleOnce(
          interval, ctx.getContext.self, trigger)(ctx.getContext.dispatcher)

      // automatically activate this generator
      schedule()

      override val commandPipeline = (cmd: Cmd) => ctx.singleCommand(cmd)

      override val eventPipeline = (evt: Evt) => ctx.singleEvent(evt)

      override val managementPort: Mgmt = {
        case `trigger` =>
          ctx.getContext.self ! Tick(Deadline.now.time)
          schedule()
          Nil
      }
    }
}

This pipeline stage is to be used within an actor, and it will make use of this context in order to schedule the delivery of TickGenerator.Trigger messages; the actor is then supposed to feed these messages into the management port of the pipeline. An example could look like this:

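import akka.actor.{ Actor, ActorRef }
import akka.io.{ HasActorContext, LengthFieldFrame, PipelineFactory, TickGenerator }
import akka.util.ByteString
import scala.concurrent.duration._

class Processor(cmds: ActorRef, evts: ActorRef) extends Actor {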

  val ctx = new HasActorContext with HasByteOrder {
    def getContext = Processor.this.context
    def byteOrder = java.nio.ByteOrder.BIG_ENDIAN
  }

  val pipeline = PipelineFactory.buildWithSinkFunctions(ctx,
    new TickGenerator(1000.millis) >>
      new MessageStage >>
      new LengthFieldFrame(10000))(
      // failure in the pipeline will fail this actor
      cmd => cmds ! cmd.get,
      evt => evts ! evt.get)

  def receive = {
    case m: Message               => pipeline.injectCommand(m)
    case b: ByteString            => pipeline.injectEvent(b)
    case t: TickGenerator.Trigger => pipeline.managementCommand(t)
  }
}

This actor extends our well-known pipeline with the tick generator and attaches the outputs to functions which send commands and events to actors for further processing. The pipeline stages will then all receive one Tick per second which can be used like so:

var lastTick = Duration.Zero

override val managementPort: Mgmt = {
  case TickGenerator.Tick(timestamp) =>
    // omitted ...
    println(s"time since last tick: ${timestamp - lastTick}")
    lastTick = timestamp
    Nil
}

Note

Management commands are delivered to all stages of a pipeline “effectively parallel”, like on a broadcast medium. No code will actually run concurrently since a pipeline is strictly single-threaded, but the order in which these commands are processed is not specified.

The intended purpose of management commands is for each stage to define its special command types and then listen only to those (where the aforementioned Tick message is a useful counter-example), exactly like sending packets on a Wi-Fi network where every station receives all traffic but reacts only to those messages which are destined for it.
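These broadcast semantics can be sketched with partial functions: every stage is offered every management command, and a stage that defines no case for it simply ignores it. The sketch assumes nothing from Akka; Broadcast, Tick and Flush are hypothetical names, and the stages happen to be visited in list order here, which callers must not rely on.

```scala
object Broadcast {
  // hypothetical management commands
  case class Tick(millis: Long)
  case object Flush

  type Mgmt = PartialFunction[Any, Seq[String]]

  // offer the command to every stage and collect what interested stages emit;
  // stages without a matching case contribute nothing
  def broadcast(stages: Seq[Mgmt], cmd: Any): Seq[String] =
    stages.flatMap(stage => stage.applyOrElse(cmd, (_: Any) => Seq.empty[String]))

  // each stage listens only for its own command type
  val timerStage: Mgmt = { case Tick(ms) => Seq(s"timer saw tick at $ms") }
  val bufferStage: Mgmt = { case Flush => Seq("buffer flushed") }
}
```

A command that no stage recognizes produces no output at all, just like a packet addressed to no station on the network.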

If you need all stages to react upon something in their defined order, then this must be modeled either as a command or event, i.e. it will be part of the “business” type of the pipeline.
