Sanitization

Overview

When services process user-generated content, protecting personally identifiable information (PII) is both a legal requirement and a trust imperative. Regulations like GDPR, CCPA, and HIPAA mandate careful handling of personal data, while users expect their private information won’t be exposed to support staff, analysts, or third parties unnecessarily.

Text anonymization—detecting and masking sensitive details like names, emails, and phone numbers—enables legitimate use cases such as logging, analytics, and model training while minimizing privacy risks. It reduces the attack surface in case of breaches and demonstrates a privacy-respecting approach to data handling.

Akka supports this through service-wide sanitization.

Sanitization is disabled by default and can be selectively enabled through configuration.

When enabled, sanitization is automatically applied to text that is:

written to logs
passed to agent models from agent requests
passed to agent models from local tool or MCP tool output

Text matched by a sanitizer is replaced by a mask of * characters with the same length as the original matched string.

For example, with a credit card sanitizer enabled, the following text:

I'm having problems using my credit card 5204 46025 0000 006

is masked to:

I'm having problems using my credit card *******************

before being written to logs or passed to agent models.

Ad hoc sanitization

Sanitization can also be applied programmatically to text in any component where it makes sense for a specific business case, for example before sending text to a third-party API or before writing text to the state of an entity. This is done by injecting an akka.javasdk.Sanitizer in the component constructor and then calling akka.javasdk.Sanitizer#sanitize on the text.
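The constructor-injection pattern described above can be sketched roughly as follows. To keep the sketch compilable without the SDK, it declares a local stand-in interface with the single sanitize method named in the text; in a real service that interface would be replaced by an import of akka.javasdk.Sanitizer. The class SupportTicketForwarder and its method prepareForThirdParty are hypothetical names used only for illustration.

```java
// Stand-in for akka.javasdk.Sanitizer so this sketch compiles on its own.
// Assumption: in a real service, delete this interface and import
// akka.javasdk.Sanitizer instead.
interface Sanitizer {
  String sanitize(String text);
}

// Hypothetical component: the sanitizer is received through the constructor,
// as the text describes, and kept for later use.
final class SupportTicketForwarder {
  private final Sanitizer sanitizer;

  SupportTicketForwarder(Sanitizer sanitizer) {
    this.sanitizer = sanitizer;
  }

  // Returns the text that would be handed to a third-party API,
  // with anything matched by the configured sanitizers masked.
  String prepareForThirdParty(String ticketText) {
    return sanitizer.sanitize(ticketText);
  }
}
```

Which sanitizers actually run inside sanitize is decided by the service configuration, so the component itself stays free of masking rules.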

Sanitizer types

There are two types of sanitizers available: predefined and custom. Both can be combined in the same service.

Predefined

A small set of common sanitizers is built into the Akka runtime. They are enabled by name in configuration:

Name	Description
`EMAIL`	Email addresses
`PHONE`	International and national phone numbers
`CREDIT_CARD`	VISA, Mastercard, American Express, Diners, Discover, JCB, and generic credit card numbers
`IBAN`	International bank account numbers
`IP_ADDRESS`	IPv4 and IPv6 network addresses

One or more of these can be enabled in the service's application.conf file like this:

akka.javasdk.sanitization {
  predefined-sanitizers = ["IBAN", "CREDIT_CARD"]
}

Custom

In many cases, sanitizers specific to the application or business domain are also useful. Custom sanitizers allow defining regular expressions that match the character sequences to be masked.

Custom, application-specific sanitizers are defined by adding a config block akka.javasdk.sanitization.regex-sanitizers. Each entry consists of a name for the custom sanitizer followed by a config block with a single pattern key, whose value is a valid Java regular expression matching the text that should be masked.

This example masks a hypothetical customer ID of the form S0123456789:

akka.javasdk.sanitization.regex-sanitizers = {
  "CUSTOMER_IDS" = { pattern = "S\\d{10}" }
}

With this configuration, text such as:

Customer S0847362951 reported an issue with their order

is masked to:

Customer *********** reported an issue with their order

before being written to logs or passed to agent models.
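The same-length masking behavior can be reproduced with plain java.util.regex, which is a convenient way to try out a pattern before adding it to the configuration. This is only a sketch of the documented behavior, not the runtime's implementation; the class MaskingSketch and its mask method are hypothetical names.

```java
import java.util.regex.Pattern;

final class MaskingSketch {
  // Same regular expression as the CUSTOMER_IDS example. The backslash is
  // doubled in Java source just as it is in the quoted config string.
  static final Pattern CUSTOMER_ID = Pattern.compile("S\\d{10}");

  // Replaces every match with a run of '*' of the same length as the match.
  static String mask(Pattern pattern, String text) {
    return pattern.matcher(text)
        .replaceAll(match -> "*".repeat(match.group().length()));
  }

  public static void main(String[] args) {
    String input = "Customer S0847362951 reported an issue with their order";
    // prints: Customer *********** reported an issue with their order
    System.out.println(mask(CUSTOMER_ID, input));
  }
}
```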

Performance considerations

Sanitization is applied to every log entry. In high-throughput applications, numerous sanitization rules or complex regular expressions may impact performance. Consider monitoring application performance and optimizing regex patterns if necessary.
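One rough way to gauge the cost of a candidate pattern before enabling it is to time it against representative log lines. The sketch below uses plain java.util.regex rather than the runtime's sanitizer, and a naive timing loop rather than a proper benchmark harness, so the numbers are only indicative. The class PatternCostSketch, its method, and the sample lines are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

final class PatternCostSketch {
  // Average nanoseconds spent masking one line with the given pattern.
  // Naive measurement: no JIT warm-up control, so treat results as a rough guide.
  static double averageNanosPerLine(Pattern pattern, List<String> lines, int rounds) {
    long start = System.nanoTime();
    for (int i = 0; i < rounds; i++) {
      for (String line : lines) {
        pattern.matcher(line)
            .replaceAll(match -> "*".repeat(match.group().length()));
      }
    }
    long elapsed = System.nanoTime() - start;
    return (double) elapsed / ((long) rounds * lines.size());
  }

  public static void main(String[] args) {
    List<String> sample = List.of(
        "Customer S0847362951 reported an issue with their order",
        "Health check passed");
    Pattern candidate = Pattern.compile("S\\d{10}");
    averageNanosPerLine(candidate, sample, 10_000); // warm-up run, result discarded
    System.out.printf("%.0f ns per line%n",
        averageNanosPerLine(candidate, sample, 100_000));
  }
}
```

Comparing the figure for two variants of a pattern on the same sample gives a quick indication of whether a rewrite is worthwhile; production monitoring remains the more reliable signal.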

Testing sanitization

In tests, the sanitizer can be accessed directly through the getSanitizer method in TestKit or TestKitSupport, to assert that the expected text is masked given the service's sanitizer configuration.