Guardrails
Guardrails can protect against harmful inputs, such as jailbreak attempts, and damaging output, such as mentions of a competitor’s product.
Note: For protecting sensitive information like PII, see Sanitization.
A specific guardrail implements the TextGuardrail interface. It takes the input or output text as a parameter and returns a result indicating whether the text passed validation, including an explanation of why the decision was made. These results are included in metrics and traces. A guardrail can abort the interaction with the model, or only report the problem and continue anyway.
An example of a Guardrail implementation:
import akka.javasdk.agent.GuardrailContext;
import akka.javasdk.agent.TextGuardrail;
public class ToxicGuard implements TextGuardrail {
  private final String searchFor;
  public ToxicGuard(GuardrailContext context) {
    searchFor = context.config().getString("search-for");
  }
  @Override
  public Result evaluate(String text) {
    // this would typically be more advanced in a real implementation
    if (text.contains(searchFor)) {
      return new Result(false, "Toxic response '%s' not allowed.".formatted(searchFor));
    } else {
      return Result.OK;
    }
  }
}
Guardrails are enabled through configuration, which makes it possible to enforce at deployment time that certain guardrails are always used.
akka.javasdk.agent.guardrails {
  "pii guard" {                                     (1)
    class = "com.example.guardrail.PiiGuard"        (2)
    agents = ["planner-agent"]                      (3)
    agent-roles = ["worker"]                        (4)
    category = PII                                  (5)
    use-for = ["model-request", "mcp-tool-request"] (6)
    report-only = false                             (7)
  }
  "toxic guard" {
    class = "com.example.guardrail.ToxicGuard"
    agent-roles = ["worker"]
    category = TOXIC
    use-for = ["model-response", "mcp-tool-response"]
    report-only = false
    search-for = "bad stuff"
  }
}
| 1 | Each configured guardrail has a unique name. | 
| 2 | Implementation class of the guardrail. | 
| 3 | Enable this guardrail for agents with these component ids. | 
| 4 | Enable this guardrail for agents with these roles. | 
| 5 | The type of validation, such as PII and TOXIC. | 
| 6 | Where to use the guardrail, such as for the model request or model response. | 
| 7 | If the text does not pass the evaluation criteria, execution is either aborted (report-only = false) or continues anyway (report-only = true). In both cases, the result is tracked in logs, metrics and traces. | 
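The com.example.guardrail.PiiGuard referenced in the configuration above is not shown in this section; a minimal sketch of what such a guardrail could look like, assuming a simple regex check for email addresses, follows. A real implementation would typically detect more kinds of PII.
import akka.javasdk.agent.TextGuardrail;
import java.util.regex.Pattern;
public class PiiGuard implements TextGuardrail {
  // simplistic PII detection: flag anything that looks like an email address
  private static final Pattern EMAIL =
      Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
  @Override
  public Result evaluate(String text) {
    if (EMAIL.matcher(text).find()) {
      return new Result(false, "Text contains what looks like an email address.");
    } else {
      return Result.OK;
    }
  }
}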
The implementation class of the guardrail is configured with the class property. The class must implement the TextGuardrail interface. It may optionally have a constructor with a GuardrailContext parameter, which includes the name and the config section for the specific guardrail. In the ToxicGuard example above you can see how the configuration property search-for is read from the config of the GuardrailContext parameter.
Agents are selected by matching the agents or agent-roles configuration:
- agents: enabled for agents with these component ids; if agents contains "*", the guardrail is enabled for all agents
- agent-roles: enabled for agents with these roles; if agent-roles contains "*", the guardrail is enabled for all agents that have a role, but not for agents without a role
If both agents and agent-roles are defined, it is enough that one of them matches to enable the guardrail for an agent.
The role is defined with the @AgentRole annotation, as illustrated in the sketch below.
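For illustration, a sketch of an agent that the example configuration above would match, both by its component id ("planner-agent") and by its role ("worker"). The annotation import packages and the agent body are assumptions based on the typical Akka SDK agent structure, not part of this section:
import akka.javasdk.agent.Agent;
import akka.javasdk.annotations.AgentRole;
import akka.javasdk.annotations.ComponentId;
// "pii guard" above matches this agent via its component id "planner-agent",
// "toxic guard" matches it via its role "worker"
@ComponentId("planner-agent")
@AgentRole("worker")
public class PlannerAgent extends Agent {
  public Effect<String> plan(String request) {
    return effects()
        .systemMessage("You are a planning assistant.") // placeholder prompt
        .userMessage(request)
        .thenReply();
  }
}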
The name and the category are reported in logs, metrics and traces. The category should classify the type of validation. It can be any value, but a few recommended categories are JAILBREAK, PROMPT_INJECTION, PII, TOXIC, HALLUCINATED, NSFW, FORMAT.
The guardrail can be enabled for certain inputs or outputs with the use-for property. The use-for property accepts the following values: model-request, model-response, mcp-tool-request, mcp-tool-response, and *.
Guardrail for similar text
The built-in SimilarityGuard evaluates the text by making a similarity search in a dataset of "bad examples". If the similarity exceeds a threshold, the result is flagged as blocked.
This is how to configure the SimilarityGuard:
akka.javasdk.agent.guardrails {
  "jailbreak guard" {
    class = "akka.javasdk.agent.SimilarityGuard"
    agents = ["planner-agent", "weather-agent"]
    category = JAILBREAK
    use-for = ["model-request"]
    threshold = 0.75
    bad-examples-resource-dir = "guardrail/jailbreak"
  }
}
Here, it uses predefined examples of jailbreak prompts in guardrail/jailbreak. Those have been incorporated from https://github.com/verazuo/jailbreak_llms, but you can define your own examples and place them in a subdirectory of src/main/resources/. All text files in the configured bad-examples-resource-dir are included in the similarity search.
The same mechanism can be used for more than jailbreak attempt detection, as sketched below.
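For example, a hypothetical guard against mentions of competitor products could point the same class at your own example texts. The guardrail name, category, agent selection, and resource directory below are placeholders, not predefined values:
akka.javasdk.agent.guardrails {
  "competitor guard" {
    class = "akka.javasdk.agent.SimilarityGuard"
    agents = ["*"]
    category = COMPETITOR
    use-for = ["model-response"]
    threshold = 0.75
    bad-examples-resource-dir = "guardrail/competitors"
  }
}
The example texts would then be placed as text files under src/main/resources/guardrail/competitors.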