Subject identifiers

Note

The following is not legal advice, but simply general suggestions and ideas. The GDPR field is still in flux, and only with time will accepted patterns and common use cases emerge.

As introduced in the overview, dealing with data subject identifiers is an important part of a GDPR strategy. There are multiple ways to implement identifiers correctly, depending on your application:

  • If your application already has some userId, you may want to use this identifier as the data subject id. However, you should take care that such an id does not itself carry personal information. For example, if the id includes the “nickname” of a user, you should not use it as a data subject id: the id itself can be seen as personal data. If you would like to use user ids as your data subject ids, consider using a SHA-1 hash of the user id and some additional seed, as illustrated in the example below.

  • A good alternative is to generate UUIDs for data subjects. You can do so using the java.util.UUID class and obtain its String representation to be used in the WithDataSubjectId wrapper provided by akka-gdpr, as described in @ref[].
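The UUID approach above could be sketched as follows. This is a minimal illustration, not part of akka-gdpr: the `DataSubjectIds` class and its in-memory map are hypothetical, and a real application would keep the mapping in durable storage so that the same user always resolves to the same id.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not an akka-gdpr API): assigns each user a random data subject id.
final class DataSubjectIds {
  // Assumption: an in-memory map stands in for durable storage in this sketch.
  private final Map<String, String> idsByUser = new ConcurrentHashMap<>();

  /** Returns the data subject id for the user, generating a random UUID on first use. */
  String dataSubjectIdFor(String userId) {
    return idsByUser.computeIfAbsent(userId, ignored -> UUID.randomUUID().toString());
  }
}
```

Because the UUID is random, the resulting String carries no personal information by itself; the link to the user exists only in the mapping.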

Also, check whether “a user” has more than one data subject id. For example, different systems may have assigned the same user different ids. When a request to remove data for a given user is issued to your application, it may need to deal with all of the user’s data subject ids.
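One way to keep track of several data subject ids per user is a small registry, sketched below. The `DataSubjectRegistry` class and its method names are hypothetical and not provided by akka-gdpr; the sketch only shows the bookkeeping, and the actual erasure of data for each returned id is left to the application.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry (not an akka-gdpr API): tracks every data subject id assigned to a user.
final class DataSubjectRegistry {
  private final Map<String, Set<String>> idsByUser = new ConcurrentHashMap<>();

  /** Records that the given data subject id belongs to the user. */
  void register(String userId, String dataSubjectId) {
    idsByUser
        .computeIfAbsent(userId, ignored -> ConcurrentHashMap.newKeySet())
        .add(dataSubjectId);
  }

  /** Removes the user and returns all of their ids, so the caller can erase data for each one. */
  Set<String> forget(String userId) {
    Set<String> ids = idsByUser.remove(userId);
    if (ids == null) {
      return Collections.emptySet();
    }
    return Collections.unmodifiableSet(new HashSet<>(ids));
  }
}
```

A removal request for a user would then call `forget` once and process every id in the returned set.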

Another open question is whether metadata associated with a particular data subject id should also be removed. Even when using data shredding, metadata such as the time at which events were stored, combined with other correlated data, could be used to deduce some information about “that specific” data subject. So far, no rulings have established how far one should go with regard to sanitizing such metadata.

Example

The following example illustrates using SHA-1 to hash a user id together with an application secret:

Scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

import akka.Done
import akka.stream.scaladsl.Sink

// MessageDigest is not thread-safe: only share the instance when using parallelism = 1
private val sha1 = MessageDigest.getInstance("SHA-1")

/**
 * Implement your logic for determining a stable data subject id for each event here.
 *
 * For example, it could be based on masking a known user identifier that exists in all
 * events related to a given user. Or it could be *based on* the persistenceId of the event passed in,
 * which is a simple and effective solution.
 */
private def determineEncryptionKey(event: JournaledEvent): Option[String] = {
  if (event.persistenceId.startsWith("user")) {
    try {
      // use an explicit charset so the hash does not depend on the platform default
      sha1.update("my-app-secrets".getBytes(StandardCharsets.UTF_8))
      sha1.update(event.persistenceId.getBytes(StandardCharsets.UTF_8))

      // hex-encode the digest: the raw bytes are not valid text, so new String(bytes) would be lossy
      Some(sha1.digest().map("%02x".format(_)).mkString)
    } finally {
      sha1.reset()
    }
  } else {
    None
  }
}
Java
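import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Optional;

// MessageDigest is not thread-safe: only share the instance when using parallelism = 1.
// As a blank final field it must be assigned once, e.g. in the constructor
// via MessageDigest.getInstance("SHA-1").
final MessageDigest SHA1;

/**
 * Implement your logic for determining a stable data subject id for each event here.
 *
 * For example, it could be based on masking a known user identifier that exists in all
 * events related to a given user. Or it could be *based on* the persistenceId of the event passed in,
 * which is a simple and effective solution.
 */
private Optional<String> determineEncryptionKey(JournaledEvent event) {
  if (event.persistenceId().startsWith("user")) {
    try {
      // use an explicit charset so the hash does not depend on the platform default
      SHA1.update("my-app-secrets".getBytes(StandardCharsets.UTF_8));
      SHA1.update(event.persistenceId().getBytes(StandardCharsets.UTF_8));
      // hex-encode the 20-byte digest: the raw bytes are not valid text,
      // so new String(bytes) would be lossy
      return Optional.of(String.format("%040x", new BigInteger(1, SHA1.digest())));
    } catch (Exception ex) {
      return Optional.empty();
    } finally {
      SHA1.reset();
    }
  } else {
    return Optional.empty();
  }
}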