When Activate 1.0 was launched, people were talking about it, and to my pleasant surprise Jonas Bonér, the Typesafe CTO, posted about it on Twitter:

@jboner Activate Framework – durable STM with pluggable persistence: http://activate-framework.org/ #scala

Really nice! But then came another post:

@jboner For the record: I don’t believe in durable STMs. We tried that in Akka a couple of years ago. Dropped it for a reason. Use Slick instead.

While I fully respect Jonas Bonér and the Akka team, I disagree with this. I had already read about the reasons why the Akka Persistence module was discontinued, and they do not look like an STM problem, but rather like design pitfalls in this specific Akka module. This is the document produced by the Akka team:

https://docs.google.com/document/pub?id=1c9McZsW_mXiiQRWD-trViWmllHUb0ujOrJ92f-AUWYo

The analysis focuses on two main points:

1. It is not possible to guarantee the consistency of a Durable STM due to the absence of ACID transactions in NoSQL databases. (“No failure atomicity” and “No consistency”)

2. The Akka Persistence STM is not distributed, so it is not possible to use it with multiple virtual machines. (“No isolation” and “Lost updates”)

Autopsy

The second point is about a feature Akka Persistence never had, so there is no implementation to study. Activate addresses this issue by providing a Coordinator that makes the Durable STM distributed.

The first point concerns problems people actually ran into when using Akka Persistence, so it is possible to study what went wrong. The first step is to locate the old source code: the module was moved to the akka-modules repository, which has also been discontinued. This is the source code at the v1.0 tag:

https://github.com/akka/akka-modules/tree/82e7dea3dd93f976a519a5615b95d52fd8e6c28b

Binaries:

http://repo.akka.io/releases/se/scalablesolutions/akka/akka-persistence-common/1.0/

Since Activate achieves a high level of consistency even when used with MongoDB (which is not ACID), perhaps Akka Persistence was doing something differently from Activate that could produce inconsistencies.

To find this difference, we can implement a very simple atomic storage:

SimpleStorage.scala

This storage supports only refs and stores them in a synchronized map of atomic integers, so placing and retrieving items is fully atomic. Since Akka Persistence’s problem was attributed to the lack of atomicity in NoSQL databases, this atomic storage should pass the following test case without problems.
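SimpleStorage.scala is only linked here, so the following is a minimal sketch of the same idea under our own assumptions: it does not implement Akka Persistence's storage backend interface, and the names `SimpleAtomicStorage`, `put` and `get` are hypothetical. Refs live in a map of `AtomicInteger`s, and every access holds the map's lock:

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical sketch in the spirit of SimpleStorage.scala (not the original file):
// a ref-only storage backed by a synchronized map of AtomicIntegers.
object SimpleAtomicStorage {
  private val refs = mutable.Map.empty[String, AtomicInteger]

  // Store a value for a ref id; the whole operation runs under the map's lock.
  def put(uuid: String, value: Int): Unit = refs.synchronized {
    refs.getOrElseUpdate(uuid, new AtomicInteger(0)).set(value)
  }

  // Read the current value for a ref id, if any, under the same lock.
  def get(uuid: String): Option[Int] = refs.synchronized {
    refs.get(uuid).map(_.get)
  }
}
```

Because every `put` and `get` takes the same lock, no reader can observe a partially applied write, which is exactly the property that rules the storage out as the source of any inconsistency.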

Test case

This test creates a ref with initial value 0 and runs 50 threads in parallel, each one incrementing the value by 1 in an STM transaction. The ref’s final value should be 50:

AkkaPersistenceConsistencySpecs.scala
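The linked spec runs against Akka Persistence itself. To show the shape of the test without that dependency, here is a minimal sketch in which a hypothetical `ToyRef` stands in for an STM ref (`ConcurrentIncrementTest` is also a hypothetical name). Its commit follows the lock, validate, write, unlock flow, and each of the 50 threads retries its increment until the commit validates, so a correct implementation always ends at 50:

```scala
import java.util.concurrent.{CountDownLatch, Executors, TimeUnit}

// Hypothetical toy ref standing in for an STM ref (not Akka or Multiverse code):
// a value plus a version that changes on every successful commit.
final class ToyRef(initial: Int) {
  private var value = initial
  private var version = 0L

  // Consistent snapshot of (value, version).
  def read(): (Int, Long) = synchronized { (value, version) }

  // Lock, validate that nobody committed since `readVersion`, write, unlock.
  def tryCommit(readVersion: Long, newValue: Int): Boolean = synchronized {
    if (version != readVersion) false
    else { value = newValue; version += 1; true }
  }
}

object ConcurrentIncrementTest {
  // One "transaction": read, add 1, retry until the commit validates.
  def increment(ref: ToyRef): Unit = {
    var committed = false
    while (!committed) {
      val (current, readVersion) = ref.read()
      committed = ref.tryCommit(readVersion, current + 1)
    }
  }

  // Start `threads` threads together, each incrementing the ref once.
  def run(threads: Int): Int = {
    val ref = new ToyRef(0)
    val pool = Executors.newFixedThreadPool(threads)
    val start = new CountDownLatch(1)
    val done = new CountDownLatch(threads)
    for (_ <- 1 to threads) pool.execute(() => {
      start.await() // hold every thread until all tasks are submitted
      increment(ref)
      done.countDown()
    })
    start.countDown()
    done.await(10, TimeUnit.SECONDS)
    pool.shutdown()
    ref.read()._1
  }
}
```

`ConcurrentIncrementTest.run(50)` should return 50 on every execution, which is the result the Akka Persistence spec fails to reach.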

The spec verifies that the resulting values are not equal to the expected value (50), which reproduces the inconsistency. The console output:


    expectedCurrentValue=50
    akkaCurrentValue=16
    databaseCurrentValue=16
    multiverseCurrentValue=0

akkaCurrentValue and databaseCurrentValue agree with each other but differ from the expected value, and the exact number varies on each test execution. multiverseCurrentValue is always zero.

Why does Akka Persistence produce inconsistencies, even using an atomic storage?

The commit flow problem

Akka Persistence is built on the Multiverse STM and listens to its transaction lifecycle events:

Transaction.scala


    mtx.registerLifecycleListener(new TransactionLifecycleListener() {
      def notify(mtx: MultiverseTransaction, event: TransactionLifecycleEvent) = event match {
        case TransactionLifecycleEvent.PostCommit => tx.commitJta
        case TransactionLifecycleEvent.PreCommit => tx.commitPersistentState
        case TransactionLifecycleEvent.PostAbort => tx.abort
        case _ => {}
      }
    })

The persistent commit happens when Multiverse fires the PreCommit event. The Multiverse implementation shows where this event is fired:

AbstractTransaction.java


    public final void commit() {
        ...
            case Active:
                prepare();
        ...

    public final void prepare() {
        ...
            case Active:
                try {
                    notifyAll(TransactionLifecycleEvent.PreCommit);
        ...

Before these lines there are only logging calls, so the PreCommit event is the very first thing fired when a transaction commits. This is the basic flow of an STM transaction commit:

1. Lock each transactional unit (refs) used by the transaction;
2. Validate all reads and writes, throw an exception to retry the transaction in case of inconsistency;
3. Commit the write operations values on the transactional units (refs) in memory;
4. Release locks.

Because the PreCommit event fires before this flow even starts, Akka Persistence was storing values that could be invalid: if the transaction is retried, the invalid value has already been placed in the storage. This looks like a design error. It would be more reasonable to persist the values:

- After validating the transaction (2), as the storage should receive only valid items.
- Before releasing the locks (4). Otherwise, it is not possible to ensure that placement of items in the storage occurs in the same order as STM transaction commits.

Activate uses a different commit flow in order to solve this problem:

1. Lock each transactional unit (refs) used by the transaction;
2. Validate all reads and writes, throw an exception to retry the transaction if there is inconsistency;
3. Place items in the storage;
4. Commit the write operations values on the transactional units (refs) in memory;
5. Release locks.
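The difference between the two orderings can be reproduced deterministically with a single-threaded toy model. This sketch is not Multiverse, Akka or Activate code; `VersionedRef`, `ToyStorage`, `persistFirst`, `validateFirst` and `scenario` are hypothetical names. `persistFirst` places the value in the storage before locking and validating, as the PreCommit listener does, while `validateFirst` follows Activate's order. Transaction T2 reads the ref, T1 and T3 then commit, and T2's stale commit fails validation:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical toy model of a ref and a storage (not Multiverse, Akka or Activate code).
final class VersionedRef(var value: Int, var version: Long = 0L)

final class ToyStorage {
  val writes = ArrayBuffer.empty[Int] // every value ever placed in the storage
  def place(v: Int): Unit = writes += v
}

object CommitFlows {
  // Akka Persistence 1.0 order: persist on PreCommit, before locking and validating.
  def persistFirst(ref: VersionedRef, readVersion: Long, newValue: Int, storage: ToyStorage): Boolean = {
    storage.place(newValue) // happens even if validation fails below
    ref.synchronized {                                      // 1. lock
      if (ref.version != readVersion) false                 // 2. validate (caller retries)
      else { ref.value = newValue; ref.version += 1; true } // 3. write in memory
    }                                                       // 4. release lock
  }

  // Activate order: persist after validation, while the lock is still held.
  def validateFirst(ref: VersionedRef, readVersion: Long, newValue: Int, storage: ToyStorage): Boolean =
    ref.synchronized {                         // 1. lock
      if (ref.version != readVersion) false    // 2. validate (caller retries)
      else {
        storage.place(newValue)                // 3. place items in the storage
        ref.value = newValue; ref.version += 1 // 4. write in memory
        true
      }
    }                                          // 5. release lock

  // T2 reads at version 0, then T1 and T3 commit before T2 tries to commit.
  // Returns (in-memory value, last value placed in the storage).
  def scenario(commit: (VersionedRef, Long, Int, ToyStorage) => Boolean): (Int, Int) = {
    val ref = new VersionedRef(0)
    val storage = new ToyStorage
    val t2ReadVersion = ref.version        // T2 reads 0
    commit(ref, ref.version, 1, storage)   // T1: 0 -> 1
    commit(ref, ref.version, 2, storage)   // T3: 1 -> 2
    commit(ref, t2ReadVersion, 1, storage) // T2: stale 0 -> 1, fails validation
    (ref.value, storage.writes.last)
  }
}
```

With `persistFirst` the scenario returns `(2, 1)`: the in-memory ref holds 2, but the last value written to the storage is the one from T2's invalid attempt. With `validateFirst` it returns `(2, 2)`, because a transaction that fails validation never reaches the storage.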

The direct access problem

A Durable STM can be seen as a way to represent in memory the data held in the storage. Each such representation is an STM transactional unit, and these representations must obey two important restrictions:

1. Each data representation must be unique in memory. If more than one transactional unit represents the same data, the STM validation logic cannot detect read/write conflicts on that data.

2. All access to the data must go through a transactional unit. If the storage is read or written directly, the STM validation logic again cannot detect read/write conflicts.

The first restriction was broken in Akka Persistence 1.0-M1; the problem was solved by the StorageManager class in 1.0.

The second restriction remains broken: the Persistent* classes read values directly from the storage, and reads go through the transactional unit only after a write. PersistentRef, for example:

Storage.scala


    def get: Option[T] = if (ref.isDefined) ref.opt else storage.getRefStorageFor(uuid)

The ref is defined only after a write, so none of the reads made before that write are tracked by the STM concurrency control. This is another Akka Persistence design problem.
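A toy model makes the consequence concrete. The sketch below is hypothetical (`TRef`, `ToyTx`, `directGet`, `trackedGet` and `DirectReadDemo` are not Akka or Activate APIs): `directGet` mirrors the shape of `PersistentRef.get`, while `trackedGet` first initializes the ref from the storage so that the read lands in the transaction's read set:

```scala
import scala.collection.mutable

// Hypothetical toy model, not Akka Persistence code: a transactional unit whose
// `value` is None until it is written or initialized.
final class TRef(var value: Option[Int] = None, var version: Long = 0L)

final class ToyTx {
  private val readSet = mutable.Map.empty[TRef, Long] // ref -> version seen

  // Same shape as PersistentRef.get: read through the ref only if it is defined,
  // otherwise go straight to the storage without recording the read.
  def directGet(ref: TRef, storage: mutable.Map[String, Int], uuid: String): Option[Int] =
    if (ref.value.isDefined) { readSet(ref) = ref.version; ref.value }
    else storage.get(uuid)

  // Lazy-initialization alternative: load the ref from the storage first,
  // then read through it so the read is tracked.
  def trackedGet(ref: TRef, storage: mutable.Map[String, Int], uuid: String): Option[Int] = {
    if (ref.value.isEmpty) ref.value = storage.get(uuid)
    readSet(ref) = ref.version
    ref.value
  }

  // Validation only sees the reads recorded in the read set.
  def validate(): Boolean = readSet.forall { case (ref, seen) => ref.version == seen }
}

object DirectReadDemo {
  // A reader reads "balance", a concurrent transaction overwrites it, then the
  // reader validates. Returns true if the stale read goes unnoticed.
  def staleReadPasses(tracked: Boolean): Boolean = {
    val storage = mutable.Map("balance" -> 100)
    val ref = new TRef()
    val reader = new ToyTx
    if (tracked) reader.trackedGet(ref, storage, "balance")
    else reader.directGet(ref, storage, "balance")
    ref.value = Some(50); ref.version += 1; storage("balance") = 50 // concurrent commit
    reader.validate()
  }
}
```

`staleReadPasses(tracked = false)` returns true: the reader validates even though the value it read has been overwritten. With `tracked = true` it returns false, so the STM would retry the reader.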

Activate obeys both restrictions. It uses soft references to guarantee that there is only one transactional unit for each piece of stored data, and it guarantees that all reads and writes go through a transactional unit. The database is read only to initialize a transactional unit (lazy entity initialization) or when the Coordinator reloads an entity that has been modified in another virtual machine.
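One way to keep a single in-memory unit per stored item is an identity map whose values are held through `java.lang.ref.SoftReference`, so unused entities can be reclaimed under memory pressure and are reloaded on the next access. This is a minimal sketch under that assumption, not Activate's actual implementation; `IdentityMap`, `getOrLoad` and the `load` function are hypothetical:

```scala
import java.lang.ref.SoftReference
import scala.collection.mutable

// Hypothetical sketch (not Activate's code): at most one live in-memory unit
// per storage id, held softly so the GC may reclaim it when memory is tight.
final class IdentityMap[T <: AnyRef](load: String => T) {
  private val cache = mutable.Map.empty[String, SoftReference[T]]

  def getOrLoad(id: String): T = synchronized {
    cache.get(id).flatMap(ref => Option(ref.get)) match {
      case Some(unit) => unit // reuse the single existing instance
      case None =>
        val unit = load(id)   // first access, or the soft reference was cleared
        cache(id) = new SoftReference(unit)
        unit
    }
  }
}
```

A fuller version would also drop map entries whose references have been cleared and reload entities when the Coordinator reports a change made in another virtual machine.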

Conclusion

The Akka Persistence inconsistency problems would be present even with fully ACID storage. The same concurrency test that fails with Akka Persistence on top of the atomic storage implemented above passes with Activate on top of MongoDB, a non-atomic storage:

ActivateConsistencySpecs.scala

Activate provides a very high consistency level on top of MongoDB, but occasional inconsistencies such as stale reads can still happen, particularly when relying on the database's eventual consistency together with the Coordinator. This is an expected limitation of any non-ACID database; Activate only minimizes it.

If your application cannot tolerate even rare inconsistencies, use Activate with an ACID storage to get full consistency support. You can choose which modules to assemble into your application: the modules available in 1.1 are Jdbc, Mongo and Prevayler, and 1.2 will add a module for graph databases.

This autopsy shows that Akka Persistence has two design problems that make it unusable with both atomic and non-atomic storages. Even if the arguments in the Akka team's document were entirely right, they would be a reason to avoid Durable STMs on top of NoSQL databases, not to discredit Durable STMs altogether.