In this post I will explain how you can build fault-tolerant systems using Scala Actors by arranging them in Supervisor hierarchies, using a library for Scala Supervisors that I just released.

But first, let's recap what Actors are and what makes them useful. An actor is an abstraction that implements Message-Passing Concurrency. Actors have no shared state and communicate by sending and receiving messages. This paradigm provides a very different and much simpler concurrency model than Shared-State Concurrency (the scheme adopted by C, Java, C# etc.), which makes it easier to avoid problems like deadlocks, live locks and thread starvation. This makes it possible to write code that is deterministic and side-effect-free, which in turn makes it easier to write, test, understand and reason about. Each actor has a mailbox in which it receives incoming messages, and it can use pattern matching on the messages to decide whether a message is interesting and which action to take.

The most well-known and successful implementation of actors can be found in the Erlang language (and the OTP platform), where it has been used to implement extremely fault-tolerant (99.9999999% reliability - 9 nines) and massively concurrent systems (with hundreds of thousands of simultaneous actors).

So what are Supervisor hierarchies? Let's go to the source: http://www.erlang.org/doc/design_principles/sup_princ.html#5.
A supervisor is responsible for starting, stopping and monitoring its child processes. The basic idea of a supervisor is that it should keep its child processes alive by restarting them when necessary.

It has two different restart strategies: All-For-One and One-For-One. With One-For-One, only the child process that died is restarted; with All-For-One, all the other child processes are stopped and restarted as well. Best explained using some pictures (referenced from erlang.org):

[Figures: OneForOne and AllForOne restart strategies]

Naturally, the library I have written for Scala is by no means as complete and hardened as Erlang's, but it seems to do a decent job of providing the core functionality. The implementation consists of two main abstractions:
* The Supervisor manages hierarchies of Scala actors and provides fault-tolerance in terms of different restart semantics. The configuration and semantics are almost a 1-1 port of the Erlang Supervisor implementation, explained in the erlang.org doc referenced above. Read this document in order to understand how to configure the Supervisor properly.
* The GenericServer (which subclasses the Actor class) is a trait that forms the base for a server to be managed by a Supervisor.
The GenericServer is wrapped by a GenericServerContainer instance providing a necessary indirection needed to be able to fully manage the life-cycle of the GenericServer in an easy way.
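To make the two restart strategies concrete, here is a minimal sketch in plain Scala. It is not the library's code: `Strategy`, `toRestart` and the use of child names as strings are assumptions made purely for illustration. It only models which children a supervisor would restart when one of them fails.

```scala
object RestartStrategies {
  sealed trait Strategy
  case object OneForOne extends Strategy
  case object AllForOne extends Strategy

  // Given the supervised children and the one that crashed,
  // return the children the supervisor should restart.
  // (Hypothetical helper, not part of the library.)
  def toRestart(strategy: Strategy, children: List[String], failed: String): List[String] =
    strategy match {
      case OneForOne => children.filter(_ == failed) // only the failed child
      case AllForOne => children                     // every child, failed or not
    }
}
```

With two children, a failure in one of them restarts just that child under `OneForOne` and both children under `AllForOne`.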
So, let's try it out by writing a small example in which we create a couple of servers, configure them, use them in various ways, kill one of them, see it recover, hotswap its implementation etc.
(Sidenote: I have written about hotswapping actors before; however, this library has taken this approach a bit further and provides a more flexible and powerful way of achieving this. Thanks DPP.)
This walk-through will only cover some of the API. For more details, look at the code or the tests.
1. Create our server messages
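As a rough sketch of what such messages can look like, the snippet below defines them as case objects. Only `Die` is named later in this walk-through; `Ping` and `OneWay`, the `Message` trait and the `action` helper are hypothetical names chosen for illustration.

```scala
object SampleMessages {
  sealed trait Message
  case object Ping extends Message   // hypothetical: ask the server for a reply
  case object OneWay extends Message // hypothetical: fire-and-forget message
  case object Die extends Message    // tells the server to kill itself (step 9)

  // Pattern matching decides which action a message triggers.
  def action(msg: Message): String = msg match {
    case Ping   => "reply"
    case OneWay => "log"
    case Die    => "throw"
  }
}
```

Case objects work well as messages because they are immutable and can be matched on directly.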
2. Create a
We do that by extending the GenericServer trait and override the
GenericServer also has some callback life-cycle methods, such as
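The general shape of such a server, a message-handling body plus life-cycle callbacks, can be sketched without the library. Note that `ToyServer`, `body`, `init`, `shutdown` and `handle` below are stand-in names I am assuming for this illustration; they are not taken from the GenericServer API, so check the code for the real signatures.

```scala
object ToyServerSketch {
  case object Ping
  case object Die

  // Hypothetical stand-in for a server base trait (not the library's GenericServer).
  trait ToyServer {
    def body: PartialFunction[Any, Any] // the message handling logic
    def init(): Unit = ()               // life-cycle callback, called on start
    def shutdown(): Unit = ()           // life-cycle callback, called on stop
    // Returns Some(result) if the body handles the message, None otherwise.
    def handle(msg: Any): Option[Any] = body.lift(msg)
  }

  class SampleServer extends ToyServer {
    var started = false
    override def init(): Unit = { started = true }
    def body: PartialFunction[Any, Any] = {
      case Ping => "pong"
      case Die  => throw new RuntimeException("Received Die message")
    }
  }
}
```

Messages the body does not match are simply ignored here (`handle` returns `None`), while `Die` makes the server fail with an exception.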
3. Wrap our SampleServer in a
Here we also give it a name to be able to refer to it later. We are creating two instances of the same server implementation in order to try out restarting multiple servers in case of failure.
4. Create a
Here we create a SupervisorFactory that configures our servers. The configuration mimics the Erlang configuration and defines a general restart strategy for our Supervisor as well as a list of workers (servers), for each of which we define a specific life-cycle.
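To give a feel for what an Erlang-style supervisor configuration carries, here is a small data model in plain Scala. All names (`Config`, `RestartStrategy`, `Worker`, `Scope`, `mayRestart`) are assumptions for this sketch and not the library's actual types. The restart limit follows the rule from the Erlang documentation: a supervisor gives up when more than a maximum number of restarts happen within a time window.

```scala
object SupervisorConfigSketch {
  sealed trait Scheme
  case object OneForOne extends Scheme
  case object AllForOne extends Scheme

  // Per-worker life-cycle, named after Erlang's restart types.
  sealed trait Scope
  case object Permanent extends Scope // always restarted
  case object Transient extends Scope // restarted only after an abnormal exit
  case object Temporary extends Scope // never restarted

  final case class RestartStrategy(scheme: Scheme, maxRestarts: Int, withinMillis: Long)
  final case class Worker(name: String, scope: Scope)
  final case class Config(strategy: RestartStrategy, workers: List[Worker])

  // Allow another restart only while fewer than maxRestarts
  // restarts have already happened inside the time window.
  def mayRestart(strategy: RestartStrategy, restartTimes: List[Long], now: Long): Boolean = {
    val recent = restartTimes.count(t => now - t <= strategy.withinMillis)
    recent < strategy.maxRestarts
  }
}
```

A configuration for the two sample servers could then be written as `Config(RestartStrategy(AllForOne, 3, 100L), List(Worker("sampleServer1", Permanent), Worker("sampleServer2", Permanent)))`, where the numbers are made up for the example.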
5. Create a new
6. Start the
This also starts the servers.
7. Try to communicate with the servers.
Here we try to send a couple of one-way asynchronous messages to our servers.
Try to get a reference to our sampleServer2 (by name) from the Supervisor before sending a message.
8. Send a message using a future
Try to send an asynchronous message - receive a future - and wait 100 ms (time-out) for the reply.
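The send-then-wait pattern can be illustrated with the standard `scala.concurrent` API instead of the library's own futures. This is only a sketch under that assumption: `ask`, `askAndWait` and the artificial `delayMs` parameter are hypothetical names, and the server is simulated by a `Future` that sleeps before replying.

```scala
import scala.concurrent.{Await, Future, TimeoutException}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FutureReplySketch {
  // Stand-in for a server that answers asynchronously after delayMs.
  def ask(msg: String, delayMs: Long): Future[String] = Future {
    Thread.sleep(delayMs)
    s"reply to $msg"
  }

  // Wait up to timeoutMs for the reply; if it does not arrive in time,
  // evaluate the error handler instead.
  def askAndWait(msg: String, delayMs: Long, timeoutMs: Long)(onTimeout: => String): String =
    try Await.result(ask(msg, delayMs), timeoutMs.millis)
    catch { case _: TimeoutException => onTimeout }
}
```

The error handler is passed by name, so it runs only on a time-out. It can return a default value, as in `askAndWait("ping", 0L, 100L)("no reply")`, or throw an exception.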
9. Kill one of the servers
Try to send a message (Die) telling the server to kill itself (by throwing an exception).
10. Send an asynchronous message and wait on a future.
If this call times out, the error handler we define will be invoked, in this case throwing an exception. The call is likely to time out since the server is in the middle of recovering from failure and we deliberately define a very short time-out to trigger this behavior.
The output of this call (due to the async nature of actors) is interleaved with the logging for the restart of the servers. As you can see the log below can be found in the middle of the restart output.
The server should be up again. Try the same call again.
Also check that server number 2 is up and healthy.
11. Try to hotswap the server implementation
Here we are passing in a completely new implementation of the server logic (it doesn't look that different though, but it can be any piece of Scala pattern matching code) to the server's hotswap method.
12. Try the hotswapped server out
13. Hotswap again
14. Send an asynchronous message that will wait on a future (using a different syntax/method).
The method returns an Option[T], which can be of two different types: Some(result) or None. If we receive Some(result) then we return the result, but if None is received then we invoke the error handler that we define in the getOrElse method. In this case we print out an info message (but you could throw an exception or do whatever you like...) and return a default value.
Same invocation with pattern matching syntax.
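Both ways of handling the Option can be shown with plain Scala. In this sketch the `reply` function is a hypothetical stand-in for the server call, and the printed message and the `"default"` value are made up for the example.

```scala
object OptionReplySketch {
  // Stand-in for a call that yields Some(reply) in time, or None on time-out.
  def reply(arrived: Boolean): Option[String] =
    if (arrived) Some("pong") else None

  // Variant 1: getOrElse with a block acting as the error handler.
  def withGetOrElse(result: Option[String]): String =
    result.getOrElse {
      println("No reply in time, using the default")
      "default"
    }

  // Variant 2: the same logic written with pattern matching.
  def withMatch(result: Option[String]): String = result match {
    case Some(value) => value
    case None =>
      println("No reply in time, using the default")
      "default"
  }
}
```

The two variants behave the same; which one to use is mostly a matter of taste.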
15. Hotswap back to original implementation.
This is done by passing in None to the hotswap method.
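The hotswap steps above can be sketched in a few lines of plain Scala. This is not how the library implements it: `SwappableServer`, `Body` and `handle` are hypothetical names, and the only behavior taken from the walk-through is that `hotswap` accepts new pattern matching code and that passing `None` restores the original implementation.

```scala
object HotswapSketch {
  type Body = PartialFunction[Any, String]

  class SwappableServer(original: Body) {
    // Currently swapped-in code, if any; None means "use the original".
    private var swapped: Option[Body] = None

    def hotswap(code: Option[Body]): Unit = synchronized { swapped = code }

    // Handle a message with the swapped-in body when present,
    // otherwise with the original one.
    def handle(msg: Any): Option[String] = synchronized {
      swapped.getOrElse(original).lift(msg)
    }
  }
}
```

Keeping the original body around is what makes swapping back cheap: `hotswap(None)` just clears the override.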
16. Test the final hotswap
17. Shut down the supervisor and its server(s)
You can find this code in the sample.scala file in the root directory of the distribution. Run it by invoking:
The SCM system used is Git.
1. Download and install Git
git clone git@github.com:jboner/scala-supervisor.git
The build system used is Maven 2.
1. Download and install Maven 2.
2. Step into the root dir
This will build the project, run all tests, create a jar and upload it to your local Maven repository ready for use.
Automatically downloaded by Maven.
1. Scala 2.7.1-final
2. SLF4J 1.5.2
3. LogBack Classic 0.9.9
That's all there is to it.