Play Framework: async I/O without the thread pool and callback hell
March 27, 2013
Under the hood, LinkedIn consists of hundreds of services that can be evolved and scaled independently. That is, all functionality is broken down into separate codebases, deployed on separate hardware, and exposed via well-defined APIs. For example, we may have separate front-end services (e.g. Profile, Skills) that talk to separate back-end services (e.g. profile-backend, skills-backend), which in turn talk to separate data services (e.g. Voldemort or Kafka).
In this architecture, our services spend most of their time calling other services and waiting on I/O. Most server technologies rely on a thread pool to handle I/O. Unfortunately, in high traffic environments, thread pools can be hard to manage: big thread pools have too much overhead, but small thread pools may run out of threads if there is a spike in latency or traffic. Either case can cause the service's latency to increase, which cascades to the other services that depend on it; now their thread pools are inadequate, and the problem spreads further. This is thread pool hell.
Recently, we started using the Play Framework to build our services. Play is built on top of Akka and Netty, so it’s fully asynchronous: that is, you can use non-blocking I/O when making calls to remote services. This makes it possible to support many concurrent connections without dealing with thread pools. It also makes it easy to make I/O calls in parallel to improve performance and enables the use of real-time, streaming, and server push technologies such as WebSockets.
In this post, I’ll go over some of the basics of asynchronous programming, the tools that are available in Play 2.1 and Scala, and the patterns that make it easy to manage non-blocking code.
Threaded vs. Evented Primer
We’ll start with a quick discussion of how different web frameworks handle concurrent requests, I/O, and threads. Bear in mind that this is just a high level overview that glosses over some details; as with any discussion of performance, results vary depending on the use case, hardware, OS, and programming language.
There are two general approaches:
Most developers are used to working with Threaded servers: they dedicate one thread to each request, and that thread does all the processing for the request until a response is sent. Any I/O, such as a call to a remote service, is typically synchronous, which means the thread will become blocked and sit around idle until the I/O completes.
Here's an example of some blocking Scala code (a sketch: queryService stands in for any synchronous call to a remote service):
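```scala
// Blocking I/O: queryService is a hypothetical synchronous client.
// The thread blocks on this line until the remote service responds.
val response = queryService(request)

// This line only executes once the response has arrived.
println(response)
```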
The code stops execution (blocks) at any line of code that is executing I/O and only executes the next line when the I/O is done, at which point the return value contains the data you requested. The code is executed (and read) from top to bottom.
By contrast, Evented servers typically use only a single thread per CPU core. The idea is to ensure that these scarce threads are never blocked: all I/O is asynchronous, so instead of waiting, the thread can process other requests, and only come back to the current one when the response from the I/O call is ready.
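For example, here's what the same call might look like in an evented style (again a sketch: queryServiceAsync is a hypothetical non-blocking client that accepts a callback):

```scala
// Evented I/O: queryServiceAsync fires the request and returns immediately,
// invoking the callback later, when the response is available.
queryServiceAsync(request) { response =>
  println(response) // runs later, once the I/O completes
}

// Execution continues here right away, before the response has arrived.
```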
An I/O request fires, but the code continues on to the next line before the I/O actually completes. You typically provide some mechanism, such as a callback, to tell the framework to "execute this code later", when the data you requested is available. The flow of the code can no longer be read top to bottom, since a callback in the middle of the code may fire well after all the lines below it have already executed.
Threaded vs. Evented performance
Threads can have significant memory overhead (e.g. the default stack size for a single thread is 1MB on a 64-bit JVM) and context switching overhead (e.g. saving register state, loading register state, impact on the CPU cache/pipeline, lock contention). Creating threads on the fly tends to be expensive, so most servers use a fixed thread pool.
Therefore, with Threaded servers, the name of the game is "sizing the thread pool". If you don't have enough threads, it’s easy for all of them to become tied up waiting for I/O, preventing any new requests from being processed even though most of your threads are just idly waiting. If you have too many threads, the extra memory usage and context switching overhead become very costly. Ideally, you want the thread pool to have as many threads as the maximum number of concurrent calls your server will handle. Unfortunately, this number changes as traffic and/or downstream latency increases.
In other words, you're trying to configure a statically sized thread pool, but the proper value depends on something very dynamic (latency and traffic). Worse yet, you need to configure this thread pool on hundreds of different services, each of which talk to each other, impact each other's latency and traffic, and therefore, affect the very thread pool values you're trying to pick! This is thread pool hell.
On Evented servers, waiting for I/O is very cheap: idle requests have negligible cost, as they don’t hold up a thread. This is important, because waiting on I/O will typically dwarf all other server operations: for example, check out Latency Numbers Every Programmer Should Know and compare the time for a single roundtrip in a data center (500,000ns) to any operation within the same server (~5ns).
This means that for typical workloads, Evented servers will be able to handle far more concurrent requests than Threaded servers. However, be warned: even a single long calculation or accidental blocking I/O call can bring an Evented server to its knees. Check out Threaded vs. Evented Servers for some simple mathematical models of how threaded and evented servers behave under different load profiles.
Threaded vs. Evented programming models
In the Threaded world, performing I/O in parallel usually involves setting up a separate thread pool (more thread pool hell!). However, that means your I/O is now executing asynchronously, so, just as in the Evented world, you now need a way to say "execute this code later". At this point, you either switch to an Evented programming style, or you bolt on some solution that feels Threaded, such as Java Futures (with blocking get() calls) or continuations. In either case, complexity goes up and flexibility goes down.
In the Evented world, all I/O runs in parallel by default and "execute this code later" is a core concept that will feel natural and consistent. The code will "scale" better, following the same patterns for both simple and complicated examples. We'll see some examples of Evented code in Play in the next several sections.
Async code in Play
The Play Framework uses an MVC pattern, which means most of the logic for I/O will live in the Controllers and Models. They have identical behavior in terms of async code, so for this blog post, we’ll just focus on Controllers. A Controller in Play consists of actions, or methods, that return some sort of Result. Here's a minimal example (in the standard Play 2.1 Scala style):
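```scala
package controllers

import play.api.mvc._

object Application extends Controller {
  // An action that returns a plain-text Result
  def index = Action {
    Ok("Hello World")
  }
}
```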
The example above uses the Ok method, which is a convenient way to return a Result that’s a 200 OK with some text.
Let’s now look at a Controller with some asynchronous I/O. We’ll use Play’s Web Services (WS) library to make a non-blocking HTTP call to a remote web service (in this case, example.com). Here's a sketch of what this looks like with Play 2.1's Scala API:
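```scala
package controllers

import play.api.mvc._
import play.api.libs.ws.{WS, Response}
import play.api.libs.concurrent.Execution.Implicits._
import scala.concurrent.Future

object Proxy extends Controller {
  def proxy = Action {
    // Fire off the HTTP request without blocking; we get a Future right away
    val responseFuture: Future[Response] = WS.url("http://example.com").get()

    // Transform the eventual Response into an eventual Play Result
    val resultFuture: Future[Result] = responseFuture.map { resp =>
      Status(resp.status)(resp.body)
    }

    // Hand the Future[Result] back to Play, which completes the request later
    Async(resultFuture)
  }
}
```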
The WS.get() method returns a Future[Response]. This is a Scala Future, a class that will eventually contain the result of an asynchronous operation (in this case, the Response from example.com). When it’s available, we’d like to use it to build our Play Result, but how do we go from a Future[Response] to a Result?

The first step is to transform the Future[Response] into a Future[Result]. We can do this using a method on Future called map (more on map below). Next, we need to convert the Future[Result] into a Result. This is easy: Play has a subclass of Result called AsyncResult, which can be created by passing a Future[Result] to the Async method.
If you run the code above, it'll proxy example.com.
It’s important to understand that this code does not block at any point. You can demonstrate this by adding a few log statements:
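```scala
// Same proxy action as above, with log statements added.
// Logger is play.api.Logger; other imports as in the previous example.
def proxy = Action {
  val responseFuture: Future[Response] = WS.url("http://example.com").get()

  Logger.info("Before map")
  val resultFuture: Future[Result] = responseFuture.map { resp =>
    Logger.info("Within map")
    Status(resp.status)(resp.body)
  }
  Logger.info("After map")

  Async(resultFuture)
}
```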
Most of the time, this will be the console output:
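```
Before map
After map
Within map
```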
Under the hood, Play uses a thread pool sized to one thread per CPU core. One of these scarce threads, T1, executes the proxy action, running through the code from top to bottom, except the contents of the function passed to the map method, since that depends on a non-blocking I/O call that has not yet completed. Once T1 returns the AsyncResult, it moves on to process other requests. Later on, when the response from example.com is finally available, another thread, T2 (which may or may not be the same as T1), executes the function passed to the map method. At no point was either thread blocked waiting on the response from example.com.
What’s this "map" thing all about?
Using a function called map to convert a Future[Response] into a Future[Result] may seem a little strange. Let’s try to build some intuition around it.
map is actually a functional programming concept that's easiest to understand when thinking about Lists. If you call map on a List, you get back a new List where each element of the new List is the result of applying a function to each element of the original List. For example, consider this function that converts a String to lower case:
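```scala
def lowercase(str: String): String = str.toLowerCase
```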
What happens if we map this function over a List of Strings?
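```scala
scala> List("Foo", "Bar", "Baz").map(lowercase)
res0: List[String] = List(foo, bar, baz)
```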
We get back a new List of Strings that are all lower case. In other words, we transformed a List[String] into another List[String].
It’s also possible to use map to convert a List[X] into a List of some other type, such as a List[Y]. For example, consider this function that returns the length of a String:
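```scala
def strlen(str: String): Int = str.length
```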
What happens if we map this function over the same List of Strings?
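```scala
scala> List("Foo", "Bar", "Baz").map(strlen)
res1: List[Int] = List(3, 3, 3)
```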
We now get back a new List, where each element represents the length of the corresponding String in the original List. That is, we’ve now transformed a List[String] into a new List[Int].
Let’s look at one more case: a function called explode that returns a List of all the characters in the given String:
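```scala
def explode(str: String): List[Char] = str.toCharArray.toList
```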
Look at what happens if we map this function over our List of Strings:
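```scala
scala> List("Foo", "Bar", "Baz").map(explode)
res2: List[List[Char]] = List(List(F, o, o), List(B, a, r), List(B, a, z))
```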
We’ve now transformed a List[String] into a new List[List[Char]]. What if we didn’t want to have the nested List-in-a-List? To do that, we can use flatMap:
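```scala
scala> List("Foo", "Bar", "Baz").flatMap(explode)
res3: List[Char] = List(F, o, o, B, a, r, B, a, z)
```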
As the name implies, flatMap behaves just like map, except it combines (flattens) the nested Lists into a single, flat List. The map and flatMap functions aren’t unique to Lists. They are just as useful for other types of "containers", such as Option, Map, and so on. For example, here's map over an Option, a container with zero or one elements:
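```scala
scala> Some("Foo").map(lowercase)
res4: Option[String] = Some(foo)

scala> (None: Option[String]).map(lowercase)
res5: Option[String] = None
```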
As it turns out, a Future[T] is also a "container": it’s a bit of a strange one, in that it contains just one element, but the same map and flatMap functions can be used to transform a Future[X] into a Future[Y].
One detail that we’ve glossed over is that the "container" class controls when it actually calls the functions passed into map and flatMap. A List may call the function immediately. However, a Future will only do so when the I/O has completed and data is available. This is ok, because the return value of Future.map or Future.flatMap is another Future: you can immediately transform, compose, and reason about this returned value, because it will likewise control when it calls any functions passed to its map, flatMap, and other methods.
Composing futures with map and flatMap
map and flatMap make it easy to transform Futures. We saw above how to transform a Future[Response] into a Future[Result].
What if you wanted to fire off two WS requests in parallel? Here is an example (a sketch, with illustrative URLs) that calls two search engines and measures how long it took to get a response from each:
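```scala
def parallel = Action {
  val start = System.currentTimeMillis()
  def getLatency(r: Any): Long = System.currentTimeMillis() - start

  // Both requests fire immediately and proceed in parallel
  val googleLatency: Future[Long] = WS.url("http://www.google.com").get().map(getLatency)
  val yahooLatency: Future[Long]  = WS.url("http://www.yahoo.com").get().map(getLatency)

  Async {
    googleLatency.flatMap { googleMs =>
      yahooLatency.map { yahooMs =>
        Ok(s"Google response time: ${googleMs}ms; Yahoo response time: ${yahooMs}ms")
      }
    }
  }
}
```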
What if one WS request depends on the other? Here is how you could use flatMap to arrange async I/O sequentially (a sketch, assuming the first response's body contains the URL for the second call):
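```scala
def sequential = Action {
  Async {
    WS.url("http://example.com").get().flatMap { fooResponse =>
      // The second call can only be made once the first response is available
      WS.url(fooResponse.body).get().map { barResponse =>
        Ok(barResponse.body)
      }
    }
  }
}
```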
Error handling, timeouts, and composition
Since non-blocking calls in Play return a Future, you can perform certain types of composition that are harder in a callback-based approach. For example, you can easily add error handling to a Future using the recover method. This lets you specify a function to handle the error and return a new Future that will contain a fallback value if an error occurs:
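```scala
def withErrorHandling = Action {
  Async {
    WS.url("http://example.com").get().map { response =>
      Ok(response.body)
    }.recover {
      // If anything upstream failed, log it and fall back to a 500
      case t: Throwable =>
        Logger.error("Call to example.com failed", t)
        InternalServerError("Oops, something went wrong")
    }
  }
}
```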
It's also possible to pass the Future through some timeout-handling code, which will transform it into one that will have the value you requested if the Future is redeemed within a certain time limit, or a fallback value if the I/O takes too long. Here's a sketch of such a helper, using Play 2.1's Promise.timeout:
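```scala
import play.api.libs.concurrent.Promise
import scala.concurrent.duration._

// Completes with the original result, or a fallback if it takes more than 2 seconds
def withTimeout(future: Future[Result]): Future[Result] = {
  val fallback: Future[Result] =
    Promise.timeout(InternalServerError("Request timed out"), 2.seconds)
  Future.firstCompletedOf(Seq(future, fallback))
}
```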
You can also compose a Future with performance-measuring code that will transform it into one that records how long the request took:
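```scala
// Logs how long it took for the given Future to be redeemed
def withPerformanceLogging(future: Future[Result]): Future[Result] = {
  val start = System.currentTimeMillis()
  future.map { result =>
    Logger.info(s"Request took ${System.currentTimeMillis() - start}ms")
    result
  }
}
```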
Finally, you can wrap all of this "instrumentation" code into a single, reusable method (a sketch combining the helpers above):
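```scala
// Error handling, a timeout, and performance logging in one wrapper
def withInstrumentation(future: Future[Result]): Future[Result] =
  withPerformanceLogging(
    withTimeout(
      future.recover {
        case t: Throwable =>
          Logger.error("Request failed", t)
          InternalServerError("Oops, something went wrong")
      }
    )
  )
```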
Using Future objects instead of callbacks makes composition easier. The "instrumentation" code above is fairly generic and reusable, so you can add it to any app without significantly modifying the code, without changing the signature of every callback function, and so on.
While map and flatMap are powerful concepts, using them for complicated cases can lead to deeply nested structures. For example, let’s say we wanted to do 3 sequential, non-blocking calls. With flatMap, the code would look something like this (a sketch, again assuming each response body contains the next URL to call):
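```scala
def threeSequentialCalls = Action {
  Async {
    WS.url("http://example.com/foo").get().flatMap { foo =>
      WS.url(foo.body).get().flatMap { bar =>
        WS.url(bar.body).get().map { baz =>
          Ok(baz.body)
        }
      }
    }
  }
}
```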
Yuck, this looks an awful lot like the callback hell we wanted to avoid! Fortunately, there is a better way. Scala supports for comprehensions, which are effectively syntactic sugar for map and flatMap. The code above can be elegantly rewritten like this:
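```scala
def threeSequentialCalls = Action {
  Async {
    for {
      foo <- WS.url("http://example.com/foo").get()
      bar <- WS.url(foo.body).get()
      baz <- WS.url(bar.body).get()
    } yield Ok(baz.body)
  }
}
```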
Much better, right? This pattern scales nicely too, looking much the same even with twice as many sequential steps:
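```scala
// Six sequential calls look just like three (URLs illustrative, as before)
Async {
  for {
    one   <- WS.url("http://example.com/1").get()
    two   <- WS.url(one.body).get()
    three <- WS.url(two.body).get()
    four  <- WS.url(three.body).get()
    five  <- WS.url(four.body).get()
    six   <- WS.url(five.body).get()
  } yield Ok(six.body)
}
```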
Ok, so how about doing the 3 calls in parallel instead? You can use a comprehension again, but this time, call the WS.get() method outside of it:
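```scala
def threeParallelCalls = Action {
  // All three requests fire here, before the comprehension, so they run in parallel
  val fooFuture = WS.url("http://example.com/foo").get()
  val barFuture = WS.url("http://example.com/bar").get()
  val bazFuture = WS.url("http://example.com/baz").get()

  Async {
    for {
      foo <- fooFuture
      bar <- barFuture
      baz <- bazFuture
    } yield Ok(foo.body + bar.body + baz.body)
  }
}
```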
Woohoo, no nesting and no callback hell! The code is easy to follow and consistent whether you’re doing parallel or sequential calls. It’s also easy to extend: we can easily combine it with the withInstrumentation function from the previous section:
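```scala
def instrumentedParallelCalls = Action {
  val fooFuture = WS.url("http://example.com/foo").get()
  val barFuture = WS.url("http://example.com/bar").get()
  val bazFuture = WS.url("http://example.com/baz").get()

  val result = for {
    foo <- fooFuture
    bar <- barFuture
    baz <- bazFuture
  } yield Ok(foo.body + bar.body + baz.body)

  // Adds error handling, a timeout, and performance logging in one call
  Async(withInstrumentation(result))
}
```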
And there you have it: fully parallelized, non-blocking I/O with error handling, performance measurement, and SLAs. All in a few lines of code that are easy to read and reason about.
This is just a taste of async programming in Play. For more information, check out the documentation for Asynchronous Http Programming and Handling Data Streams Reactively, as well as the comet-clock, comet-live-monitoring, and websocket-chat example apps.
If you found this interesting, make sure to come back for future posts, where we’ll discuss async programming with Java, real time web-apps, and more.