
Using virtual private clusters for testing Apache Samza

If Apache Kafka is the lifeblood of all nearline processing at LinkedIn, then Apache Samza is the beating heart pumping that blood around. Samza at LinkedIn is provided as a managed stream processing service where applications bring their logic (leveraging the wide variety of Samza APIs), while the service handles the hosting, managing, and operations of the applications. Applications run on managed clusters, equipped with a cluster manager such as YARN, Kubernetes, or Mesos.

The number one priority for such a managed service is stability, which must hold even as the service delivers the usual desirables (new features, upgrades, and improved guarantees, among others). The go-to way of ensuring this is rigorous and comprehensive testing. However, comprehensive testing at cluster (or higher) scale, whether for failure testing, upgrade rollouts, or feature development, is often cumbersome and time-consuming.

What if there were a way to perform cluster-scale testing from the comfort and confines of your developer machine? At LinkedIn, we use Linux Containers (LXC) to emulate a realistic YARN cluster on a single machine, and we use that cluster for testing new features, automated failure testing for Samza, YARN upgrades, and YARN-Samza interactions.

Docker, LXC, and the art of OS-level virtualization

Similar to Docker, LXC is an OS-level virtualization solution that uses Linux namespaces and cgroups to create multiple isolated Linux execution environments that share the same kernel; it is sometimes described as “chroot on steroids.” In fact, early versions of Docker used LXC. Because OS-level virtualization has lower overhead than full- or para-virtualization approaches, it allows higher density, that is, more instances per physical machine.

This means one can create multiple Docker or LXC instances on a machine, have them participate in a YARN-managed cluster, and run and test applications (like Samza) over this setup. YARN simply sees the Docker or LXC instances as “machines.” All multi-machine code paths get exercised and tested, while the low virtualization overhead allows for great density. We’ve been able to create YARN clusters of 10 to 100 “machines” on a single commodity physical machine.

That said, Docker supports using only a single process per Docker instance, although more recently that has been termed a best practice rather than a strict limit. Currently, LXC is our virtualization solution of choice, but we plan to explore Docker soon.

Setting things up

This entire setup has been automated and is available here, and works in conjunction with samza-hello-world. The figure below provides an overview of our setup that uses LXC to emulate a YARN cluster on a given host (called base-host).

[Figure: Setup that uses LXC to emulate a YARN cluster]

Networking
We set up a private subnet for the LXC-based virtual private cluster. In this subnet, the base host has a virtual network interface (using Linux’s libvirt) and a private IP (e.g., 192.168.9.1). Similarly, all LXC instances use a virtual network interface with a similar private IP, all connected to a virtual subnet with the base host acting as the gateway. In addition, all LXC instances are source-NATed. This allows them to reach the internet, which is particularly useful for installing or upgrading packages within an LXC instance.
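
As a rough illustration of this addressing scheme, the sketch below uses Python's standard ipaddress module to lay out such a subnet. The 192.168.9.0/24 prefix is inferred from the example gateway address above, and the instance names and the plan_subnet helper are hypothetical; the actual automation may assign addresses differently.

```python
import ipaddress


def plan_subnet(cidr, num_instances, name_prefix="lxc-instance"):
    """Give the first usable address to the base host (the gateway) and
    the following addresses to the LXC instances, in order."""
    network = ipaddress.ip_network(cidr)
    hosts = iter(network.hosts())      # usable addresses, lowest first
    plan = {"base-host": next(hosts)}  # e.g. 192.168.9.1
    for i in range(num_instances):
        try:
            plan[f"{name_prefix}-{i}"] = next(hosts)
        except StopIteration:
            raise ValueError(
                f"{cidr} is too small for {num_instances} instances"
            ) from None
    return plan


if __name__ == "__main__":
    # Hypothetical three-instance cluster on the example subnet.
    for name, address in plan_subnet("192.168.9.0/24", 3).items():
        print(f"{name}\t{address}")
```

A /24 leaves room for well over 100 instances behind a single gateway, which is in line with the cluster sizes mentioned earlier.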

YARN and deployment
Each LXC instance runs a YARN NodeManager (NM), which is networked to a YARN ResourceManager (RM) running on the base host, along with Apache Kafka and a ZooKeeper instance. In addition, we use a shared directory on the base host that is mounted within each LXC instance. This shared directory serves as a virtual repository to distribute application binaries to the LXC instances using Samza’s yarn.package.path config.
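
To make the shared-directory idea concrete, here is a minimal sketch that renders the relevant configuration line. It assumes the config referred to above is Samza's yarn.package.path key and that a file:// URI is used; the mount point and tarball name are hypothetical placeholders, not the values used in our setup.

```python
from pathlib import PurePosixPath


def package_path_property(shared_dir, package_name):
    """Render a yarn.package.path line pointing at a tarball that lives in
    the shared directory (which must be mounted at the same absolute path
    inside every LXC instance)."""
    tarball = PurePosixPath(shared_dir) / package_name
    if not tarball.is_absolute():
        raise ValueError("shared_dir must be an absolute path")
    return f"yarn.package.path=file://{tarball}"


if __name__ == "__main__":
    # Hypothetical mount point and package name, for illustration only.
    print(package_path_property("/shared/packages", "my-samza-job-dist.tar.gz"))
```

Because every instance sees the same path, no separate artifact server is needed to distribute the job package.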

The end result is a YARN cluster with LXC instances acting as hosts, sized to over-subscribe and hence statistically multiplex the base host’s resources (e.g., a cluster of 50 LXC instances of 8 GB each on a base host with 64 GB of RAM).
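
The arithmetic behind that example can be sketched as follows. The helper names are ours, and the figures are simply the ones quoted above.

```python
def oversubscription_ratio(num_instances, instance_mem_gb, host_mem_gb):
    """Memory advertised to YARN across all instances, divided by the
    physical memory actually present on the base host."""
    return (num_instances * instance_mem_gb) / host_mem_gb


def max_fully_loaded(instance_mem_gb, host_mem_gb):
    """How many instances could use their full allocation at the same time."""
    return host_mem_gb // instance_mem_gb


if __name__ == "__main__":
    ratio = oversubscription_ratio(50, 8, 64)
    print(f"advertised: {50 * 8} GB, physical: 64 GB, ratio: {ratio}x")
    print(f"instances that fit at full load: {max_fully_loaded(8, 64)}")
```

In this example the cluster advertises 400 GB against 64 GB of physical memory, so the setup only works if most instances stay well below their nominal 8 GB at any given time.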

[Figure: Cluster nodes example]

Making the network realistic

Apart from the setup, we also need to emulate the network so that our testing reflects the characteristics of a real network, such as bandwidth limits, network delay, and packet loss. Given the private-subnet-based networking for the LXC instances (above), we use Dummynet to set bandwidth limits between LXC instances and to add packet loss and network delays that represent typical inter-data-center latencies. For example:

[Figure: Example network delay configuration]

This adds a bandwidth limit of 100 Mbps with a 10 ms delay and 1% randomized packet loss on “pipe 1,” which connects LXC-instance-0 to the gateway. Similarly, queuing delays can be added by specifying the queue size for “pipe 1.” This article from the Association for Computing Machinery describes Dummynet in greater detail.
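
For readers who want to script this, the sketch below builds ipfw commands with the parameters just described. It is an approximation under stated assumptions rather than the exact commands from our setup: the rule number, the IP addresses, and the helper names are hypothetical, and only the bw, delay, plr, and queue options of ipfw's pipe configuration are used.

```python
import shlex


def pipe_config_cmd(pipe, bandwidth_mbps, delay_ms, loss_rate, queue_slots=None):
    """Build 'ipfw pipe N config ...' for a Dummynet pipe.

    loss_rate is a probability in [0, 1]; 0.01 means 1% packet loss.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    cmd = [
        "ipfw", "pipe", str(pipe), "config",
        "bw", f"{bandwidth_mbps}Mbit/s",
        "delay", f"{delay_ms}ms",
        "plr", str(loss_rate),
    ]
    if queue_slots is not None:
        cmd += ["queue", str(queue_slots)]  # queue size, in slots
    return cmd


def attach_rule_cmd(rule_number, pipe, src, dst):
    """Build the ipfw rule that sends traffic from src to dst through the pipe."""
    return ["ipfw", "add", str(rule_number), "pipe", str(pipe),
            "ip", "from", src, "to", dst]


if __name__ == "__main__":
    # Hypothetical addresses: an LXC instance at .2 and the gateway at .1.
    print(shlex.join(pipe_config_cmd(1, 100, 10, 0.01)))
    print(shlex.join(attach_rule_cmd(100, 1, "192.168.9.2", "192.168.9.1")))
```

The functions only construct the command lines; running them (for example with subprocess.run) requires root privileges and a host with Dummynet available.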

Putting it to use

There are a number of use cases for which we’ve found our setup useful because it avoids the overhead of using a real cluster (e.g., setup and coordination). Our use cases mostly fall under three types: validation, failure testing, and development.

Our most common use case is validating YARN semantics and performance before adopting an upgraded version. This is especially important when running Samza as a managed service to ensure that features like Samza’s host affinity have not regressed. This setup allows us to carry out controlled runs of Samza jobs and validate different host-affinity related metrics.

Another class of use cases is failure testing. Testing Samza’s behavior under machine failure is much easier because a failure can be triggered by simply issuing a command to stop an LXC instance. Container failures are tested similarly, while network partition behavior is tested by setting the bandwidth of an LXC instance to 0 (using Dummynet as described above).
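
A machine-failure test of this kind might be scripted along the following lines. This is a minimal sketch, not our test harness: it assumes instances are stopped with the standard lxc-stop command, and the instance names, helper functions, and seeding scheme are hypothetical.

```python
import random


def stop_instance_cmd(name):
    """Command that stops one LXC instance, emulating a machine failure."""
    return ["lxc-stop", "-n", name]


def failure_plan(instances, count, seed=None):
    """Pick `count` distinct instances at random and return the stop
    commands for them. A fixed seed makes a test run reproducible."""
    rng = random.Random(seed)
    victims = rng.sample(list(instances), count)
    return [stop_instance_cmd(name) for name in victims]


if __name__ == "__main__":
    # Hypothetical ten-node cluster; fail two nodes.
    cluster = [f"lxc-instance-{i}" for i in range(10)]
    for cmd in failure_plan(cluster, 2, seed=42):
        print(" ".join(cmd))
```

Each command can then be executed with subprocess.run(cmd, check=True) on the base host, after which the test can observe how Samza and YARN react to the lost NodeManagers.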

Lastly, this setup also improves the developer experience when testing features that require multi-machine interactions, such as Samza’s standby containers, host affinity, startpoints, and partition expansion. We’re working on extending this approach to test and validate other distributed systems that have similar requirements.

With distributed systems being increasingly offered as “managed services,” conducting realistic, rigorous, yet easy testing continues to be a problem today. Hopefully, our approach finds a use in the distributed systems that you build and manage.

Want to work on similar problems in large-scale distributed systems? The Streams Infrastructure team at LinkedIn is hiring engineers at all levels! To learn more, check out all of our open roles on LinkedIn.

Acknowledgements

This work wouldn’t have been possible without the valuable help and contributions from Prateek Maheshwari, Ahmed Abdul Hamid, Jagadish Venkataram, and Sanil Jain.