Q&A with Jim Brikman: Splitting Up a Codebase into Microservices and Artifacts

Karan Parikh

Human. Software Engineer.

February 24, 2016

Yevgeniy (Jim) Brikman is a software engineer, entrepreneur, and author.

As a software engineer at LinkedIn, he helped build the company's infrastructure, APIs, Recruiter product, hackday program, open source program, and the Engineering Blog you’re reading now. Since leaving LinkedIn, Jim has founded a company called Atomic Squirrel, which helps startups build and scale their infrastructure. He also wrote a book called Hello, Startup, a comprehensive guide to building products, technologies, and teams in a startup environment. In it, he draws on his own experience and on interviews with programmers from successful startups that grew into tech giants, such as LinkedIn, Google, Facebook, and Twitter.

We talked to Jim about his new book and his experience at LinkedIn. Read on for our Q&A, where Jim offers best practices for splitting up a codebase into microservices and artifacts and describes how his experience working at LinkedIn during its period of hypergrowth helped him better understand how to code at scale.

The worst thing that can happen to a codebase is size. The more code you have, the slower you go. For example, consider the following table from Code Complete, which shows project size (lines of code) versus bug density (number of bugs per thousand lines of code):

Project size (lines of code)   Bug density (bugs per 1,000 lines of code)
< 2K                           0 - 25
2K - 16K                       0 - 40
16K - 64K                      0.5 - 50
64K - 512K                     2 - 70
> 512K                         4 - 100

What this means is that as your codebase grows, the number of bugs grows even faster. If your codebase grows by a factor of 2, the number of bugs in it could grow by a factor of 4 or 8. And by the time you’re working with more than half a million lines of code, bug rates can be as high as one bug for every 10 lines of code!
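To make the arithmetic concrete: the total number of bugs is roughly the size in thousands of lines multiplied by the bug density. The sketch below applies the upper end of the density range from the table to a few sample sizes. These are worst-case bounds chosen for illustration, not typical values, and the 1,000,000-line size is an arbitrary example for the open-ended "> 512K" row.

```python
# Sample codebase sizes paired with the upper end of the bug-density range
# (bugs per 1,000 lines) that the Code Complete table gives for that size.
# 1,000,000 lines is an arbitrary example for the open-ended "> 512K" row.
SAMPLES = [
    (2_000, 25),
    (64_000, 50),
    (512_000, 70),
    (1_000_000, 100),
]

def worst_case_bugs(lines_of_code: int, bugs_per_kloc: int) -> int:
    """Total bugs = (lines / 1,000) * bugs per 1,000 lines."""
    return lines_of_code * bugs_per_kloc // 1_000

for size, density in SAMPLES:
    print(f"{size:>9,} lines -> up to {worst_case_bugs(size, density):>7,} bugs")
```

Going from 64K to 512K lines (8 times the code) raises the worst-case count from 3,200 to 35,840 bugs, roughly 11 times as many, and at 1,000,000 lines the bound of 100,000 bugs is the one-bug-per-10-lines figure mentioned above.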

The reason for this, to borrow a quote from Practices of an Agile Developer, is that “Software development doesn't happen in a chart, an IDE, or a design tool; it happens in your head.” A codebase with hundreds of thousands of lines of code is far beyond what you can fit in your head. You can’t consider all the interactions and corner cases in that much code. Therefore, you need strategies for splitting up the code so that you can focus on one part at a time and safely ignore the rest.

There are two main strategies for breaking up a codebase: one is to move to artifact dependencies and the other is to move to a microservice architecture.

The idea behind artifact dependencies is to change your modules so that instead of depending on the source code of other modules (source dependencies), they depend on versioned artifacts published by other modules (versioned dependencies). You probably do this already with open source libraries. To use jQuery in your JavaScript code or Apache Kafka in your Java code, you don't depend on the source code of those open source libraries, but on a versioned artifact they provide, such as jquery-1.11-min.js or kafka-clients-0.8.1.jar.

You can use the same strategy with your own code. For example, at LinkedIn, when all of our code was in a single repo, most modules depended directly on the source code of a few infrastructure libraries, such as the home page code depending on util-servlet and util-security, as shown in Figure 1. As developers worked on the util-servlet and util-security modules, they occasionally made changes that broke the home page.

To move to versioned dependencies, we modified the build for the util-servlet and util-security modules to publish versioned artifacts (util-servlet-1.0.3.jar and util-security-3.0.4.jar), and we modified the build for the home page to depend on these artifacts, as shown in Figure 2. In fact, once we started using versioned artifacts instead of source dependencies, we were able to move the util-servlet and util-security modules into separate repositories.

The main advantage of using artifact dependencies is isolation. Since the home page is now pinned to fixed versions of the util-servlet and util-security modules, the changes developers make to those two modules will have no effect on the home page until the home page developers explicitly choose to upgrade to a new version. This allows developers to focus on one small part of the codebase at a time without worrying about the rest. Splitting up your code this way also encourages decoupling, as it forces you to explicitly define your dependencies and public APIs. Finally, it’s faster to build a small repository with a single module than a monolithic repository that has all of your modules.
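The isolation that pinned versions provide can be modeled in a few lines. This is a toy sketch, not how LinkedIn's build actually worked: the in-memory `artifact_repo` and the `publish` and `resolve` helpers are hypothetical stand-ins for a real artifact repository and build tool, and the rendering functions are invented. It shows that publishing a new, incompatible util-servlet version changes nothing for a consumer that pins 1.0.3.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory artifact repository: (module, version) -> published code.
# A real setup would use a build tool and a repository server instead.
artifact_repo: Dict[Tuple[str, str], Callable[[str], str]] = {}

def publish(name: str, version: str, impl: Callable[[str], str]) -> None:
    """Publish an artifact. Published versions are immutable."""
    key = (name, version)
    if key in artifact_repo:
        raise ValueError(f"{name}-{version} is already published")
    artifact_repo[key] = impl

def resolve(name: str, version: str) -> Callable[[str], str]:
    """Look up exactly the version a consumer asked for."""
    return artifact_repo[(name, version)]

# util-servlet publishes 1.0.3, and the home page pins that version.
publish("util-servlet", "1.0.3", lambda path: f"<html>{path}</html>")
home_page_deps = {"util-servlet": "1.0.3"}

# Later, util-servlet publishes 1.0.4 with an incompatible output format.
publish("util-servlet", "1.0.4", lambda path: f"<main>{path}</main>")

# The home page keeps using 1.0.3 until its developers bump the pin.
render = resolve("util-servlet", home_page_deps["util-servlet"])
print(render("/feed"))  # <html>/feed</html>
```

Upgrading is an explicit one-line change to `home_page_deps`, and that is exactly the moment any incompatibility in 1.0.4 would surface.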

However, there are some drawbacks too. One of the biggest ones is commonly called dependency hell. For example, imagine as before that the home page depends on util-servlet version 1.0.3 and util-security version 3.0.4, but now util-servlet also depends on util-security, and it wants version 4.0.1. Which version of util-security will you actually get? Will 4.0.1 work with the home page? Will 3.0.4 work with util-servlet? This is just one flavor of a dependency conflict. You can also run into problems with circular dependencies, diamond dependencies, or even from having too many dependencies or very long dependency chains.
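The util-security conflict above can be detected mechanically by walking the dependency graph and recording every version requested for each module. The sketch below is a minimal illustration under stated assumptions: the graph is hard-coded, the home page's own version (1.0.0) is invented, and `find_conflicts` is a hypothetical helper rather than any build tool's API.

```python
from typing import Dict, List, Set, Tuple

Artifact = Tuple[str, str]  # (module name, version)

# Hypothetical dependency graph matching the example above:
# each artifact maps to the versions of other modules it requests.
DEPENDENCIES: Dict[Artifact, Dict[str, str]] = {
    ("home-page", "1.0.0"): {"util-servlet": "1.0.3", "util-security": "3.0.4"},
    ("util-servlet", "1.0.3"): {"util-security": "4.0.1"},
    ("util-security", "3.0.4"): {},
    ("util-security", "4.0.1"): {},
}

def find_conflicts(root: Artifact) -> Dict[str, List[str]]:
    """Walk all transitive dependencies of `root` and report every module
    that is requested at more than one version."""
    requested: Dict[str, Set[str]] = {}
    seen: Set[Artifact] = set()
    stack: List[Artifact] = [root]
    while stack:
        artifact = stack.pop()
        if artifact in seen:
            continue  # visit each artifact once; also stops cycles from looping forever
        seen.add(artifact)
        for name, version in DEPENDENCIES[artifact].items():
            requested.setdefault(name, set()).add(version)
            stack.append((name, version))
    return {name: sorted(versions) for name, versions in requested.items() if len(versions) > 1}

print(find_conflicts(("home-page", "1.0.0")))
# {'util-security': ['3.0.4', '4.0.1']}
```

Detecting the conflict is the easy part; a build tool still has to pick one version. Maven, for example, uses "nearest definition wins" and would give the home page util-security 3.0.4, while Gradle by default selects the highest requested version, 4.0.1. Either way, one of the two modules runs against a version it was not built for.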

Another issue is that the isolation benefits you get from versioned dependencies can be a double-edged sword. If developers make backwards-incompatible changes to util-servlet and util-security or introduce bugs, you won’t find out about them in the home page until you upgrade to a new version, which might not be until months after the broken change went in, at which point it may be very hard to fix the problems. Moreover, with multiple repositories, making global changes is difficult and cannot be done atomically. Imagine you found a major security hole and had to patch it in all of your repositories. You now need to find a way to search across all your repositories for the security problem, check out each repository that matches your search, update code and dependency versions in each repository, and then try to commit your changes back, all while dealing with dependency hell.

In general, if your code consists of a number of isolated modules and the majority of your changes are within individual modules (i.e., your company resembles a number of separate open source projects), then multiple repositories with versioned dependencies will allow you to go faster. However, if you regularly need to make global changes across many modules, then a single repository with source dependencies will be the better option.

Like most startups, LinkedIn started out with a single monolithic app where all the modules ran in the same process and communicated with each other via function calls. For example, if the home page module needed profile data, it would call a function in the profile module; if the profile module needed company and email data, it would call functions in the company and email modules; and so on, as shown in Figure 3.

Figure 3: A monolithic architecture uses function calls within a single process

As traffic to our site grew, and as the number of employees making changes to the modules grew, this approach did not scale well, and we broke out the modules into separate services. Each service runs in a separate process, usually on a separate server, and communicates with other services via messages, as shown in Figure 4. There are many different models for building a tech stack out of standalone services, including service-oriented architecture (SOA), microservices, and actor systems.

Figure 4: A services architecture passes messages between processes

Like the artifact dependencies we discussed previously, services provide a certain degree of isolation, as you can focus on a single service at a time. In fact, service boundaries work well as code ownership boundaries, which allow teams to work independently from one another. This is important for dividing up the work in a growing company, such as allowing the profile team to focus on the profile service and the email team to focus on the email service.

The fact that services run as separate processes, and usually on separate servers, opens up two other powerful advantages. The first advantage is that services are technology-agnostic. If all of your code is in a single process, you have to write it all in the same programming language and with the same libraries. But with services, you could write one service in Ruby, another in Java, another in Python, and so on. The second advantage is that you can scale each service separately. One service may be horizontally scalable, so you distribute it across many small servers, but another may only be vertically scalable, so you put it on a single server with a powerful CPU and lots of RAM.

However, services also come with a number of drawbacks. The biggest one is a massive increase in operational complexity. Instead of having just a single type of app to deploy, you now have many different types, possibly written in different languages, with different mechanisms for deployment, monitoring, configuration, and so on. Moreover, to allow services to communicate with one another, you will need to deploy new infrastructure (e.g., a load balancer or ZooKeeper) for service discovery and routing.
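To show what the service discovery piece is responsible for, here is a toy, in-process registry: service instances register their addresses under a name, and callers look up an address by name instead of hard-coding hosts. This is only an illustration of the responsibility, not ZooKeeper's API or any load balancer's behavior; the `ServiceRegistry` class, service name, and addresses are hypothetical, and it omits the health checks, deregistration, and replication a production system needs.

```python
import itertools
from typing import Dict, Iterator, List

class ServiceRegistry:
    """Toy service registry: instances register their addresses under a
    service name, and clients look up an address by name (round-robin)
    instead of hard-coding hosts."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}
        self._cursors: Dict[str, Iterator[str]] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, []).append(address)
        # Rebuild the round-robin cursor so it includes the new instance.
        self._cursors[service] = itertools.cycle(list(self._instances[service]))

    def lookup(self, service: str) -> str:
        if service not in self._cursors:
            raise LookupError(f"no instances registered for {service!r}")
        return next(self._cursors[service])

registry = ServiceRegistry()
registry.register("profile-service", "10.0.0.1:8080")
registry.register("profile-service", "10.0.0.2:8080")
print([registry.lookup("profile-service") for _ in range(3)])
# ['10.0.0.1:8080', '10.0.0.2:8080', '10.0.0.1:8080']
```

Even this toy version has to be deployed, kept available, and kept up to date as instances come and go, which is part of the operational cost described above.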

While services may help with scalability, they can actually hurt performance. This is because remote calls, even within a data center, take several orders of magnitude more time than local function calls (not to mention the CPU and memory overhead to serialize and deserialize remote requests). Therefore, you have to re-organize your code to minimize this overhead using caching, batching, and de-duping. Dealing with remote calls also forces a choice: either you use blocking I/O and have to manage thread pools, which increases operational overhead and can cause cascading failures in your data center, or you use non-blocking I/O and have to adopt a more complicated coding style based on callbacks, promises, or actors. Moreover, you have to re-organize your code to handle a whole new class of errors: a local function call never fails because of the network, but a call to a remote service can fail because of network problems, because the service is down, or simply because it takes too long.
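A minimal sketch of what this reorganization can look like, using Python's asyncio (non-blocking I/O in the futures/promises style). `fetch_profiles_remote` is a hypothetical stand-in that simulates a profile service with a short sleep rather than a real network call. The sketch de-dupes the requested ids, batches them into a single remote call, applies a timeout, and turns timeouts and network errors into an explicit error the caller must handle. Caching is left out.

```python
import asyncio
from typing import Dict, List

class RemoteCallError(Exception):
    """Raised when a remote call fails or takes too long."""

async def fetch_profiles_remote(ids: List[int]) -> Dict[int, str]:
    # Hypothetical stand-in for a network call to a profile service.
    await asyncio.sleep(0.01)  # simulated network latency
    return {i: f"profile-{i}" for i in ids}

async def get_profiles(ids: List[int], timeout_s: float = 0.5) -> Dict[int, str]:
    # De-dupe: request each id once, however many times callers asked for it.
    unique_ids = list(dict.fromkeys(ids))
    try:
        # Batch: one remote call for all ids instead of one call per id.
        return await asyncio.wait_for(fetch_profiles_remote(unique_ids), timeout_s)
    except asyncio.TimeoutError as e:
        raise RemoteCallError("profile service timed out") from e
    except OSError as e:  # e.g. connection refused or network unreachable
        raise RemoteCallError("profile service unavailable") from e

profiles = asyncio.run(get_profiles([1, 2, 2, 3, 1]))
print(profiles)  # {1: 'profile-1', 2: 'profile-2', 3: 'profile-3'}
```

With a local function call, none of the `try`/`except` handling would be needed; here every caller of `get_profiles` has to decide what to do when `RemoteCallError` is raised.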

It’s also worth noting that, if your service boundaries extend down to data storage boundaries (which they usually do), using services means you are sacrificing transactions, consistency, and referential integrity. That is, if each service has its own database (the Profile Service stores data in the Profile Database, the Company Service in the Company Database, the Email Service in the Email Database, and so on), then it’s very difficult to make a change across all of those data stores atomically. For example, let’s say you got a job at a company called “FooBar, Inc.” and you added that to your LinkedIn profile. The Profile Service would need to store the Company Database id of FooBar, Inc. in the Profile Database. Well, what happens if FooBar, Inc. is deleted from the Company Database? There is no referential integrity across databases, so if you’re not careful, you’ll have an invalid id in the Profile Database. And even if you manually write code to remove FooBar, Inc. from both databases, you can’t do so atomically, so if one of those deletes fails, you’ll again have inconsistent data.
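The partial-failure case can be sketched with two plain dictionaries standing in for the Company and Profile databases. Everything here is hypothetical (the ids, the member name, the `profile_db_up` flag used to simulate an outage). The point is that the first write commits on its own, so when the second write fails, the Profile Database is left holding a dangling company id. The `find_dangling_references` check shows one way such damage could be detected after the fact; it does not restore atomicity.

```python
# Two independent stores standing in for the Company and Profile databases.
company_db = {42: "FooBar, Inc."}
profile_db = {"member-1": {"company_id": 42}}

class ProfileDbUnavailable(Exception):
    """Simulated failure of the second database."""

def delete_company_everywhere(company_id: int, profile_db_up: bool = True) -> None:
    """Delete a company and clear references to it. The two writes go to
    separate databases, so no transaction spans both of them."""
    del company_db[company_id]  # write 1: commits on its own
    if not profile_db_up:
        raise ProfileDbUnavailable("could not reach the Profile Database")
    for profile in profile_db.values():  # write 2
        if profile["company_id"] == company_id:
            profile["company_id"] = None

def find_dangling_references() -> list:
    """Return members whose company_id no longer exists in company_db."""
    return [
        member for member, profile in profile_db.items()
        if profile["company_id"] is not None and profile["company_id"] not in company_db
    ]

try:
    delete_company_everywhere(42, profile_db_up=False)
except ProfileDbUnavailable:
    pass  # write 1 already happened and cannot be rolled back here
print(find_dangling_references())  # ['member-1']
```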

Finally, while services do provide isolation for the internal implementation details of the service, the public API of the service can actually be harder to maintain, especially if you want to make a backwards-incompatible change (e.g., delete an API or rename a parameter). With local function calls, you can change a public API and update all the code calling it in one step. But with services, you have to first add a new API, then track down all the clients that call the old API (which isn’t always easy and may require digging through access logs), find the codebase for each client, update the clients to call the new API, deploy each client, and only then can you finally delete the old API.
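The "digging through access logs" step might look something like the sketch below, which counts how often each client still calls a deprecated endpoint. The log format, client names, and the /v1/profile versus /v2/profile endpoints (with memberId renamed to id) are all hypothetical; real access-log formats vary by server, so the regular expression would need to match whatever your services actually write.

```python
import re
from collections import Counter
from typing import List

# Hypothetical access-log format: "<timestamp> client=<name> <method> <path>".
LOG_LINE = re.compile(r"client=(?P<client>\S+) \S+ (?P<path>\S+)")

def callers_of(deprecated_prefix: str, log_lines: List[str]) -> Counter:
    """Count requests per client that still hit the deprecated API."""
    callers: Counter = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match["path"].startswith(deprecated_prefix):
            callers[match["client"]] += 1
    return callers

logs = [
    "2016-02-24T10:00:01 client=home-page GET /v1/profile?memberId=1",
    "2016-02-24T10:00:02 client=search GET /v2/profile?id=1",
    "2016-02-24T10:00:03 client=email GET /v1/profile?memberId=7",
    "2016-02-24T10:00:04 client=home-page GET /v1/profile?memberId=3",
]
print(callers_of("/v1/profile", logs))
# Counter({'home-page': 2, 'email': 1})
```

A scan like this only finds clients that happened to call the old API during the logged window, which is one reason the old API usually has to stay up for a while after the last known client migrates.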

In general, services entail a massive overhead, so you're best off avoiding them until you have no other choice. That is, if your company is getting big and you need your teams to be able to work independently from one another, or if you need to use different technologies for different problems, or if a single monolithic app simply can't handle the load anymore, then you should move to services. But keep in mind that this will require a heavy up-front investment to deal with deployment, monitoring, I/O, and service discovery.

Check out the “Software Delivery” chapter of my book, Hello, Startup as well as my recent talk, Agility Requires Safety. And, as always, feel free to reach out to me on LinkedIn!
