Introducing {py}gradle, an open source Python plugin for Gradle


Image Credit: Yiying Lu, Copyright: Gradle Inc.

Gradle is a build automation tool that supports many programming languages. It coordinates building software across multiple code repositories and automates a number of important related tasks, such as checking dependencies and warning programmers when a change they have made might break code written by someone else. It is already widely adopted: it is the official build system for Android Studio, and several leading tech companies rely on it every day as a key part of their build pipelines.

At LinkedIn, Gradle is a key piece of our approach to multiproduct development. As with many open source projects we adopt, we saw an opportunity to extend it to better serve our developers and to contribute that work back to the community.

One of the biggest opportunities at LinkedIn was language support. Historically, Gradle has provided strong support for JVM languages, but LinkedIn is an extremely polyglot programming environment that supports more than 42 programming languages. One of the most important languages we wanted to integrate with Gradle was Python, which we use to build the vast majority of our internal developer tools.

That’s why I’m excited to announce that LinkedIn is open sourcing the Gradle plugins that we have created for building Python libraries and applications.

Introducing {py}gradle

One of the most important aspects of making a programming language successful at any scale is being able to build, package, and distribute code reliably and easily. The Python programming language is backed by a mature community, so it is no surprise that package management has been a strong emphasis for many years. LinkedIn has built on that community's work by bridging two similar but distinct technologies, Setuptools and Gradle, to create a powerful, flexible, and reusable Python packaging system.

At the core of Python’s existing package management system is Setuptools, a library that gives project maintainers ways to manage dependencies, encode build steps, specify output distributions, and much more. For self-contained Python applications with a small set of external dependencies, it is a fantastic tool. However, as our organization's Python footprint grew, Setuptools no longer satisfied all of our requirements.

At LinkedIn, these are some areas where Setuptools was challenging to leverage:

  1. Dependency management: how do we specify our dependencies so that external systems and teams can explore them, and so that we can build automation around resolving conflicts and shepherding changes across a large number of products?

  2. Interfacing with existing metadata systems: how do we reuse the logic already created to power things at LinkedIn like code review, integration testing, deployment, and more?

  3. Polyglot builds: how do we build projects composed of multiple languages, such as when we have native C/C++ extensions alongside our Python code, a frontend written in Ember.js, or a build that depends on non-Python tooling?

When we began addressing these concerns, we didn’t expect Setuptools alone to solve our problems. We also knew, however, that Setuptools should not be abandoned, because it provides an immense amount of value. That is why we were careful to use Gradle to enhance, rather than replace, the existing and idiomatic Python package management ecosystem. From the beginning, it was important to us not to tread on Python’s conventions. By enhancing the Python package management system with a set of Gradle plugins, we were able to address each of Setuptools’s shortcomings while reusing the build logic we had already written for other languages. With a small amount of effort, we wired the Python package management ecosystem into LinkedIn’s rich Gradle build system, which we already used with other languages to achieve continuous integration.

We’ve tried to make our {py}gradle plugins as idiomatic to use as possible. If you are familiar with Setuptools, using {py}gradle should be straightforward: for most of the important keyword arguments passed to Setuptools’s setup() function, there is a corresponding Gradle attribute you can set in the build.gradle file. As a general principle, we’ll continue adding support for more of Setuptools’s features in this way.

Python + Gradle

A {py}gradle project looks nearly identical to a Python project that uses Setuptools. The structure looks something like the following, although it may differ depending on how you use Gradle. Curious readers can learn more about Gradle’s core concepts by reading through its documentation.

Besides the build.gradle files, the project is identical to a Python project. Note that we nest the foo project within a subdirectory of the top-level project because a project might contain multiple sub-projects, which we’ll talk more about later.

Let’s take a look at a hypothetical build.gradle file for a popular Python micro web framework: Flask. Flask is usually distributed as a source distribution and has a few external dependencies, such as Werkzeug and Jinja2.

This example configures the Gradle build system to build a source distribution for Flask. Some concepts, like dependency declaration, are common to both Gradle and Python. We’ve decided to let Gradle manage our dependencies, so we’ve moved their declaration into Gradle. A simple (or complex, depending on your needs) Setuptools distribution class then joins the two together. The following is a trimmed version of what a custom distribution class may look like; a more complete example is provided in the open source repository on GitHub.

What is happening, other than declaring our dependencies, in the build.gradle file? We apply the python-sdist plugin. The python-sdist plugin depends on a few other plugins to achieve its goal: to build a Python source distribution. It will depend on a plugin that prepares a local development environment using the Python modules Setuptools, virtualenv, and pip. Because we write tests, it’ll also prime our development environment with a few dependencies which we may want during a test, such as py.test. These are all modules that Python developers are familiar with.

Once Gradle determines that all prerequisites are satisfied, the python-sdist plugin builds a Python source distribution. Note that the plugin uses Setuptools under the hood to do so; that is, it runs setup.py sdist. In doing this, we’ve enhanced Setuptools rather than replaced it, sidestepping issues with pkg_resources’s greedy dependency resolver while keeping the artifacts produced by {py}gradle forward and backward compatible with Python artifacts that were not generated with it.
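Because Setuptools still builds the sdist, the glue between Gradle and Setuptools can stay small. The sketch below illustrates that glue under stated assumptions; it is not {py}gradle’s actual distribution class. It assumes Gradle exports the project name and version as the environment variables `PYGRADLE_PROJECT_NAME` and `PYGRADLE_PROJECT_VERSION` and hands over the fully resolved dependency set as pinned `name==version` lines. The variable names, the pinned-text contract, and both function names are hypothetical.

```python
import os
from typing import Dict, List, Mapping


def parse_pinned(text: str) -> List[str]:
    """Return requirement strings from pinned text, skipping blanks and comments."""
    requirements: List[str] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if line:
            requirements.append(line)
    return requirements


def gradle_setup_kwargs(
    pinned_text: str, env: Mapping[str, str] = os.environ
) -> Dict[str, object]:
    """Build Setuptools keyword arguments from values a Gradle build hands over.

    Hypothetical contract: Gradle has already resolved every dependency, so
    install_requires is just the pinned list and Setuptools only packages it.
    """
    return {
        "name": env["PYGRADLE_PROJECT_NAME"],        # assumed variable name
        "version": env["PYGRADLE_PROJECT_VERSION"],  # assumed variable name
        "install_requires": parse_pinned(pinned_text),
    }
```

In a setup.py this might be wired up as `setup(**gradle_setup_kwargs(Path("pinned.txt").read_text()))`, with `from setuptools import setup` and `from pathlib import Path`; `pinned.txt` is again a hypothetical file name. Keeping the function free of Setuptools imports makes it easy to test on its own.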

Gradle uses a plugin and task architecture to let developers add, remove, and customize tasks as they see fit. This pattern lets us expose an expressive build system so that we can do things like run unit tests, generate documentation, upload artifacts to PyPI, and much more, by simply adding or customizing a task. We’re hoping that with involvement from the community, we’ll be able to imagine and implement other types of tasks and plugins that let the Python community do useful things. Additional documentation and usage information is available on our GitHub repository at https://github.com/linkedin/pygradle.
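To make the plugin-and-task idea concrete, here is a toy Python model of what applying a plugin does. It is not Gradle’s API: a plugin registers tasks plus the “depends on” edges between them, and the build runs the tasks a target needs in dependency order. All task names are hypothetical stand-ins for the kind of work the python-sdist plugin chains together (preparing a virtualenv, installing dependencies, running tests, building the sdist).

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, Iterable, List, Set


class Project:
    """Toy build model: each task name maps to the names of tasks it depends on."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Set[str]] = {}

    def register(self, name: str, depends_on: Iterable[str] = ()) -> None:
        # Registering an existing task adds dependencies, loosely like customizing it.
        self.tasks.setdefault(name, set()).update(depends_on)

    def execution_plan(self, target: str) -> List[str]:
        """Return every task `target` needs, dependencies before dependents."""
        needed: Dict[str, Set[str]] = {}
        stack = [target]
        while stack:
            task = stack.pop()
            if task not in needed:
                needed[task] = self.tasks[task]  # KeyError for unknown tasks
                stack.extend(needed[task])
        return list(TopologicalSorter(needed).static_order())


def apply_python_plugin(project: Project) -> None:
    """Hypothetical base plugin: environment setup and tests."""
    project.register("setupVirtualenv")
    project.register("installDependencies", ["setupVirtualenv"])
    project.register("pytest", ["installDependencies"])


def apply_sdist_plugin(project: Project) -> None:
    """Hypothetical sdist plugin that builds on the base plugin."""
    apply_python_plugin(project)
    project.register("sdist", ["installDependencies"])  # stands in for `setup.py sdist`
    project.register("build", ["sdist", "pytest"])
```

Calling `apply_sdist_plugin(p)` on a fresh `Project` and then `p.execution_plan("build")` yields an order that starts with `setupVirtualenv` and ends with `build`; adding a documentation task would be one more `register` call. Gradle’s real model (task configuration, up-to-date checks, the build.gradle DSL) is far richer than this sketch.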

I want to now switch focus back to the motivations that prompted us to use Gradle to build our Python code: dependency management, interfacing with existing metadata systems, and lastly, building projects composed of multiple languages.

Dependency management

As our Python ecosystem grew, dependency management became the single most painful task we dealt with day to day. In an ecosystem interconnected by hundreds to thousands of edges that change multiple times per day, we quickly found that the available methods for specifying dependencies and strategies for resolving conflicts were not powerful enough. Some of these hardships can be traced back to the fact that pip lacks a true dependency resolver: currently, pkg_resources uses a greedy algorithm to select dependencies, in which the first specification wins. We found Gradle and Ivy to be first-class engineering tools for mitigating these problems.
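The difference between the two approaches can be shown with a small sketch. Requests for the same package arrive in traversal order; a “first specification wins” resolver keeps the earliest one, while Gradle’s default conflict resolution selects the highest requested version. Versions here are dotted integers compared as tuples, a deliberate simplification (PEP 440 and Gradle’s own version ordering are both richer), and all package and requester names are hypothetical.

```python
from typing import Dict, List, Tuple

Request = Tuple[str, str, str]  # (requested_by, package, version)


def version_key(version: str) -> Tuple[int, ...]:
    """Simplified ordering for dotted integer versions such as '1.10.2'."""
    return tuple(int(part) for part in version.split("."))


def resolve_first_wins(requests: List[Request]) -> Dict[str, str]:
    """Greedy: the first request seen for a package fixes its version."""
    chosen: Dict[str, str] = {}
    for _, package, version in requests:
        chosen.setdefault(package, version)
    return chosen


def resolve_highest_wins(requests: List[Request]) -> Dict[str, str]:
    """Consider every request for a package and keep the highest version."""
    chosen: Dict[str, str] = {}
    for _, package, version in requests:
        current = chosen.get(package)
        if current is None or version_key(version) > version_key(current):
            chosen[package] = version
    return chosen
```

With the requests `[("app", "libx", "1.2"), ("liba", "liby", "2.0"), ("libb", "libx", "1.10")]`, first-wins silently gives `libb` version 1.2 of `libx` even though it asked for 1.10, while highest-wins picks 1.10. Note that `version_key` correctly orders 1.10 above 1.9, which a plain string comparison would not.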

In using Gradle, we have adopted a powerful dependency and conflict resolution system, started emitting metadata about our dependencies, and begun powering entire suites of new tools that make our Python engineers more productive.


To give you an example of the new tooling we can build with this dependency metadata, consider a system that knows which dependent products (C, G, F) should be upgraded and tested when another product (D) changes. This is just one of many new types of tooling that Gradle makes possible for Python products.
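A minimal sketch of such a system: invert the “depends on” edges and walk them breadth-first from the changed product, collecting every product that depends on it directly or transitively. The graph below is hypothetical; it is only wired so that C, G, and F end up depending on D, matching the example in the text.

```python
from collections import deque
from typing import Dict, Iterable, Set


def reverse_edges(depends_on: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each product to the set of products that depend on it directly."""
    dependents: Dict[str, Set[str]] = {}
    for product, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(product)
    return dependents


def affected_by(changed: str, depends_on: Dict[str, Iterable[str]]) -> Set[str]:
    """Products that should be upgraded and re-tested when `changed` changes."""
    dependents = reverse_edges(depends_on)
    affected: Set[str] = set()
    queue = deque([changed])
    while queue:
        for parent in dependents.get(queue.popleft(), ()):
            if parent not in affected:  # visit each product once, even with cycles
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical graph: product -> products it depends on.
GRAPH = {
    "A": ["B"],
    "B": ["E"],
    "C": ["D"],
    "G": ["C"],
    "F": ["D", "E"],
    "D": ["E"],
}
```

Here `affected_by("D", GRAPH)` returns `{"C", "F", "G"}`: C and F depend on D directly, and G depends on D through C. In a real deployment the edges would come from the dependency metadata that the Gradle builds emit.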

We used Gradle for the task it is very good at: dependency resolution. Then, we supplied Setuptools with a fully resolved dependency graph ready for build, installation, and publication. We’re hoping that lessons learned from Gradle can be shared back with the Python community in this area. If you work with the Python Packaging Authority, we’d love to share more with you.
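A minimal sketch of that hand-off, using an assumed format rather than {py}gradle’s actual one: once resolution has chosen exactly one version per package, the result can be rendered as sorted, exact `name==version` pins, so Setuptools and pip receive a finished answer instead of resolving anything themselves. The function name and input dictionary are hypothetical.

```python
from typing import Dict


def to_pinned_requirements(resolved: Dict[str, str]) -> str:
    """Render one exact pin per package, sorted case-insensitively for stable diffs."""
    ordered = sorted(resolved.items(), key=lambda item: item[0].lower())
    return "".join(f"{name}=={version}\n" for name, version in ordered)
```

Sorting makes the output deterministic, so regenerating the pins for an unchanged resolution produces an identical file, which keeps code review and caching simple.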

Interfacing with existing metadata systems

We’ve identified Gradle as a powerful tool to allow engineers to build common and language-agnostic build logic. At LinkedIn, common build logic shared between language stacks includes access to product metadata, dependency resolution strategies, repository discovery and access, build metadata emission (useful for everything from testing to deployment), and much more.

In the words of Jens Pillgram-Larsen from a case study of Gradle usage at LinkedIn, “Build automation is synonymous with Gradle at LinkedIn.” As Jens points out in the case study, continuous integration involves far more than just building code, so much so that the term “build system” feels insufficient to describe the problem space.

By leveraging Gradle to build our Python code, we inherited all of the great work that Jens and his team had completed for other languages. With the addition of a few Python-specific Gradle build plugins, our Python projects gained access to all of the infrastructure that contributes to continuous integration. This was one of the biggest wins for our Python developers as they adopted {py}gradle.

We believe other Gradle users will be able to achieve the same level of reuse at their own organizations, whether or not they already have Gradle infrastructure in place.

Polyglot builds

Over the years, our technology stack evolved to the point that a product could no longer be confined to a single language or stack. We first experienced this when we needed to write native code (i.e., C/C++ code) that provided Python bindings. This was often the case for core infrastructure services that had to be supported from several other languages. We also found that Python code often has native dependencies. The status quo is to install these native dependencies on the build and runtime machines, and these undeclared, untracked dependencies are a constant source of build and runtime issues.

Gradle helps us solve these problems with its plugin architecture. For each language or technology stack, we simply apply the build plugin for that stack. Plugins can also be developed that mix and match other plugins: to build a Python project that uses a native extension, one need only apply the C/C++ and Python plugins. Dependencies between these stacks can then be declared in a first-class way, since Gradle uses a language-agnostic format (e.g., Ivy, Maven) to declare dependency information.
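Because the dependency metadata lives in a language-agnostic descriptor, any tool can read it without knowing which stack produced each module. The sketch below parses a minimal Ivy module descriptor (an `ivy-module` root with an `info` element carrying `organisation`/`module`/`revision` and `dependency` elements carrying `org`/`name`/`rev`) using only the standard library. All coordinates are hypothetical: a Python package declares its native C library exactly as it declares another Python package.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Coordinate = Tuple[str, str, str]  # (organisation, module, revision)

# Hypothetical descriptor for a Python module with a native dependency.
IVY_XML = """\
<ivy-module version="2.0">
  <info organisation="com.example" module="foo-python" revision="1.0.0"/>
  <dependencies>
    <dependency name="foo-native" rev="2.3.1"/>
    <dependency org="python" name="requests" rev="2.10.0"/>
  </dependencies>
</ivy-module>
"""


def read_ivy(xml_text: str) -> Tuple[Coordinate, List[Coordinate]]:
    """Return the module's own coordinates and those of its dependencies."""
    root = ET.fromstring(xml_text)
    info = root.find("info")
    if info is None:
        raise ValueError("descriptor has no <info> element")
    org = info.get("organisation")
    module = (org, info.get("module"), info.get("revision"))
    # In Ivy, a dependency without an org attribute inherits the module's organisation.
    deps = [(d.get("org", org), d.get("name"), d.get("rev")) for d in root.iter("dependency")]
    return module, deps
```

Nothing in `read_ivy` cares whether `foo-native` is built by a C/C++ plugin or `requests` by a Python one, which is the property that lets one build graph span several stacks.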


We’ve done other cool things with Python and our Gradle build system as well. One notable example is joining our Python build system with our Ember and Rest.li build systems. Using Gradle, we can build backend and frontend systems that are versioned and released in unison. This complex build scenario would be challenging to model in other build tools.

Closing thoughts

The {py}gradle code that we’re releasing today has been in production use for over a year. During that time, we’ve successfully managed nearly a thousand products with tens to hundreds of thousands of interdependencies, which was an arduous process before {py}gradle. We find it easy to get Python developers up to speed with {py}gradle because we strive to keep the Gradle DSL idiomatic to Python developers. We hope that using Gradle to build your Python code will be as simple as using Setuptools directly, while giving you more robust dependency management, a solid way to integrate with your existing metadata systems, and the ability to build products composed of multiple languages.

We also hope that other groups in similar positions are able to leverage {py}gradle to simplify their Python package management systems. Over the next year, we’re planning to further the Gradle integration with Python by adding support for wheels, multiple versions of Python, and more. We plan on releasing a 1.0 version of {py}gradle after we have gotten feedback from the community to see what matters to others.

Stay up to date with our progress at https://github.com/linkedin/pygradle.

Acknowledgements

The {py}gradle team is composed of members from our Python Foundation, Gradle, and SRE teams, namely, Stephen Holsapple, Zvezdan Petkovic, Ethan Hall, and Loren Carvalho.

We could not have released this without the contributions from LinkedIn's Engineering team, specifically Dan Sully, Paulo SantAnna, Xiaotong Chen, Dwight Holman, Andrii Tsymbala, Szczepan Faber, and more.

We also extend our thanks to all of our internal users who have helped us converge on an idiomatic, easy-to-use interface to {py}gradle. Through their feedback, they helped us develop a package management system that provided value to LinkedIn and now, hopefully, to the open source community.