The LinkedIn Android Data Pipeline
March 2, 2016
When we started designing mobile infrastructure for the new LinkedIn flagship mobile app last year, we wanted to completely rethink the data pipeline. The data pipeline abstracts away all the underlying complexity of data flow between the app and server, and exposes an API that is used by all parts of the app to communicate with the front end server. Since the network may be slow or unavailable, we also cache data locally to improve the responsiveness and performance of the UI.
Our design goals for the data pipeline were:
- Universal: The flagship app has a lot of different features including the feed, messaging, and search. We wanted every feature to be able to easily use this pipeline with minimal setup to maximize code reuse. To enable other LinkedIn apps to use this, we built parts of this pipeline as shared mobile client libraries.
- Intelligent: Since mobile networks can be flaky and mobile data allowances can be limited, we wanted the pipeline to make intelligent choices about when to fetch data from the network, what to cache, how to recover from errors, and so on.
- High Performance: The data pipeline forms a crucial component of the app, so raw speed is a must. In addition, given the highly constrained hardware on some Android devices, we had to ensure a minimal footprint. In order to measure performance, we also needed instrumentation hooks throughout the pipeline.
- Testable: Automated testing and continuous delivery have been some of the key themes around our new release process. The pipeline code itself needed to be testable, in addition to being configurable enough to serve up mock test data when running automated tests.
High-level design
In most use cases, UI components in the app (Activities or Fragments) ask the data manager to fetch data. The data manager makes a request to both the cache and the network in parallel. If the data is found in the cache, it is returned to the requesting component. If the network is available, a call is made to the frontend server via the networking library. The server responds with the data in the form of JSON, which is parsed and handed back to the data manager, which then returns it to the UI component.
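The parallel lookup described above can be sketched roughly as follows. This is a minimal illustration, not the actual data manager: the class name `ParallelFetchSketch`, the `fetch` method, and its supplier and listener parameters are all assumptions made for this example, and the real pipeline also handles errors, cancellation, and threading rules that are omitted here.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch: query the cache and the network at the same time. */
class ParallelFetchSketch {
    /**
     * Starts both lookups at once. The listener is called once for a cache hit
     * (if any) and once for the network response, possibly on different threads.
     * The returned future completes when both lookups have finished.
     */
    static <T> CompletableFuture<Void> fetch(Supplier<Optional<T>> cacheLookup,
                                             Supplier<T> networkCall,
                                             Consumer<T> listener) {
        CompletableFuture<Void> fromCache = CompletableFuture
                .supplyAsync(cacheLookup)
                .thenAccept(hit -> hit.ifPresent(listener));
        CompletableFuture<Void> fromNetwork = CompletableFuture
                .supplyAsync(networkCall)
                .thenAccept(listener);
        return CompletableFuture.allOf(fromCache, fromNetwork);
    }
}
```

A caller would pass a cache lookup, a network call, and a listener that updates the UI. In this sketch the listener must tolerate being called twice and from background threads.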
Our front end servers are built on Rest.li. Application code deals with models. A model is an immutable abstraction of an entity within the application; for example, a member’s profile or an item in the news feed. Model schemas are written in the Pegasus schema definition language. For example, a member profile could be represented as follows:
The Pegasus language is very powerful, supporting constructs such as optionals, default values, maps, unions, lists, type references, enums, and embedded types. This rich grammar enables us to represent and model complex relationships with ease. Pegasus also enforces strict checks for backward compatibility, which takes away the fear of accidentally introducing breaking changes when evolving front end APIs. Across the wire, our models are serialized as JSON, to which we apply gzip compression to reduce payload size.
We go one step further by sharing models across the Android app and server, and auto-generating code with type-safe bindings to save developers the pain of writing redundant and error-prone parsing code.
JSON parsing and serialization can be an expensive process on Android, especially on low-end devices, due to memory constraints. Most of the commonly available open source JSON parsing libraries like Gson and Jackson ObjectMapper rely on reflection to parse annotations at runtime. Reflection can be intolerably slow on Android, and the app pays a significant one-time penalty when these libraries encounter an object that has never been serialized or deserialized before. JSON parsing commonly happens during app startup, so this slowness ends up severely impacting app launch time. These JSON parsing libraries do a large number of intermediate memory allocations during parsing and maintain the model structure in memory (to avoid incurring the reflection penalty again) even after the parsing is complete. This adds to the already severe memory pressure on low-end devices.
A solution based on Jackson stream parsing solves these issues and is easy to maintain since we auto-generate all the complex stream parsing code. For example, the generated code to parse the member profile model described above would be:
We generate similar code for serializing the model as well. Since we generate code, we have the flexibility of doing some Android-specific customizations like using ArrayMap for maps, adding nullability annotations for better null-checking, generating equals() and hashCode() to enable models to be put safely into maps and sets, and so on.
Finally, since our objects are immutable, we generate public final member variables for our fields. Generating fields instead of getter methods helps us minimize method count (to avoid hitting the dreaded 65k method limit), and also makes application code faster.
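To make the shape of such a model concrete, here is a hand-written sketch of what a generated class might look like. The class name `MiniProfile` and its three fields are assumptions for illustration only, and the Android nullability annotations mentioned above are replaced by a plain comment so the example needs nothing beyond the JDK.

```java
import java.util.Objects;

/** Hypothetical stand-in for a generated, immutable model class. */
final class MiniProfile {
    // Public final fields instead of getters: fewer methods, direct access.
    public final String firstName;
    public final String lastName;
    public final String headline; // optional field, may be null

    MiniProfile(String firstName, String lastName, String headline) {
        this.firstName = Objects.requireNonNull(firstName);
        this.lastName = Objects.requireNonNull(lastName);
        this.headline = headline;
    }

    // Value-based equality lets models be used safely in maps and sets.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MiniProfile)) return false;
        MiniProfile other = (MiniProfile) o;
        return firstName.equals(other.firstName)
                && lastName.equals(other.lastName)
                && Objects.equals(headline, other.headline);
    }

    @Override
    public int hashCode() {
        return Objects.hash(firstName, lastName, headline);
    }
}
```

Because every field is final and assigned once in the constructor, instances can be shared freely between threads and screens without defensive copying.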
Maintaining consistency across different screens of the app is key to a good user experience. It would be very disconcerting, for example, if you clicked on a post in the feed, liked it in the detail view, came back to the feed, and did not see the like reflected on the feed item.
In the past, developers have manually anticipated and coded for specific scenarios like this. This meant more manual work for developers, with the risk of introducing subtle bugs by missing corner cases. Since this was a pattern we observed across the app, we decided to use an approach validated by Facebook:
- The consistency manager helps synchronize changes between models being used by different parts of the app via change notifications.
- Change notifications are recursive. For example, if model A contains model B and model B has changed, listeners of model A will be called.
- Consistency bindings are code generated to ensure that complex object relationships are not manually modelled by developers or computed at runtime via reflection, which can be expensive.
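The recursive notification rule in the list above can be illustrated with a small sketch. Everything here is hypothetical: the class `ConsistencySketch`, its method names, and the use of string ids are assumptions for this example, and the hand-declared containment map stands in for the generated consistency bindings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

/** Hypothetical sketch of recursive change notifications between models. */
class ConsistencySketch {
    // Child id -> ids of models that directly contain it
    // (a stand-in for what generated bindings would provide).
    private final Map<String, Set<String>> parents = new HashMap<>();
    private final Map<String, List<Consumer<String>>> listeners = new HashMap<>();

    void declareContains(String parentId, String childId) {
        parents.computeIfAbsent(childId, k -> new HashSet<>()).add(parentId);
    }

    void listen(String modelId, Consumer<String> listener) {
        listeners.computeIfAbsent(modelId, k -> new ArrayList<>()).add(listener);
    }

    /** Notifies listeners of the changed model and of every model that contains it. */
    void modelChanged(String changedId) {
        Deque<String> pending = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        pending.add(changedId);
        while (!pending.isEmpty()) {
            String id = pending.poll();
            if (!visited.add(id)) continue; // already notified
            for (Consumer<String> l : listeners.getOrDefault(id, Collections.emptyList())) {
                l.accept(changedId);
            }
            pending.addAll(parents.getOrDefault(id, Collections.emptySet()));
        }
    }
}
```

In the feed example, a feed item would be declared as containing its actor, so a change to the actor's model reaches listeners registered on the feed item.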
Traditional Android app caches store server responses as key/value pairs with the key typically being the URL, and the value being the response payload. While this approach works well for simple apps, it falls short for our purposes.
The same model may be nested in two different responses, so if we cache at the response level, we lose consistency. Additionally, the same data can be cached multiple times, wasting precious storage space. Our internally developed “fission cache” gets around this limitation in the following ways:
- Responses are trees of models; the tree is flattened and each sub-model is cached individually. This reduces cache duplication, increases cache hit rates, and enables retrieval of cached sub-models that do not have their own server endpoint.
- When a response is requested from the cache, its individual sub-models are looked up, the model is reassembled, and the result is returned to the UI layer.
- Since JSON parsing is expensive, models are serialized in a custom binary format in the cache.
- We use Least Recently Used (LRU) eviction to ensure that the cache does not grow unboundedly.
- Similar to consistency bindings, caching bindings are also code generated.
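The flatten-and-reassemble idea, together with LRU eviction, can be sketched in a few lines. This is an in-memory illustration under several assumptions: the names `FissionSketch`, `Node`, and `Flat` are invented for this example, string payloads stand in for the custom binary format, and eviction relies on the JDK's `LinkedHashMap` in access order rather than whatever the real cache uses.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: cache each sub-model separately and reassemble on read. */
class FissionSketch {
    /** A model node: an id, its own data, and nested sub-models. */
    static final class Node {
        final String id;
        final String payload;
        final List<Node> children;
        Node(String id, String payload, List<Node> children) {
            this.id = id;
            this.payload = payload;
            this.children = children;
        }
    }

    /** Flattened record: the node's own data plus the ids of its children. */
    private static final class Flat {
        final String payload;
        final List<String> childIds;
        Flat(String payload, List<String> childIds) {
            this.payload = payload;
            this.childIds = childIds;
        }
    }

    private final Map<String, Flat> store;

    FissionSketch(final int maxEntries) {
        // Access-ordered LinkedHashMap that drops the least recently used entry.
        store = new LinkedHashMap<String, Flat>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Flat> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Flattens the tree: every sub-model is stored under its own id. */
    void put(Node node) {
        List<String> childIds = new ArrayList<>();
        for (Node child : node.children) {
            put(child);
            childIds.add(child.id);
        }
        store.put(node.id, new Flat(node.payload, childIds));
    }

    /** Reassembles the tree, or returns null if any part is missing. */
    Node get(String id) {
        Flat flat = store.get(id);
        if (flat == null) return null;
        List<Node> children = new ArrayList<>();
        for (String childId : flat.childIds) {
            Node child = get(childId);
            if (child == null) return null; // treat a partial tree as a cache miss
            children.add(child);
        }
        return new Node(id, flat.payload, children);
    }
}
```

Because a member nested in a feed update is stored once under its own id, a later response that carries a newer copy of that member overwrites it, and the next read of the feed update picks up the new data.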
Our in-house networking library is a wrapper around Volley and OkHttp. We chose Volley for its excellent queue management and prioritization support and OkHttp for its excellent performance and configurability. We’ve made several customizations to our networking library including:
- Forking Volley to add streaming support. This is invaluable in conjunction with the stream parser described above, since it enables us to start parsing the response even before it has finished downloading completely.
- Low-level network metric instrumentation hooks. This enables us to measure DNS lookup times, TCP connect times, connection drop rates, timeout rates, and so on.
- Using a persistent SSL session cache to avoid unnecessary SSL handshake roundtrips.
- Dynamic connection pool size tuning based on device and network conditions.
- Ability to mock a web server returning test data during automated tests.
The data manager acts as a proxy for all data access from the network and cache.
- Allows request filtering for network, cache, or both.
- Supports request cancellation by precedence: If the network returns before the cache, the lower-precedence cache request can be canceled.
- Pools requests to prevent duplicate requests from executing at the same time.
- Supports aggregating multiple requests into a single container request. This minimizes network round-trips, since the underlying protocol does not support native multiplexing, and helps handle read-after-write scenarios.
- Propagates data changes to the Consistency Manager, which in turn fires change notifications to notify any interested listeners.
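Of the behaviors listed above, request pooling is the easiest to show in isolation. The sketch below is one possible way to do it, not the actual implementation: the class `RequestPoolSketch`, the `request` method, and the `fetcher` function are assumptions for this example. Callers asking for a URL that is already in flight share the same future instead of triggering a second fetch.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical sketch: share one in-flight request among duplicate callers. */
class RequestPoolSketch<T> {
    private final ConcurrentMap<String, CompletableFuture<T>> inFlight = new ConcurrentHashMap<>();
    private final Function<String, CompletableFuture<T>> fetcher;

    RequestPoolSketch(Function<String, CompletableFuture<T>> fetcher) {
        this.fetcher = fetcher;
    }

    CompletableFuture<T> request(String url) {
        CompletableFuture<T> shared = new CompletableFuture<>();
        CompletableFuture<T> existing = inFlight.putIfAbsent(url, shared);
        if (existing != null) {
            return existing; // an identical request is already running
        }
        fetcher.apply(url).whenComplete((value, error) -> {
            // Remove first so that later callers start a fresh request.
            inFlight.remove(url, shared);
            if (error != null) {
                shared.completeExceptionally(error);
            } else {
                shared.complete(value);
            }
        });
        return shared;
    }
}
```

Once a request completes, its entry is removed, so a later call for the same URL goes out again rather than reusing a stale result. Serving repeated reads from stored data is the cache's job, not the pool's.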
The data provider acts as the bridge between the app code and the data pipeline libraries. A provider manages the model data needed by a feature and listens for updates from the Consistency Manager.
Building this fast and robust data pipeline has helped us improve both the speed and stability of our app, while increasing developer productivity. We plan to continuously improve this pipeline, promote it across all our other apps, share our learnings with the wider mobile development community through blog posts like this one, and open source some of these components soon.