Building a blazingly fast Android app, Part 2
January 7, 2020
Monitor, profile, optimize, and ramp. This is the cycle of performance operations that we rely on to improve our Android app. Part 1 of this blog series covered monitoring and profiling — monitoring gives us the “true north” of optimizations to make data-driven decisions, while profiling helps us pinpoint the areas of opportunity. In part 2, we will look at specific optimizations that have been applied to our Android app.
Please note all site speed metrics in this part are reported at the 90th percentile.
The low-hanging fruit for optimization
Delaying expensive object initialization (-300ms app startup time)
During app launch, a third-party library is synchronously initialized to collect data for PoP and CDN steering. This library creates an invisible WebView. In recent Android versions, creating a WebView can be a heavy operation, since it is powered by Chrome running in a separate process. Given that this library is only used for collecting data and is not critical for displaying UI, we removed this inefficiency by delaying its initialization by 10 seconds.
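The idea can be sketched on a plain JVM. The library class below is a placeholder, and on Android the deferral would typically go through Handler.postDelayed on the main looper; a ScheduledExecutorService keeps this sketch runnable anywhere:

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit

// Placeholder for the third-party PoP/CDN steering library; the real one
// spins up an invisible WebView when initialized.
class SteeringLibrary {
    @Volatile
    var initialized = false
        private set

    fun initialize() {
        initialized = true
    }
}

// Defer the expensive initialization instead of running it synchronously
// during startup; app launch proceeds immediately.
fun scheduleDelayedInit(library: SteeringLibrary, delayMillis: Long): CountDownLatch {
    val done = CountDownLatch(1)
    val executor = Executors.newSingleThreadScheduledExecutor()
    val task = Runnable {
        library.initialize()
        done.countDown()
        executor.shutdown()
    }
    executor.schedule(task, delayMillis, TimeUnit.MILLISECONDS)
    return done
}
```

The returned latch only exists so callers (or tests) can observe when initialization finally ran; the production code would simply fire and forget.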
Optimizing class loading (-400ms app startup time)
Dagger dependency injection takes about one second to perform an injection into our application. Dagger offers a compiler option, fastInit, which, when enabled, no longer loads the classes for all bindings up front. When a component is initialized, it loads a class only when the corresponding dependency is actually injected.
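In a kapt-based build, the option can be enabled roughly like this (the exact wiring depends on your build setup; `dagger.fastInit` is the documented annotation processor argument):

```kotlin
// build.gradle.kts — pass the fastInit flag to Dagger's annotation processor
kapt {
    arguments {
        arg("dagger.fastInit", "enabled")
    }
}
```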
Testing this change was very challenging. A/B testing by switching between two copies of the dependency injection code would have added the risks of increased build time, increased app size, and confusion from duplicated code. Instead, we leveraged the Google Play public beta to roll out this change to a small set of our members and monitored our app health closely to ensure there was no increased crash rate or negative metrics impact.
Lazily initializing objects (-70ms app startup time)
LogoutManager handles the logic when our members attempt to log out. Upon initialization, it needs to load and initialize several dependent classes, which is quite a heavyweight task. Given that members rarely log out from the app, we decided to wrap it in Dagger's Lazy<> so that the LogoutManager object is loaded and created only when it is first needed.
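With Dagger, injecting `Lazy<LogoutManager>` defers construction until the first call to `get()`. Kotlin's built-in `lazy` delegate models the same idea and keeps this sketch runnable without Dagger; the class internals below are stand-ins, not our actual code:

```kotlin
// Stand-in for the heavyweight LogoutManager; the real class loads and
// initializes several dependent classes when constructed.
class LogoutManager {
    fun logout(): String = "logged out"
}

// The screen holds a Lazy handle, so LogoutManager is constructed only
// on the first logout click, not when the screen itself is created.
class SettingsScreen {
    private val lazyLogoutManager = lazy { LogoutManager() }
    private val logoutManager: LogoutManager by lazyLogoutManager

    val isLogoutManagerCreated: Boolean
        get() = lazyLogoutManager.isInitialized()

    fun onLogoutClicked(): String = logoutManager.logout()
}
```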
Optimizing low-level calls (-50ms page load time for various pages)
For our data models, the default implementation of hashCode() iterates over all fields recursively. Some data models are immense, and their hashCode() methods stand out in local profiling. Our app infrastructure code creates a map to associate each data model with its metadata. A typical page load fetches 10 items in a collection, and the accumulated hashCode() calls add noticeable work on the main thread.
We performed an aggressive optimization on hashCode() by computing only the hash value of its id(), rather than of all the fields. We went into this optimization aware that it could fail when two data models in a map hold the same value of id(), i.e., when they are in fact the same backend entity. However, such model duplication is rare if our server paginates correctly, and even when it happens, the duplicate data models will be associated with the same metadata. Therefore, we decided that this optimization does not affect correctness.
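A minimal sketch of the id-only hash, with illustrative fields rather than our real model. Note that equals() must agree with hashCode() for map lookups to stay correct, so it is narrowed to id() as well:

```kotlin
// A large data model whose default hashCode() would recurse over every
// field; the fields here are illustrative.
class ProfileModel(
    val id: String,
    val name: String,
    val headline: String,
    // ...many more fields in the real model
) {
    // Hash and compare only the id: two models with the same id represent
    // the same backend entity, so metadata map lookups stay correct.
    override fun hashCode(): Int = id.hashCode()

    override fun equals(other: Any?): Boolean =
        other is ProfileModel && other.id == id
}
```

Two instances with the same id now hash identically, so a metadata map keyed by one instance can be read with the other.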
Optimizing our app flow
The app flow of rendering a page
Let’s first take a look at the high-level flow to render a page using architecture components from Jetpack in our LinkedIn Android app:
LiveData in the ViewModel holds the logic for fetching data from the repository and transforming it into view data. The fragment observes the LiveData on the main thread. Upon onChanged(), the fragment inflates the views in the recycler view and binds the view data to them. The Android OS then takes care of rendering the views on the device in a render thread.
When our members launch the app and land on a page, a fragment is brought to life. When onStart() moves the lifecycle state to STARTED, LiveData becomes active and we begin to fetch the data from the disk cache repository. When we get the response from cache repository, we perform two operations in parallel:
transform, bind, and render the cache data on UI
fetch the data from the server repository
When getting the response from server, we also perform two operations:
transform, bind, and render the server data on UI
write the data to disk cache
Fetching data from disk cache is often faster than fetching the data from the server. Caching particularly benefits devices with low network bandwidth. While the app waits for the network response from the server, the member can still interact with the app with cached data.
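The cache-then-network flow above can be sketched as follows. The repository and data types are simplified stand-ins for our actual infrastructure, and both "fetches" are synchronous stubs here:

```kotlin
// Simplified repository: disk cache plus a server fetch, both stubbed.
class PageRepository {
    private var diskCache: String? = "cached feed"

    fun fetchFromCache(): String? = diskCache
    fun fetchFromServer(): String = "fresh feed"
    fun writeToCache(data: String) {
        diskCache = data
    }
}

// Render cached data first so the member sees something immediately,
// then render the server response and persist it for the next launch.
fun loadPage(repo: PageRepository, render: (String) -> Unit) {
    repo.fetchFromCache()?.let(render) // fast path: disk cache
    val fresh = repo.fetchFromServer() // slow path: network
    render(fresh)
    repo.writeToCache(fresh)
}
```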
Proactive asynchronous view inflation (-115ms page load time for notifications page)
View inflation is a heavy operation on the main thread that creates views in memory from XML layout files. If we display 10 items in a recycler view at the same time, as on our notifications page, we need to synchronously inflate the layout once per item on the main thread, which compounds the cost. Moreover, in a traditional implementation, view inflation happens only after we get the server response. Yet regardless of what the server response looks like, we can forecast which XML layout files will be inflated. The question our team came across was: why don't we proactively inflate those views while waiting for the server to respond? By the time the recycler view receives the view data to display, the layouts are already in the recycler view pool and ready to use!
View inflation optimization
This is where AsyncLayoutInflater comes into play: it allows us to inflate views on a separate thread. Once we kick off the request to fetch data from the network, we can forecast the relevant XML layout files and how many copies of each are needed, and hand them to AsyncLayoutInflater to inflate asynchronously.
We specifically applied this technique to the notifications page because it only shows network data and no cache data.
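A rough sketch of the technique, not a drop-in implementation: the class, layout id, and view cache are illustrative, and it relies on androidx's AsyncLayoutInflater, whose callback runs back on the main thread:

```kotlin
import android.content.Context
import android.view.View
import androidx.asynclayoutinflater.view.AsyncLayoutInflater
import java.util.ArrayDeque

// Hypothetical helper that pre-inflates forecast item layouts while the
// network request is in flight.
class PrefetchingInflater(context: Context) {
    private val inflater = AsyncLayoutInflater(context)
    private val ready = ArrayDeque<View>()

    // Kick this off when the network request starts: inflate `count`
    // copies of the forecast item layout off the main thread. The
    // callback is delivered on the main thread, so the deque is safe.
    fun prefetch(layoutRes: Int, count: Int) {
        repeat(count) {
            inflater.inflate(layoutRes, null) { view, _, _ -> ready.add(view) }
        }
    }

    // In onCreateViewHolder: use a prefetched view if one is ready,
    // otherwise fall back to normal synchronous inflation.
    fun obtain(): View? = ready.poll()
}
```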
Parallelization of server and cache requests (-208ms page load time for the “My Network” page)
In the existing app flow, the server request is kicked off only after the cache request, in serial. Since cache requests are often handled faster than server requests, we realized we could reduce page load time further by issuing the two requests in parallel. With this parallelization, we still need to pay attention to one nuance: never overwrite the server response with the cache response.
Parallelization of requests
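The parallel fetch and the "never overwrite server data with cache data" guard can be sketched with plain CompletableFutures; the fetchers here are stubs with artificial latencies, not our repository code:

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.atomic.AtomicBoolean

// Issue cache and server requests in parallel. The cache result is only
// rendered if the server has not responded yet, so fresher data is never
// clobbered by stale cache data.
fun loadInParallel(
    fetchCache: () -> String,
    fetchServer: () -> String,
    render: (String) -> Unit,
): CompletableFuture<Void> {
    val serverArrived = AtomicBoolean(false)
    val cacheFuture = CompletableFuture.supplyAsync { fetchCache() }
        .thenAccept { cached ->
            // Guard against the race: skip rendering stale cache data.
            if (!serverArrived.get()) render(cached)
        }
    val serverFuture = CompletableFuture.supplyAsync { fetchServer() }
        .thenAccept { fresh ->
            serverArrived.set(true)
            render(fresh)
        }
    return CompletableFuture.allOf(cacheFuture, serverFuture)
}
```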
Preload data before onCreateView() (-102ms page load time for the “My Network” page)
LiveData is active when its lifecycle is in the STARTED or RESUMED state (see the LiveData documentation). In our existing app flow, the data fetch is implemented inside LiveData.onActive(). In other words, it starts immediately after Fragment.onStart().
As stated earlier, view inflation is an expensive operation, and the implementation of Fragment.onCreateView() inflates the container view for the entire fragment. A potential fix is to preload the data earlier than onCreateView() by moving the fetch into onCreate().
We chose onCreate() over onAttach() mainly because the former has a parameter of type Bundle? carrying the saved instance state, and we may have delicate logic based on what is saved inside that bundle. Through local profiling and reading the source code of FragmentManagerImpl.moveToState(), we found the time difference between those two callbacks to be single-digit milliseconds.
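An illustrative fragment sketch of the move; MyNetworkViewModel, preloadData(), and the layout id are assumed names, not the actual LinkedIn code:

```kotlin
import android.os.Bundle
import android.view.LayoutInflater
import android.view.View
import android.view.ViewGroup
import androidx.fragment.app.Fragment

class MyNetworkFragment : Fragment() {

    // Assumed ViewModel; wired up via your DI or ViewModelProvider setup.
    private lateinit var viewModel: MyNetworkViewModel

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        // Kick off the data fetch here, before the expensive container
        // inflation in onCreateView(), instead of waiting for
        // LiveData.onActive() after onStart(). The saved-state bundle is
        // still available for any restore logic.
        viewModel.preloadData(savedInstanceState)
    }

    override fun onCreateView(
        inflater: LayoutInflater,
        container: ViewGroup?,
        savedInstanceState: Bundle?,
    ): View = inflater.inflate(R.layout.fragment_my_network, container, false)
}
```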
Ramping our optimizations
A successful optimization of our code results in a local reduction in app startup or page load time. However, these numbers are specific to the environment we develop in, whether a local device or an emulator, and do not always reflect the actual impact on our members. In order to make data-driven decisions at scale and impose risk control (in case an optimization sacrifices correctness), we leverage our A/B testing platform to control the ramp of each optimization experiment. You can learn more in our past blog on monitoring site speed with A/B testing and feature ramp-up. It's important that we are able to quantify the impact of a particular optimization experiment on page load time; our analysis report helps us decide whether to roll back or continue ramping certain efforts.
We have also encountered a few cases where A/B testing was technically challenging. For example, we optimized the code generation of Dagger dependency injection and A/B testing required duplicating an enormous amount of code. In such cases, we judiciously use Android beta programs provided by Google Play to verify the correctness and compare week-over-week metrics to quantify impact.
Acknowledgments
We have come a long way in realizing the importance of performance optimization. Thank you to the following individuals:
Nathan Hibner, who spearheaded the effort of profiling our Android app. He came up with the ideas for and implemented several of the optimizations mentioned here: lazy initialization of LogoutManager, delayed initialization of the third-party library, and the view pool optimizations. A few of these optimizations not only reduced site speed numbers, but also had a significant impact on product metrics.
Hongbin Huang, who analyzed and optimized the app's launch stack and implemented several optimizations, including lazy initialization of LogoutManager, delayed initialization of our disk cache for the address book, and a large refactoring that merged two startup activities into one.
Ramya Pasumarti, Nithesh Rangaswamaiah, Prasanna Vijayanathan, Dan Zitter, Ritesh Maheshwari from Performance team, and Bryce Pauken from Flagship Infra iOS team, who spent a tremendous amount of time revamping our monitoring instrumentation and tools.
Jason Su, who participated in the early discussions of revamping the monitoring instrumentation. Together, we laid out the quarterly plan as a continuation of the effort to optimize our app startup.
Aaron Armstrong, who supported me by planning objectives and key results to make sure we invested in optimizing our runtime performance.