Sleek and Fast: Speeding Up your Fat Web Client

Editor’s Note: This post is based on Sarah’s talk “Sleek and Fast: Weight Management for your Fat Web Client” given at the Grace Hopper Conference this year in the Software Engineering track.


The need for speed

In the fall of 2016, our flagship engineering team completed an initial feature build-out for the new LinkedIn desktop site built on our new Pemberly architecture. The site was released to the company internally, and most users felt it was 2-3x slower than our existing website. Before releasing it to the public, we knew it had to be at least as fast as our existing site. This realization began a journey for the infrastructure team, along with the application team, that was akin to a competitive athlete getting sleeker and faster.

We were shifting from a server-rendered architecture with a sprinkling of JavaScript to a client-rendered, long-lived, stateful web application. The benefit of this choice was that we could build a user experience unified with that of our native mobile clients. The difficulty was that our team was in the middle of a foundational technology shift with entirely different performance characteristics. Previously, performance was largely driven by how quickly our service fanout could deliver data to the web application server; now we had to worry about being network-bound, CPU-bound, and memory-bound all at once. We needed a new approach to deliver a performant user experience in this new paradigm.

In order to achieve high speeds, we needed to: understand what our goals were and how to measure them; go through a challenging process of becoming sleek and fast; and develop team habits and consistency to continue improving over time.

Goal-setting and measurement

To understand what constituted success, we had to understand where we were going. Success was defined by the following three milestones:

  1. The new site had to be at least as fast as the existing site in order to be released to the public.
  2. The launch of the single page application (SPA) had to be as fast as the load of the traditional server-side-rendered (SSR) site.
  3. The new site would have launch and subsequent page views that achieved a gold-standard time set by our Velocity team, which is responsible for site speed excellence in our applications.

Once we understood what the determinants of success were, we needed to figure out how to measure ourselves. Our first challenge came in determining what “as fast as the existing site” actually meant. Fundamentally, a SPA cannot be measured by the browser’s built-in window.load API the way an SSR web page can, so even gathering real user metrics (RUM) required a new type of instrumentation. Additionally, in a SPA, the initial launch is much slower than subsequent page views, while in an SSR web site, each page load time is relatively consistent, so we needed an apples-to-apples way to compare the two types of websites. Ya Xu, a principal staff engineer on our Data team, in collaboration with Ritesh Maheshwari from our Velocity team, helped us develop what we call a “Session-weighted p90” methodology to make that comparison possible.
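Because window.load only fires for the initial navigation and never for subsequent route transitions, a SPA has to emit its own timing signal for every page view. Below is a minimal sketch of that kind of instrumentation using the browser’s User Timing API; the mark names and the “/rum/collect” endpoint are illustrative, not our production RUM plumbing.

```js
// Call at the start of a page view: the initial navigation for a
// session-launch, or the start of a route transition for a
// session-subsequent view.
function markPageViewStart(pageKey) {
  performance.mark(`pv-start:${pageKey}`);
}

// Call once the route's content has rendered and is usable; computes the
// duration and beacons it to a RUM collector.
function markPageViewEnd(pageKey, isSessionLaunch) {
  performance.mark(`pv-end:${pageKey}`);
  performance.measure(`pv:${pageKey}`, `pv-start:${pageKey}`, `pv-end:${pageKey}`);

  const entries = performance.getEntriesByName(`pv:${pageKey}`);
  const duration = entries[entries.length - 1].duration;

  navigator.sendBeacon('/rum/collect', JSON.stringify({
    page: pageKey,
    durationMs: duration,
    type: isSessionLaunch ? 'session-launch' : 'session-subsequent',
  }));
}
```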


The first component of the Session-weighted p90 is the “combined site speed” X_i for a page i, described in the first equation below.

Given a page i, we can get:

  • n_i page views for session-launch (first page view in a session)
  • m_i page views for session-subsequent (subsequent page views in a session)
  • p90 page load time for session-launch as L_i
  • p90 page load time for session-subsequent as S_i

“Overall site speed” X is then calculated for the top k pages as the weighted average of the individual combined site speeds X_i, weighted by the page view volume of each page, using the second equation below.

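In equation form (a sketch assembled from the definitions above):

```latex
% Combined site speed for page i, weighting the session-launch and
% session-subsequent p90s by their respective page view volumes.
X_i = \frac{n_i L_i + m_i S_i}{n_i + m_i}

% Overall site speed across the top k pages, weighted by page view volume.
X = \frac{\sum_{i=1}^{k} (n_i + m_i)\, X_i}{\sum_{i=1}^{k} (n_i + m_i)}
```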

Once our RUM measurements and Session-weighted p90 were established so that we would know when we were at least as good as our existing site, we were almost ready to start the hard work of becoming sleek and fast.

Knowing that RUM metrics and the Session-weighted p90 would only be available once a daily Hadoop job completed, and that data from our beta group of internal employees was a bit noisy due to lower volume, we decided to set up some leading-indicator metrics to help us understand how we were tracking on a per-commit basis. For the leading-indicator metrics, we needed to look at things that would contribute to both the network-bound problems and the CPU/memory-bound problems. We decided on the following metrics to track on a per-commit basis (a sketch of how a couple of them can be computed follows the list):

  1. Uncompressed CSS size for the app
  2. Uncompressed JS size for the app
  3. Uncompressed compiled template size for the app (which is also just JavaScript)
  4. Number of AMD modules in the app
  5. Number of CSS selectors in the app
  6. CSS selector length score
  7. CSS selector complexity score
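These indicators can be computed with very little tooling. Here is a sketch of deriving the selector count and crude length/complexity scores from the built stylesheet on each commit; the file path and the scoring formulas are illustrative, not our actual tooling.

```js
/* eslint-env node */
// Naive per-commit CSS leading indicators: selector count, a length score,
// and a complexity score.
const fs = require('fs');

const css = fs.readFileSync('dist/assets/app.css', 'utf8');

// Grab the selector list in front of each rule block (a deliberately naive parse).
const selectors = css
  .split('}')
  .map((rule) => rule.split('{')[0].trim())
  .filter(Boolean)
  .flatMap((list) => list.split(',').map((s) => s.trim()))
  .filter(Boolean);

// Length score: average characters per selector.
const lengthScore =
  selectors.reduce((total, sel) => total + sel.length, 0) / selectors.length;

// Complexity score: total number of simple selectors and combinators.
const complexityScore = selectors.reduce(
  (total, sel) => total + sel.split(/[\s>+~]+/).filter(Boolean).length,
  0
);

console.log('Selector count:  ', selectors.length);
console.log('Length score:    ', lengthScore.toFixed(1));
console.log('Complexity score:', complexityScore);
```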

Becoming sleek and fast

Once we had our measurements established, it was time to begin the hard work of getting our application to be faster. The work would cut across framework and application code, and had to be prioritized. If not for the tireless work of our framework engineers, as well as engineers across the entire application team, success would have been difficult to achieve.

There were two significant framework-level changes we wanted to make in order to speed up our application. First, we had to split the application into several mini-applications, or engines, and we did the work in Ember CLI to enable lazy loading of engines and their assets. Second, we completed work on the Glimmer rendering engine for Ember, which greatly reduced template size over the wire and changed the template wire format from executable JavaScript to a data structure, cutting down on JavaScript parse/eval time. Lazy engines reduced asset size at load time by 50%, and Glimmer reduced uncompressed template size by 40%.
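As a minimal sketch, declaring a lazily-loaded engine with the ember-engines addon looks roughly like this; the engine name is illustrative, and the exact shape of the lazyLoading option has varied across addon versions.

```js
// index.js of the engine addon: opt in to lazy loading so the engine's
// JS and CSS are built into their own assets, separate from the host app.
const EngineAddon = require('ember-engines/lib/engine-addon');

module.exports = EngineAddon.extend({
  name: 'feed-engine',
  lazyLoading: { enabled: true },
});
```

On the host side, the engine is mounted in the application’s router with this.mount('feed-engine'), and its assets are only fetched the first time one of its routes is visited.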

In the application, we made significant changes to the CSS layer. The CSS had not been given a structure before the start of the project, so it was located haphazardly in the directory structure and often repeated unnecessarily. Additionally, our liberal use of Sass led to significantly more verbosity than was needed to achieve the styling of the application. We decided to employ a BEM (block element modifier) architecture for the application, and we collapsed all frequently used classes for our UI pattern library into a single file to eliminate duplication. The result was a 2 MB reduction in uncompressed CSS size. Additionally, we reduced the selector count by 10,000 and the selector complexity score by 90%.

Another significant area of opportunity in terms of run-time costs was the view layer within Ember. We found that we had a deeply nested component hierarchy, which is frequently a performance bottleneck in client-rendered, component-based applications, so we decided to move from an inheritance model to a flatter compositional model. We were also able to eliminate dead code paths, which reduced our uncompressed JavaScript and uncompressed template size.
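As a sketch of that shift (the component and helper names here are hypothetical), the idea is to stop building concrete components by extending a chain of base classes and instead compose a small number of focused utilities into flat components:

```js
import Ember from 'ember';
// Hypothetical utility module; in the compositional model, shared behavior
// lives in small functions or services that components opt into explicitly.
import { trackImpression } from 'my-app/utils/tracking';

// Before: every concrete component inherits (and pays for) all of its ancestors.
const BaseCard = Ember.Component.extend({ /* shared layout, tracking, a11y */ });
const FeedCard = BaseCard.extend({ /* feed-specific behavior */ });
const SponsoredFeedCard = FeedCard.extend({ /* sponsored-specific tweaks */ });

// After: a single, flat component that pulls in only what it needs.
export default Ember.Component.extend({
  didInsertElement() {
    this._super(...arguments);
    trackImpression(this.get('update'));
  },
});
```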

After addressing the view layer, we tackled some issues in the data layer. We were over-modeling and doing too much work in the client due to an early decision to code-gen models/adapters/serializers based on the entire surface area of the API server represented as Rest.li PDSCs. Additionally, we had decided to persist every model instance in memory based on the data returned from the API.

In order to incrementally remove the performance penalty of these prior design decisions, we made a couple of immediate changes. First, we reduced the number of records we would fetch at one time, which not only reduced the size of the data returned, but also reduced the number of model instances held in client memory by Ember Data. An example of this would be fetching data for six feed items instead of 20 at initial render time. The second change was to reduce the number of distinct types in the system. We found that many collection types were structured so similarly that we could collapse all of the model collections into a single collection type. These changes greatly reduced the number of AMD modules, as well as the run-time costs in the application.
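A sketch of what collapsing the collection types can look like in Ember Data; the attribute names are illustrative rather than our production schema:

```js
import DS from 'ember-data';

// Before: feed-collection, job-collection, notification-collection, and so on,
// each a near-identical wrapper around "a list of elements plus paging metadata."
// After: one generic collection model shared by every vertical.
export default DS.Model.extend({
  elements: DS.attr(), // the entities in this page of results
  paging: DS.attr(),   // e.g., { start, count, total } paging metadata
});
```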


In-memory graph for a single feed update before our changes

Habits, consistency, and maintenance

Oftentimes, people want a quick fix for athletic performance, like a juice fast or crash diet. Similarly, a quick fix is often desired for a web application that is performing non-optimally. However, real performance, whether it’s athletics or web performance, is about making a lot of small, good decisions consistently.

For our web application, we needed to create ways for our engineers to achieve real results that would also be lasting, and without unintended side effects. So, with the help of our Velocity team and Sreedhar Veeravalli, we developed a couple of libraries to enable good decisions.

The first was a library for doing occlusion culling. Occlusion culling is a technique that came out of the video game industry, where you render only what is actually visible at a given time. Using a holistic framework like Ember gave us the power to apply occlusion culling for any of our pillar teams that could benefit. A good example: we retrieve six feed updates, but on most screens only the first two are visible initially, so we can defer the rendering of the other four updates until they come into the viewport.
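The core idea can be sketched with the standard IntersectionObserver API (this is the concept, not the implementation of our library): render a cheap placeholder per update and swap in the real content only when the placeholder approaches the viewport.

```js
// Defer expensive rendering until a placeholder element nears the viewport.
function cullUntilVisible(placeholderElement, renderRealContent) {
  const observer = new IntersectionObserver((entries) => {
    entries.forEach((entry) => {
      if (entry.isIntersecting) {
        renderRealContent(entry.target); // e.g., mount the full feed update component
        observer.unobserve(entry.target);
      }
    });
  }, { rootMargin: '200px' }); // start rendering a little before it scrolls into view

  observer.observe(placeholderElement);
}
```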


How occlusion culling interacts with the viewport

The second library we made available was a scheduler. Many teams had what they considered to be less important content, even within the viewport, that could be deferred until the initial view was fully interactive. They had been employing various hacks, like timeouts set to be longer than what they thought was the longest reasonable load time. However, in some cases the deferred content interfered with the initial render and caused re-renders as it was injected into the DOM. So we created a scheduler that provides a logical queue for deferred content, guaranteeing that it loads only after the initial render is complete.
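A tiny sketch of the concept (not the API of the scheduler library we shipped): deferred work goes into a queue that is flushed only once the application signals that the initial render is done.

```js
// Queue deferred work and flush it only after the initial render has settled,
// instead of racing it against the first paint with ad-hoc timeouts.
const deferredQueue = [];
let initialRenderComplete = false;

export function scheduleAfterInitialRender(task) {
  if (initialRenderComplete) {
    task();
  } else {
    deferredQueue.push(task);
  }
}

// Called once by the application when the first meaningful render is done.
export function onInitialRenderComplete() {
  initialRenderComplete = true;
  while (deferredQueue.length) {
    const task = deferredQueue.shift();
    requestIdleCallback(() => task()); // yield to the browser between tasks
  }
}
```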

Finally, the framework team needed some ways to develop habits of continuous performance improvement. In order to achieve this, two tools were developed.

The first is a library called Heimdall, created by Stefan Penner, David Hamilton, and Chris Thoburn. Heimdall enables counting operations inside the framework internals. Additionally, it allows for measurement of time for each operation. Certain parts of Ember were instrumented to allow the team to identify areas of interest as the framework was exercised by the application code. This helped the team identify and prioritize bottlenecks both in the framework and the application.
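Conceptually, this kind of instrumentation boils down to counters and timers wrapped around interesting operations, so that hot spots can be ranked after exercising the app. A hand-rolled illustration of the idea (not Heimdall’s actual API):

```js
// Count how many times each instrumented operation runs and how long it takes.
const stats = new Map();

function instrument(name, operation) {
  return function (...args) {
    const start = performance.now();
    try {
      return operation.apply(this, args);
    } finally {
      const entry = stats.get(name) || { count: 0, totalMs: 0 };
      entry.count += 1;
      entry.totalMs += performance.now() - start;
      stats.set(name, entry);
    }
  };
}

// After exercising the app, rank operations by total time spent.
function report() {
  return [...stats.entries()].sort((a, b) => b[1].totalMs - a[1].totalMs);
}
```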

The second toolset is called Ember Macro Benchmark, created by Kris Selden. It combines Chrome tracing with HAR Remix, another library developed by Kris. These tools allow the team to isolate the client code from the server response and then trace interesting marked events in the browser against the application code. The team can then benchmark different versions of the framework alongside the application code to see how the workload changes and whether overall render time has improved.


Several versions of Ember.js benchmarked using Ember Macro Benchmark against emberaddons.com

As with any good weight management plan, we needed to enable the application to continue to improve performance, and avoid any regressions. This meant putting some defenses into the system. The defenses were a series of tools for the pre-commit phase, the post-commit/pre-deploy phase, and for product teams to evaluate site speed post-feature-release. In the pre-commit phase, we added code linters, which would catch anti-patterns, like violating data flow direction, memory leaks, private API usage, and over-nesting of components. Additionally, we put in a hook that set hard limits on asset size growth and absolute asset size for different parts of the application.
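The hard-limit hook itself can be very small: compare the built asset sizes against a checked-in budget and fail the commit when a budget is exceeded. A sketch, with illustrative paths and numbers:

```js
/* eslint-env node */
// Fail the commit if any built asset exceeds its byte budget.
const fs = require('fs');

const budgets = {
  'dist/assets/app.js': 1.5 * 1024 * 1024, // illustrative budgets, not our real limits
  'dist/assets/app.css': 512 * 1024,
};

let failed = false;
for (const [assetPath, limit] of Object.entries(budgets)) {
  const size = fs.statSync(assetPath).size;
  if (size > limit) {
    console.error(`${assetPath}: ${size} bytes exceeds the ${limit} byte limit`);
    failed = true;
  }
}
process.exit(failed ? 1 : 0);
```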


A daily asset report on a day when a lot of new code was checked in

In the post-commit/pre-deploy phase, we added a run of a tool created by our Performance Apps team called Heliosphere. Heliosphere is intended to catch performance regressions of 100ms or more with confidence intervals. A Heliosphere report is created with each commit. Additionally, we create a report of asset size changes with each commit, so any commits that substantially increase asset size but fail to trigger the hard limits are understood.

Finally, our Velocity team started exposing the site speed deltas associated with A/B tests on our A/B testing platform, XLNT, in the view of the experiments. This enables product managers and engineers to work together to understand tradeoffs in feature engagement and site speed.


Our XLNT dashboard view of site speed impacts of an experiment

Winning

As a result of the framework-level and application-level changes, as well as the libraries and tools helping the teams develop performance-first habits, we doubled the speed of our web application prior to general availability. And we exceeded our targets for Session-weighted p90. Our weight management plan succeeded in preparing us to be fast.

There were many more work items identified for our backlog than we could complete prior to releasing the new version of our main site. However, the teams have continued the hard work of burning down that backlog in the months since general availability. As a result, we’ve seen a further 40% improvement in site speed over the past few months. And now, we have achieved our second milestone, which is that the initial launch of our web application is faster than our old site’s page load time. We are happy to say that our gold-standard time for our site speed is on the near horizon, and we have the tools and the discipline to help us get there.

Site speed trend since launch

Acknowledgments

This work would not have been possible without the tremendous dedication of our infrastructure engineers, the open source community, our Velocity engineers, and the entire Flagship Web engineering team. Kudos to Branden Thompson, Stefan Penner, David Hamilton, Chad Hietala, Kris Selden, Rob Jackson, Marc Lynch, Steve Calvert, Chris Eppstein, Will Hastings, Asa Kusuma, Mark Pascual, and Zack Mulgrew, who embedded with the Flagship team during this process. And, big thanks to our Performance engineers: Ritesh Maheshwari, Michael Butkiewicz, Sreedhar Veeravalli, and Prasanna Vijayanathan. And, of course, to Marius Seritan, Steve Calvert, Sathish Pottavathini, and the Flagship Infra team: Rashmi Jain, Andrew Pottenger, Huiyuan Yang, Trent Willis, and Nick Iaconis, who continued to drive improvement post-launch.