Building the New Job Detail Page

Jiuling Wang

Senior Engineering Manager, Observability at Snowflake

June 21, 2016

Editor's note: This blog has been updated.

When we recently launched the new job detail page, we introduced new insights that help job seekers know their “In,” meet the team, and get the inside scoop about working at the company. Since launch, the page has driven a double-digit lift in job applications. While it is exciting to see the page's popularity and its rich features, it was also a challenge to transform the design and deliver it to our members. All these changes gave our engineering team an opportunity to adopt new technologies and rebuild the architecture to improve both members’ and developers’ experiences. In this blog, we will go into the details of how we built and delivered this page.

Tech stack

At LinkedIn Engineering, we try to use existing technologies as much as we can, which enables us to move fast. In the last few years, the company has developed many new technologies for scaling at LinkedIn. For the new jobs page, we redesigned the frontend architecture using the Marionette and Play frameworks, and leveraged the existing Rest.li framework and various data storage solutions for mid-tier and backend services.

At LinkedIn, we use Apache Traffic Server (ATS) to proxy all requests and responses to the site. In other words, when a request to view the job detail page is fired from the browser, ATS forwards the request to our Play application based on the URL pattern.

The key functionality of the Play app is fetching dynamic data. The page is organized as a standard header, a footer, and multiple modules inside the body. In the Play frontend, we have a component called ModuleFactory, which is responsible for creating all modules shown on the page. Each module fetches and processes the data it needs by talking to various downstream Rest.li services. The page template is also defined in the Play frontend, where we use a Scala-based page template. The Scala template engine combines all the module data with the templates and renders the result as bare-bones HTML.
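The module flow described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the Scala the Play frontend uses, and everything except the ModuleFactory name is an assumption made for illustration: the `Module` class, the `register` and `create_all` methods, the `render_body` helper, and the lambda stand-ins for downstream service calls are all hypothetical.

```python
from dataclasses import dataclass
from html import escape
from typing import Callable, Dict, List


@dataclass
class Module:
    """One section of the page body: a name plus a function that fetches its data."""
    name: str
    fetch: Callable[[str], dict]  # job_id -> data for this module


class ModuleFactory:
    """Hypothetical registry that creates every module shown on the page."""

    def __init__(self) -> None:
        self._fetchers: Dict[str, Callable[[str], dict]] = {}

    def register(self, name: str, fetch: Callable[[str], dict]) -> None:
        self._fetchers[name] = fetch

    def create_all(self) -> List[Module]:
        # Modules come back in registration order.
        return [Module(name, fetch) for name, fetch in self._fetchers.items()]


def render_body(factory: ModuleFactory, job_id: str) -> str:
    """Fetch each module's data and stitch the results into bare-bones HTML."""
    sections = []
    for module in factory.create_all():
        data = module.fetch(job_id)
        fields = "".join(
            f"<span>{escape(str(key))}: {escape(str(value))}</span>"
            for key, value in data.items()
        )
        sections.append(f'<section id="{escape(module.name)}">{fields}</section>')
    return "<main>" + "".join(sections) + "</main>"


# Stand-ins for calls to downstream services (assumed, not real endpoints).
factory = ModuleFactory()
factory.register("job-summary", lambda job_id: {"title": f"Job {job_id}"})
factory.register("meet-the-team", lambda job_id: {"members": 3})

page_html = render_body(factory, "42")
```

Keeping each module's data fetching behind a single function makes it easy to add, remove, or reorder modules without touching the page template.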

After the HTML is sent back to ATS, a plugin called Fizzy, developed within LinkedIn, fetches the standard page header and footer, inserts them into the existing HTML, and flushes the result to the browser.

In order to provide a smooth experience for members navigating across the jobs ecosystem, we decided to build a single-page application using Marionette, a Backbone framework that simplifies the application code with robust views and architecture solutions. When the browser receives the HTML content from the server side, the Marionette application bootstraps and starts to render the page.

LinkedIn Job Detail Page Rendering Workflow

Site speed

In 2015, LinkedIn set a goal of having every page load in under four seconds for 90 percent of members. The motivation is the clear correlation between site speed and user engagement. Despite adding many new features and providing a more media-rich page, the team was able to render the page in just three seconds at the 90th percentile, over 40 percent faster than the old page. Below, we share both the technical optimizations we made to the page and the analytical approach we took from the beginning of the project.

Technical optimizations

Play Streaming: Streaming is a feature in the Play framework that enables the server to start transferring data before it knows the content length of the whole response. The response data is sent in a series of chunks, which means that as soon as one part of the page is available, it can be streamed to the client. This not only lets members see the page earlier, it is also key to reducing the page loading time. As soon as the header of the page is flushed to the client, the browser starts to parse it and download all the necessary JavaScript and CSS files. By the time the HTML document transfer finishes, the related JavaScript and CSS are almost ready as well, which shortens the critical path of the page load significantly.
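To make the idea of chunks concrete, here is a small sketch of how a response can be framed with HTTP/1.1 chunked transfer encoding, where each chunk carries its own size so the total length never has to be known up front. This shows the wire-level mechanism only, not Play's API, and it assumes chunked encoding is what sits underneath the streaming described above. The `page_parts` and `chunked` functions and the `/app.css` path are hypothetical.

```python
from typing import Iterable, Iterator


def page_parts() -> Iterator[str]:
    """Yield the page piece by piece, head first, as each part becomes ready."""
    # The head goes out first so the browser can start fetching assets early.
    yield '<html><head><link rel="stylesheet" href="/app.css"></head><body>'
    yield "<section>job summary</section>"  # imagine a slow service call before this
    yield "<section>insights</section>"     # and another one before this
    yield "</body></html>"


def chunked(parts: Iterable[str]) -> Iterator[bytes]:
    """Frame each part as an HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    for part in parts:
        data = part.encode("utf-8")
        if data:  # a zero-length chunk would signal the end of the response
            yield f"{len(data):X}".encode("ascii") + b"\r\n" + data + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk


wire = list(chunked(page_parts()))
```

Because both functions are generators, the first chunk can be written to the socket before the later parts have been computed, which is what lets the browser begin downloading assets early.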

Precomputing: Many of the valuable insights in the new design, for example which of your connections or past coworkers currently work at the company you are viewing, are not necessarily unique to every member or time-sensitive. For this reason, we decided to precompute them and store them as key-value pairs in read-only Voldemort stores, after confirming that the data size of all potential insights was manageable. Since the insights are not time-sensitive, the Hadoop jobs we created to compute them are scheduled to run every day to keep the data fresh.

Parallelism of downstream calls: In addition to Time To First Byte (TTFB), another key factor on the server side is the length of the critical path for getting all the data needed. Given the service-oriented architecture of LinkedIn, it is common to talk to multiple backend services before finishing computing the data for a product feature. Identifying the best parallelism strategy can improve the overall latency significantly.
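The effect of parallelism on the critical path can be seen in a small sketch. It uses Python's asyncio with simulated services, so the service names and latencies are made up, and `call_service` is a hypothetical stand-in for a real downstream request. When independent calls are issued sequentially, the latencies add up. When they are issued together, the total is roughly the latency of the slowest call.

```python
import asyncio
import time


async def call_service(name: str, latency_s: float) -> dict:
    """Stand-in for a downstream call; sleeps to simulate network latency."""
    await asyncio.sleep(latency_s)
    return {"service": name}


async def fetch_sequential(calls):
    """Issue the calls one after another: total time is the sum of latencies."""
    return [await call_service(name, latency) for name, latency in calls]


async def fetch_parallel(calls):
    """Issue independent calls together: total time is about the slowest call."""
    return await asyncio.gather(
        *(call_service(name, latency) for name, latency in calls)
    )


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


# Hypothetical services and latencies, for illustration only.
CALLS = [("job", 0.05), ("company", 0.08), ("insights", 0.10)]
sequential_result, sequential_s = timed(fetch_sequential(CALLS))
parallel_result, parallel_s = timed(fetch_parallel(CALLS))
```

In practice, some calls depend on the output of others, so the best strategy is usually a mix: run independent calls in parallel and chain only the ones that truly depend on each other.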

Redirect removal: Removing unnecessary redirects is low-hanging fruit for site speed improvement. The data shows that one extra redirect may add around 800ms of latency to the 90th percentile of page loading time. Once we cleaned up legacy code that pointed to old URLs and mapped some URLs in ATS, we observed a significant win in time to first byte.

Page weight reduction: During the development cycle, we put a tremendous amount of effort into reducing the page weight in all aspects. This included combining and minifying related scripts into a single script, inlining CSS files, and lazy loading images. We also paid special attention to the mobile page. In order to reach performance comparable to desktop, we simplified the content on mobile devices to deliver the most important information to members.

Data-driven development

In order to deliver the brand new job detail page, we had to make many feature iterations, which often take a long time in software engineering. We were exploring possible optimizations to site speed almost every day, so we needed an analytics framework to guide our technical decisions.

In the early stage of the development cycle, the page was not available to external members, and we lacked direct tracking data to understand its performance. As a result, we used Catchpoint as our initial monitoring platform. With Catchpoint, we could have machines in different locations hit our new page periodically and report key site speed metrics, such as TTFB, document loading time, and JavaScript and CSS file sizes, among others.

LiX is an online experimentation platform developed at LinkedIn that manages the lifecycle of all tests and experiments. We guarded each optimization change with LiX to ensure it would not break existing functionality. After verifying that, we ramped the feature to a segment, for example 50 percent of all members. By A/B testing with the LiX framework, we got a precise measurement of the improvement from each optimization, which will serve as lessons for the future development of other pages.
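As an illustration of the kind of measurement an A/B test makes possible, the sketch below compares the 90th-percentile load time of a control group and a treatment group. It is not how LiX computes its results. The nearest-rank percentile definition, the function names, and the sample load times are all assumptions made for the example.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct percent
    of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def p90_improvement(control: Sequence[float], treatment: Sequence[float]) -> float:
    """Fractional reduction in 90th-percentile load time (positive means faster)."""
    before = percentile(control, 90)
    after = percentile(treatment, 90)
    return (before - after) / before


# Made-up page load times in seconds, for illustration only.
control = [2.1, 2.4, 2.8, 3.0, 3.3, 3.9, 4.2, 4.6, 5.0, 6.5]
treatment = [1.8, 2.0, 2.2, 2.5, 2.6, 2.9, 3.0, 3.1, 3.5, 4.8]

improvement = p90_improvement(control, treatment)
```

Comparing a high percentile rather than the mean keeps the focus on the slowest experiences, which matches the way the site speed goal is stated.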

By monitoring key metrics in Catchpoint and through LiX experiments, and by reporting site speed analysis week over week, we ensured that the page stayed on track to hit the service-level agreement and that no new iteration would significantly hurt performance. This purely data-driven approach helped the team prioritize between developing new features and optimizing the performance of the existing page.

Rolling out

Shipping a high-volume page from scratch takes a lot of synchronization. We needed to ensure services at all levels were well-prepared for the potential change of usage pattern.

The first step was to estimate queries per second (QPS) ahead of time and to sync up with the right teams, such as the Site Reliability Engineering team for the traffic shift in the frontend service and the application team for downstream Rest.li services. At this point, we might make certain tradeoffs if some appealing features would generate more QPS than the services could support.

We used the LiX experiment framework to make the ramping process as smooth as possible. We began with a target group of specific members, then moved to a small segment of members that started at one percent and gradually increased in increments of 10 until we reached 100 percent of all members. This gave us the chance to fix bugs, collect feedback for iterations, and most importantly, validate business impact and make decisions based on results.
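A gradual ramp like this is commonly implemented by hashing each member into a stable bucket and comparing the bucket with the current ramp percentage. The sketch below shows that general technique only. It is not LiX's actual algorithm, and the `bucket` and `in_ramp` functions, the experiment name, and the member IDs are hypothetical.

```python
import hashlib


def bucket(member_id: str, experiment: str) -> int:
    """Map a member to a stable bucket in [0, 100) for this experiment."""
    digest = hashlib.sha256(f"{experiment}:{member_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_ramp(member_id: str, experiment: str, ramp_percent: int) -> bool:
    """True if the member sees the new page at the given ramp percentage."""
    return bucket(member_id, experiment) < ramp_percent


# Hypothetical member IDs and experiment name, for illustration only.
members = [f"member-{i}" for i in range(1000)]
at_10 = {m for m in members if in_ramp(m, "new-job-page", 10)}
at_50 = {m for m in members if in_ramp(m, "new-job-page", 50)}
```

Because a member's bucket never changes, everyone who saw the new page at 10 percent still sees it at 50 percent, so raising the ramp only adds members and never flips anyone back to the old page.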

For each ramp, we needed to manually verify that the system was still running properly, which quickly became a nuisance. This motivated us to set up an auto-alert dashboard using InGraph, the framework that visualizes LinkedIn’s site performance. The dashboard contains all key metrics in real-time: QPS to the page, Time to First and Last Byte from frontend services, latency and error rates for generating each module of the page, and outbound QPS to downstream services. Absolute and week-over-week thresholds also fired auto-alert messages.
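The two kinds of thresholds mentioned above can be sketched as a single check. This is an illustrative example rather than the InGraph configuration. The `check_metric` function, its parameters, the metric name, and the sample values are all assumptions.

```python
from typing import List, Optional


def check_metric(name: str, current: float, last_week: float,
                 absolute_max: Optional[float] = None,
                 max_wow_increase: Optional[float] = None) -> List[str]:
    """Return alert messages for a metric that crosses either threshold.

    absolute_max: alert when the current value exceeds this fixed limit.
    max_wow_increase: alert when the value grew by more than this fraction
    compared with the same time last week (0.5 means 50 percent).
    """
    alerts = []
    if absolute_max is not None and current > absolute_max:
        alerts.append(f"{name}: {current} exceeds absolute threshold {absolute_max}")
    if max_wow_increase is not None and last_week > 0:
        change = (current - last_week) / last_week
        if change > max_wow_increase:
            alerts.append(f"{name}: up {change:.0%} week over week")
    return alerts


# Hypothetical values: the error rate is under the absolute limit,
# but it has roughly tripled since last week.
alerts = check_metric("module_error_rate", current=0.03, last_week=0.01,
                      absolute_max=0.05, max_wow_increase=0.5)
```

The week-over-week check catches regressions that an absolute limit would miss, such as a metric that is still within bounds but has changed sharply after a ramp.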

Thanks to the great partnership from other teams and the excellent tools LinkedIn built, we were able to ship the page comfortably.

Conclusion

As one of the most viewed pages on LinkedIn, the new job detail page brings many more insights to members and is also much faster than the previous page. We are super proud of how far we have come, but this is not the end. There are many more features on the way, for example, surfacing more reasons why a job is recommended for you and turning all job pages into a single-page application for a seamless member experience.
