2021 in review: New frontiers in innovation and scale for our Engineering teams

December 21, 2021


It’s been an incredible journey this year for our LinkedIn Engineering organization, full of new technology milestones, industry-leading innovations, and a continued focus on designing useful, responsible features for our members and customers. After the dramatic shifts that 2020 brought to the ways we live, work, and play, in 2021 we focused on learning to adapt—and thrive—in the new world we’re all living in. With so many exciting moments throughout the year, I wanted to take the time to reflect on some of our highlights. Of course, it would be impossible to fit all the achievements of the year into a single blog post, but it’s my hope that this post will give everyone on our team a sense of pride, and that it will provide a glimpse for the outside world into what we work on every day at LinkedIn.

Milestone moments

As the world's largest professional network, with nearly 800 million members in more than 200 countries and territories worldwide, we operate at a massive scale, which presents exciting challenges for our infrastructure engineering teams. Data is essential to how our platform functions: it’s how we help connect members to opportunities, it powers our business solutions across Talent, Sales, Marketing, and Learning, and it helps us create the best possible user experience through tens of thousands of A/B tests daily. But operating with data at our scale is no easy feat, which is why it is so exciting when we cross new thresholds.

This year, we reached two important milestones. First, we joined the “exabyte club,” as we now store more than 1 exabyte of total data across all our Hadoop clusters. Second, we scaled our largest Hadoop YARN cluster beyond 10,000 nodes, making it one of the largest (if not the largest) clusters in the world. Achieving these milestones required, among other things, tuning performance to scale HDFS namespace services, expanding satellite clusters, creating a new Observer node in HDFS, and developing a new tool called DynoYARN, which we also open sourced.

At the rate we’ve been growing over the past few years, I’m confident that new and exciting infrastructure challenges will continue to present themselves to our teams—but I also know that we have some of the best talent in the world to solve them.

Industry innovations

At LinkedIn, our innovation isn’t limited to the scale of our operations; we also do cutting-edge work in fields like data science, AI, data infrastructure, and platform engineering, among others. I’m proud that we share many of these innovations via open source, so that others in the industry can also benefit from and improve upon our work.

A few highlights shared on our LinkedIn Engineering Blog this year, in alphabetical order, included:

  • Bluepill, a parallel iOS simulator test tool we open sourced in 2017, was donated to the Mobile Native Foundation, a Linux Foundation project.

  • We developed a new approach to A/B testing for marketplaces, called “budget-split” testing, that has mitigated cannibalization bias and increased statistical power in our marketplace testing.

  • We open sourced DuaLip, a distributed linear program solver for multi-objective optimization at extreme scale.

  • We open sourced and shared details about FastIngest, a Gobblin-based Kafka-to-HDFS pipeline we developed that improves data ingestion speed and efficiency, as well as query performance.

  • Gobblin, a distributed data integration framework originally developed at LinkedIn and first open sourced in 2014, was announced as a Top-Level Project at the Apache Software Foundation earlier this year. We also created the Data Integration Library (DIL), a library of generic components backed by a multistage architecture to standardize and simplify the connector layer of Gobblin.

  • We open sourced Greykite, a Python library for flexible, fast, and intuitive forecasting.

  • We open sourced Lambda Learner, a library for nearline learning on data streams that enables incremental training for certain ML models.

  • Project Magnet, which provides push-based shuffle in Spark and was created at LinkedIn, was made available as part of Spark’s 3.2 release.

  • TonY, a framework to natively run deep learning jobs on Hadoop that was open sourced by LinkedIn in 2018, joined the LF AI & Data Foundation, under the umbrella of the Linux Foundation.

As we look to create the products and features that will help the world’s professionals be more productive and successful, part of my job as a leader is to make sure that we’re extending that philosophy to our own workforce, too. To that end, we continue to refine the developer experience for our engineers, whether through real-time feedback on our tooling or through improving the remote development process.

We also encourage our engineering teams to engage with the wider community. This year, our engineers presented at industry events including KDD, where we had six sessions; the Grace Hopper Celebration, where we had four presentations; and NeurIPS, where we had one session. Our GTM Data Science team was also honored as the Top Analytics Team in the Digital Analytics Association's Quantie Awards in 2021. As a sponsor of the 2021 virtual AfroTech conference, we had over 200 employees join the event, where they spent time hearing from industry thought leaders, building their networks, and connecting with amazing talent at our booth in the Expo Hall.

Focus on valuable products and responsible design

It’s exciting to see this industry leadership and innovation from our engineering teams, but I’m equally proud of the fact that our development culture at LinkedIn also focuses on ensuring we’re creating useful, responsible solutions for our members—and not just innovation for innovation’s sake. For instance, as the world has been experiencing the Great Reshuffle, our AI has helped members navigate new opportunities. And thanks to innovations like Pensieve, our embedding feature platform developed in-house, and data standardization, we’ve improved our metric of predicted confirmed hires (PCH) by 34.4% in LinkedIn Talent Solutions through our AI efforts. 

Another example of our work to improve products for members this past year was our update to the algorithms powering People You May Know (PYMK). These changes helped make PYMK a more equitable feature by making sure it works well for members regardless of their existing network strength or frequency of platform usage.

Our focus on creating equal opportunity for equally qualified members on LinkedIn is one aspect that drives our commitment to responsible design. This year, we shared more insight into our approach to responsible AI in particular, including the six values that we build into our products: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. We provided updates on our work to scale our machine learning fairness toolkit, LiFT, to large AI systems. In the area of transparent and explainable AI, we introduced CrystalCandle (previously called Intellige), a customer-facing model explainer that creates digestible interpretations and insights reflecting the rationale behind model predictions.

In addition to our AI work, our commitment to responsible design also extends to our focus on creating a trusted and safe experience for members and customers on the platform. Over the past year, we’ve shared how we leverage behavior analytic computation for anti-abuse defenses, as well as how we work to create a trusted Jobs ecosystem.

Looking forward to a new year 

As impactful as all of the projects, thought leadership, and activities shared above have been this year, this post only scratches the surface of what our thousands of engineers have accomplished in 2021. At LinkedIn, we recognize that our true strength is our people, and I’m honored to work alongside so many brilliant, empathetic, and fun colleagues every day. 

I’m looking forward to continuing to build on these solid foundations as we head into a new year, and I can’t wait to see the new features, platforms, tools, and experiences that our engineers will create. It’s a privilege to be part of this world-class organization, and to play such a critical role in creating technology and products that can have such a positive impact on the world’s professionals.

We’re always looking for the talent that powers LinkedIn, and if you’re interested in joining the LinkedIn Engineering team, we’re hiring! Our latest openings can be found here.