Data Articles

  • diagram-of-darwin-functionality

    DARWIN: Data Science and Artificial Intelligence Workbench at LinkedIn

    January 28, 2022

    Co-authors: Varun Saxena, Harikumar Velayutham, and Balamurugan Gangadharan

    LinkedIn is the largest global professional network and generates massive amounts of high-quality data. Our data infrastructure scales to store exabytes of data; data analysts, data scientists, and AI engineers then use this data to power several LinkedIn products and the platform as a...

  • photo-of-juan-in-argentina

    After joining LinkedIn Argentina, Juan took an Ireland-based opportunity to build a new EMEA (i.e., Europe, Middle East, and Africa) team of 50+ data scientists and software/infrastructure engineers. Now nearly a decade in, this globetrotter is collaborating with his AI team from his new home in Madrid.

    A nine-year adventure with LinkedIn

    I started my journey...

  • high-level-diagram-of-user-migration-and-dataset-deprecation-tool

    Co-authors: Steven Chuang, Qinyu Yue, Aravind Rao, and Srihari Duddukuru

    Introduction

    Having recently transitioned LinkedIn’s analytics stack (including 1400+ datasets, 900+ data flows, and 2100+ users) to one based on open source big data technologies, we wanted to give an overview of the journey in a blog post. This move freed us from the limits imposed by...

  • an-illustration-of-the-distributed-tier-merge

    Distributed tier merge: How LinkedIn tackles stragglers in ...

    September 27, 2021

    Co-authors: Andy Li and Hongbin Wu

    Indexing plays a key role in modern search engines for fast and accurate information retrieval,...

  • graph-of-linkedin-cluster-trends-for-hdfs-space-used-total-name-node-objects-and-yarn-compute-capacity

    Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

    September 8, 2021

    Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao

    At LinkedIn, we use Hadoop as our backbone for big data analytics and...

  • diagram-of-http2-network-client-architecture

    HTTP/2 in infrastructure: Ambry network stack refactoring

    August 24, 2021

    Co-authors: Ze Mao, Matt Wise, Casey Getz, Justin Lin, Ashish Singhai, and Rob Block

    Introduction

    Ambry is LinkedIn's scalable...