Data Articles

  • photo-of-juan-in-argentina

    After joining LinkedIn Argentina, Juan took an Ireland-based opportunity to build a new EMEA (i.e., Europe, Middle East, and Africa) team of 50+ data scientists and software/infrastructure engineers. Now nearly a decade in, this globetrotter is collaborating with his AI team from his new home in Madrid. A nine-year adventure with LinkedIn: I started my journey...

  • high-level-diagram-of-user-migration-and-dataset-deprecation-tool

    Co-authors: Steven Chuang, Qinyu Yue, Aravind Rao, and Srihari Duddukuru. Introduction: Having recently transitioned LinkedIn’s analytics stack (including 1400+ datasets, 900+ data flows, and 2100+ users) to one based on open source big data technologies, we wanted to give an overview of the journey in a blog post. This move freed us from the limits imposed by...

  • an-illustration-of-the-distributed-tier-merge

    Distributed tier merge: How LinkedIn tackles stragglers in search index build

    September 27, 2021

    Co-authors: Andy Li and Hongbin Wu. Indexing plays a key role in modern search engines for fast and accurate information retrieval, and the ability to swiftly build indexes is crucial for LinkedIn to provide up-to-date information, such as candidates to recruiters and job posts to members. In some instances, such as if a member profile is missing and...

  • graph-of-linkedin-cluster-trends-for-hdfs-space-used-total-name-node-objects-and-yarn-compute-capacity

    Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

    September 8, 2021

    Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao. At LinkedIn, we use Hadoop as our backbone for big data analytics and...

  • diagram-of-http2-network-client-architecture

    HTTP/2 in infrastructure: Ambry network stack refactoring

    August 24, 2021

    Co-authors: Ze Mao, Matt Wise, Casey Getz, Justin Lin, Ashish Singhai, and Rob Block. Introduction: Ambry is LinkedIn's scalable...

  • lambda-learner-logo

    Lambda Learner: Nearline learning on data streams

    August 11, 2021

    Co-authors: Kirill Talanine, Jeffrey D. Gee, Rohan Ramanath, Konstantin Salomatin, Gungor Polatkan, Onkar Dalal, and Deepak Kumar...