Data Articles

  • graph-of-linkedin-cluster-trends-for-hdfs-space-used-total-name-node-objects-and-yarn-compute-capacity

    Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

    September 8, 2021

    Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao. At LinkedIn, we use Hadoop as our backbone for big data analytics and machine learning. With an exponentially growing data volume, and the company heavily investing in machine learning and data science, we have been doubling our cluster size year over year to match the compute workload growth. Our...

  • diagram-of-http2-network-client-architecture

    HTTP/2 in infrastructure: Ambry network stack refactoring

    August 24, 2021

    Co-authors: Ze Mao, Matt Wise, Casey Getz, Justin Lin, Ashish Singhai, and Rob Block. Ambry is LinkedIn's scalable geo-distributed object store. Developed in-house and open sourced in 2016, Ambry stores tens of petabytes of data. At LinkedIn, Ambry is used to store objects like photos, videos, and resume uploads, as well as internal binary data....

  • lambda-learner-logo

    Lambda Learner: Nearline learning on data streams

    August 11, 2021

    Co-authors: Kirill Talanine, Jeffrey D. Gee, Rohan Ramanath, Konstantin Salomatin, Gungor Polatkan, Onkar Dalal, and Deepak Kumar. A common challenge for production machine learning systems is reacting to change. The world can change quickly, particularly on a social network. This can range from sweeping changes at the scale of the whole economy...

  • gif-showing-new-recruiter-and-jobs-experience

    New Recruiter & Jobs: The largest enterprise data migration...

    July 23, 2021

    Co-authors: Xiaoyang Gu, Xie Lu, and Xiaoguang Wang. In August 2019, we introduced our members and customers to the idea...

  • current-architecture-of-the-daily-executive-dashboard-pipeline

    From daily dashboards to enterprise grade data pipelines

    July 20, 2021

    Within a matter of hours of each day beginning, we ingest tens of billions of records from online sources to HDFS, aggregated across...

  • a-graph-showing-keyword-search-query-p95-latency-with-increasing-qps-for-different-workloads

    Text analytics on LinkedIn Talent Insights using Apache...

    June 16, 2021

    Co-authors: Siddharth Teotia and Tim Santos. LinkedIn Talent Insights (LTI) is a platform that helps organizations...