Open Source Articles

  • Open sourcing Feathr – LinkedIn’s feature store for productive machine learning

    April 12, 2022

    We are open sourcing Feathr – the feature store we built to simplify machine learning (ML) feature management and improve developer productivity. At LinkedIn, dozens of applications use Feathr to define features, compute them for training, deploy them in production, and share them across teams. With Feathr, users reported significantly reduced time required to...

  • Load-balanced Brooklin Mirror Maker: Replicating large-scale Kafka clusters at LinkedIn

    April 11, 2022

    At LinkedIn, Apache Kafka is used heavily to store many kinds of data, including member activity, logs, metrics, and a multitude of inter-service messages. LinkedIn maintains multiple data centers with multiple Kafka clusters per data center, each of which contains an independent set of data. Mirroring (i.e., replicating) Kafka topics across the...

  • FastTreeSHAP: Accelerating SHAP value computation for trees

    March 15, 2022

    Co-authors: Jilei Yang, Humberto Gonzalez, Parvez Ahammad

    In this blog post, we introduce and announce the open sourcing of the FastTreeSHAP package, a Python package based on the paper Fast TreeSHAP: Accelerating SHAP Value Computation for Trees (presented at the NeurIPS 2021 XAI4Debugging Workshop). FastTreeSHAP enables an efficient interpretation of tree-based...

  • Performance-Adaptive Sampling Strategy (PASS) for GNNs:...

    March 7, 2022

    Co-authors: Jaewon Yang, Minji Yoon, Sufeng Niu, Dash Shi, and Qi He

    Graphs are a universal way to represent relationships among...

  • Project Magnet, providing push-based shuffle, now available...

    October 20, 2021

    Co-authors: Venkata Krishnan Sowrirajan and Min Shen

    We are excited to announce that push-based shuffle (codenamed Project Magnet) is...

  • Scaling LinkedIn's Hadoop YARN cluster beyond 10,000 nodes

    September 8, 2021

    Co-authors: Keqiu Hu, Jonathan Hung, Haibo Chen, and Sriram Rao

    At LinkedIn, we use Hadoop as our backbone for big data analytics and...