Data Articles

  • Understanding dwell time to improve LinkedIn feed ranking

    May 12, 2020

    Co-authors: Siddharth Dangi, Johnson Jia, Manas Somaiya, and Ying Xuan. The LinkedIn feed is the cornerstone of the member experience. It’s where our members post ideas, career news, questions, and jobs in an array of formats, including short text, long-form articles, images, and videos. The Feed AI Team’s mission is to help LinkedIn’s members discover the most...

  • Spark-TFRecord: Toward full support of TFRecord in Spark

    May 4, 2020

    Co-authors: Jun Shi, Mingzhou Zhou. In the machine learning community, Apache Spark is widely used for data processing due to its efficiency in SQL-style operations, while TensorFlow is one of the most popular frameworks for model training. Although there are some data formats supported by both tools, TFRecord, the data format native to TensorFlow, is...

  • Introducing Apache Pinot 0.3.0

    April 27, 2020

    Built at LinkedIn, Pinot is an open source, distributed, and scalable OLAP data store that we use as our de facto near-real-time analytics service. We’ve previously discussed how and why we built Pinot to power a wide spectrum of use cases, including internal business intelligence dashboards to analyze high-dimensional data and “Who Viewed My Profile” to...

  • Rapid experimentation through standardization: Typed AI...

    April 15, 2020

    Serving the most relevant information for LinkedIn members in the homepage feed requires a massive effort—hundreds of features are...

  • Building inclusive products through A/B testing

    March 31, 2020

    Co-authors: Guillaume Saint-Jacques, Amir Sepehri, Nicole Li, and Igor Perisic. Previously on this blog, we’ve shared...

  • Advanced schema management for Spark applications at scale

    March 25, 2020

    Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Adwait Tumbde, Ratandeep Ratti. Over the years, the popularity of Apache...