Data Articles

  • Introducing Apache Pinot 0.3.0

    April 27, 2020

    Built at LinkedIn, Pinot is an open source, distributed, and scalable OLAP data store that we use as our de facto near-real-time analytics service. We’ve previously discussed how and why we built Pinot to power a wide spectrum of use cases, including internal business intelligence dashboards to analyze highly dimensional data and “Who Viewed My Profile” to...

  • Rapid experimentation through standardization: Typed AI features for LinkedIn’s feed

    April 15, 2020

    Serving the most relevant information for LinkedIn members in the homepage feed requires a massive effort—hundreds of features are used to personalize content for hundreds of millions of members. For each homepage visit, our machine learning models have to find and surface the best activity across a member's whole network, and they have to source that content...

  • Building inclusive products through A/B testing

    March 31, 2020

    Co-authors: Guillaume Saint-Jacques, Amir Sepehri, Nicole Li, and Igor Perisic

    Introduction: Previously on this blog, we’ve shared information on best practices in data science, particularly in areas such as A/B testing. We’ve also discussed the importance of ethics in fields such as data science, early implementations of “fairness by design” principles in our...

  • Advanced schema management for Spark applications at scale

    March 25, 2020

    Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Adwait Tumbde, Ratandeep Ratti

    Introduction: Over the years, the popularity of Apache...

  • Data Sentinel: Automating data validation

    March 10, 2020

    Co-authors: Arun Swami, Sriram Vasudevan, Sailesh Mittal, Jiefu Zheng, Joojay Huyn, Audrey Alpizar, Changling Huang, Maneesh Varshney,...

  • How we improved latency through projection in Espresso

    March 5, 2020

    Co-authors: Xiang Zhang and Chuck Jerian

    Espresso is LinkedIn’s document-oriented, highly available, and timeline-consistent...