Data Articles

  • Building inclusive products through A/B testing

    March 31, 2020

    Co-authors: Guillaume Saint-Jacques, Amir Sepehri, Nicole Li, and Igor Perisic

    Previously on this blog, we’ve shared information on best practices in data science, particularly in areas such as A/B testing. We’ve also discussed the importance of ethics in fields such as data science, early implementations of “fairness by design” principles in our...

  • Advanced schema management for Spark applications at scale

    March 25, 2020

    Co-authors: Walaa Eldin Moustafa, Wenye Zhang, Adwait Tumbde, and Ratandeep Ratti

    Over the years, the popularity of Apache Spark at LinkedIn has grown, and users today continue to leverage its unique features for business-critical tasks. Apache Spark allows users to consume datasets using powerful, yet easy-to-use APIs such as the Dataset interface. The...

  • Data Sentinel: Automating data validation

    March 10, 2020

    Co-authors: Arun Swami, Sriram Vasudevan, Sailesh Mittal, Jiefu Zheng, Joojay Huyn, Audrey Alpizar, Changling Huang, Maneesh Varshney, and Adrian Fernandez

    Data’s value is best realized when prepared and treated correctly. However, when you’re working with data at an extensive scale, it’s not as easy to make sure that every data set has been cleaned and validated....

  • How we improved latency through projection in Espresso

    March 5, 2020

    Co-authors: Xiang Zhang and Chuck Jerian

    Espresso is LinkedIn’s document-oriented, highly available, and timeline-consistent...

  • Analyzing anomalies with ThirdEye

    February 20, 2020

    Co-authors: Yen-Jung Chang, Yang Yang, Xiaohui Sun, and Tie Wang

    At LinkedIn, ThirdEye is the backbone of our monitoring toolkit. We...

  • Open sourcing DataHub: LinkedIn’s metadata search and...

    February 18, 2020

    Co-authors: Kerem Sahin, Mars Lan, and Shirshanka Das

    Finding the right data quickly is critical for any company that relies on big...