Data processing workflows must evolve rapidly to react to changes in the data they process. This is especially true at a company like LinkedIn, where the data being processed changes rapidly over time.
Dali is a collection of libraries, services, and development tools united by the common goal of providing a logical data access layer for Hadoop and Spark.
At its core, Dali consists of the following components: a catalog for defining and evolving physical and virtual datasets; a record-oriented dataset layer that enables applications to read datasets in any application environment; and a collection of development tools that allow data to be effectively treated like code by integrating with LinkedIn’s existing software management stack.