Modern web-scale companies have outgrown traditional analytical products because of the difficulty of analyzing massive, fast-moving datasets at real-time latencies. Before Pinot, the analytics products at LinkedIn were built on generic storage systems such as Oracle (RDBMS) and MySQL. These systems are not specialized for OLAP needs, and the data volume at LinkedIn was growing exponentially in both breadth and depth. These factors, combined with widening needs across the company, called for a single system that every team could leverage, which was the impetus for building Pinot.
Pinot, a real-time distributed OLAP data store built at LinkedIn, delivers scalable real-time analytics with low latency. It can ingest data from batch sources (such as Hadoop and flat files) as well as streaming sources (such as Kafka). It’s optimized for analytical use cases on immutable, append-only data and offers data freshness on the order of a few seconds. It’s horizontally scalable, and in production it scales to hundreds of tables and machines and thousands of queries per second.
Key features include:
- A column-oriented database with various compression schemes, such as run-length and fixed-bit-length encoding
- Pluggable indexing technologies - Sorted Index, Bitmap Index, Inverted Index
- Ability to optimize the query execution plan based on query and segment metadata
- Near-real-time ingestion from Kafka and batch ingestion from Hadoop
- A SQL-like language that supports selection, aggregation, filtering, group by, order by, and distinct queries on fact data
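To make the storage and indexing terms above more concrete, here is a minimal, generic sketch of fixed-bit-length encoding, run-length encoding, a sorted index, and a bitmap inverted index over a single column. It is an illustration of the techniques only, not Pinot's actual segment format or API. It assumes values are first dictionary-encoded into small integer ids (an assumption, not stated above), and it represents bitmaps as plain Python integers, whereas production systems typically use compressed bitmap structures. All function names are hypothetical.

```python
from collections import defaultdict


def dictionary_encode(values):
    """Hypothetical helper: map each distinct value to an integer id via a sorted dictionary."""
    dictionary = sorted(set(values))
    lookup = {v: i for i, v in enumerate(dictionary)}
    return dictionary, [lookup[v] for v in values]


def bits_needed(cardinality):
    """Fewest bits that can hold ids 0..cardinality-1 (at least 1 bit)."""
    return max(1, (cardinality - 1).bit_length())


def pack_fixed_bit(ids, width):
    """Fixed-bit-length encoding: store every id in exactly `width` bits of one integer."""
    packed = 0
    for position, value_id in enumerate(ids):
        packed |= value_id << (position * width)
    return packed


def unpack_fixed_bit(packed, width, count):
    """Read `count` ids of `width` bits each back out of the packed integer."""
    mask = (1 << width) - 1
    return [(packed >> (i * width)) & mask for i in range(count)]


def run_length_encode(values):
    """Run-length encoding: collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [tuple(run) for run in runs]


def build_sorted_index(sorted_values):
    """Sorted index: for a column sorted by value, map each value to its [start, end) doc-id range."""
    ranges = {}
    start = 0
    for value, length in run_length_encode(sorted_values):
        ranges[value] = (start, start + length)
        start += length
    return ranges


def build_bitmap_index(values):
    """Bitmap inverted index: one bitmap per value, with bit i set if doc i holds that value."""
    bitmaps = defaultdict(int)
    for doc_id, v in enumerate(values):
        bitmaps[v] |= 1 << doc_id
    return dict(bitmaps)


def docs_matching(bitmap):
    """Decode a bitmap into the list of matching doc ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Usage: a tiny hypothetical "country" column with six rows.
country = ["us", "us", "in", "us", "de", "in"]
dictionary, ids = dictionary_encode(country)
width = bits_needed(len(dictionary))
packed = pack_fixed_bit(ids, width)

index = build_bitmap_index(country)
us_docs = docs_matching(index["us"])
# A filter such as country IN ('in', 'de') becomes a bitwise OR of two bitmaps.
in_or_de_docs = docs_matching(index["in"] | index["de"])

# A hypothetical "day" column that is already sorted, so each value occupies one contiguous run.
day = ["d1", "d1", "d1", "d2", "d2", "d3"]
day_runs = run_length_encode(day)
day_ranges = build_sorted_index(day)
```

With three distinct countries, each row needs only 2 bits instead of a full string. On the sorted column, a filter like `day = 'd2'` reduces to a single contiguous doc-id range, which is why sorted and bitmap indexes make filtering cheap.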
Pinot powers some of LinkedIn's more recognizable experiences, such as Who Viewed My Profile, Job and Publisher Analytics, and many more. Pinot also powers LinkedIn's internal reporting platform, helping hundreds of analysts and product managers make data-driven decisions.