Espresso is LinkedIn's online, distributed, fault-tolerant NoSQL database. It powers mission-critical LinkedIn applications, including member profiles and InMail (LinkedIn's member-to-member messaging system). Espresso has a large production footprint and hosts some of the company's most heavily accessed and valuable datasets.
Like other document databases, Espresso scales better than relational databases and is feature-rich compared to key-value stores. Unlike other document databases, Espresso solves problems that are unique to LinkedIn, such as variable database partition sizes and richer compliance support.
Our three focus areas are productivity, elasticity, and manageability. Our investments in these areas have a clear impact on the company's bottom line, with massive cost savings and hundreds of engineer-years saved.
Current/future Espresso team projects
The Espresso team is working on a variety of projects as part of the evolution of the storage system. These include distributed caching on top of the storage system, pushing global index updates in near real time, enforcing uniqueness in an active-active setup, push-button data migration for applications, and adding a MyRocks-based storage engine.
Another key area of focus for the team is elasticity: specifically, the ability to auto-scale Espresso efficiently. We are working to enable repartitioning (scaling 2x-8x), variable partitioning (allowing partitions to have dynamically adjustable sizes), and range partitioning (allowing a collection of records to split from a partition), all while the database is taking traffic.
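As a rough illustration of why scaling the partition count by a whole-number factor can proceed one partition at a time, here is a minimal sketch. It assumes a simple hash-modulo partitioning scheme, which is an assumption made for this example and not a description of how Espresso actually assigns records to partitions. The function names `partition_of` and `split_targets` are hypothetical.

```python
import hashlib
from typing import List


def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition using a stable hash (assumed scheme, for illustration)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def split_targets(old_partition: int, old_count: int, factor: int) -> List[int]:
    """Return the new partitions that records from old_partition can land in
    when the partition count grows from old_count to old_count * factor."""
    return [old_partition + i * old_count for i in range(factor)]


if __name__ == "__main__":
    old_count, factor = 4, 2
    for key in ["member:1", "member:2", "member:3"]:
        old = partition_of(key, old_count)
        new = partition_of(key, old_count * factor)
        # Under hash-modulo partitioning, the new partition is always one of
        # the split targets of the old partition.
        print(key, old, new, split_targets(old, old_count, factor))
```

Under this assumed scheme, each old partition splits into a fixed set of new partitions, so no record has to move between unrelated partitions. That property is one reason multiplicative repartitioning is easier to perform on a live system than arbitrary resizing.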
The third key focus area is manageability. In our case, this means operating the site with minimal human intervention. Current and future projects in this area include leveraging machine learning to automatically rebalance databases across clusters, reducing the time to recover from production incidents, supporting deployments at scale, and leveraging advanced triangulation techniques.