Introducing SenseiDB 1.0: an open-source, distributed, realtime, semi-structured database

John Wang

January 26, 2012

I'm excited to announce that we have released version 1.0.0 of SenseiDB to the open-source community. Sensei is a distributed, elastic, realtime, and semi-structured database.

Check out SenseiDB on GitHub!

Read on to learn more about what Sensei does, the architecture behind it, and the project's future direction.

What is Sensei?

Sensei is a distributed data system that was built to support many product initiatives at LinkedIn, including the real-time faceted search in Signal and the news feed and tabs on the Homepage. It is the foundation of LinkedIn's search and data infrastructure.

Sensei is both a search engine and a database. It is designed to query and navigate through documents that consist of (a) unstructured text and (b) well-formed and structured metadata.

Features

Some features and differentiators of Sensei:

Ability to sustain a high rate of inserts and updates while maintaining high query performance.
Support for complex queries via a query language (BQL) and a REST/JSON API.
Streaming updates from different gateways such as JDBC, JMS, and Kafka.
Bootstrapping from Hadoop, e.g. a Map-Reduce job that builds the index in batch and pushes it to Sensei clusters.
Ability to plug in custom and complex faceting logic, such as the social graph.

Architecture

Inserts

Unlike many other data systems, Sensei consumes data from an ordered and versioned data stream that we call a gateway. Within LinkedIn, some of the data streams consumed by Sensei include Kafka and Databus (a technology we use to stream data from a database).

Sensei relies on the external data stream for atomicity and isolation guarantees; in a way, the commit log is externalized. This design allows us to optimize for update rate while providing eventual consistency across replicas without needing a quorum.
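
To make the idea of an externalized commit log concrete, here is a minimal Python sketch. It is not Sensei's implementation: the Event and Replica classes, their fields, and the sample stream are hypothetical names invented for illustration. It assumes only what is described above, namely that every replica reads the same ordered, versioned stream and remembers the last version it applied.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One entry in the ordered stream (hypothetical shape)."""
    version: int   # position in the stream, strictly increasing
    doc_id: str
    doc: dict


@dataclass
class Replica:
    """A hypothetical replica that applies events from a shared stream."""
    version: int = 0                          # last stream version applied
    docs: dict = field(default_factory=dict)  # doc_id -> latest document

    def consume(self, stream, limit=None):
        """Apply events newer than self.version, stopping after `limit` events."""
        applied = 0
        for event in stream:
            if event.version <= self.version:
                continue  # already applied, so replaying the stream is harmless
            if limit is not None and applied >= limit:
                break
            self.docs[event.doc_id] = event.doc
            self.version = event.version
            applied += 1
        return applied


stream = [
    Event(1, "a", {"title": "first"}),
    Event(2, "b", {"title": "second"}),
    Event(3, "a", {"title": "first, updated"}),
]

fast, slow = Replica(), Replica()
fast.consume(stream)           # fully caught up
slow.consume(stream, limit=1)  # lagging: has only seen version 1
assert fast.docs != slow.docs  # replicas may disagree for a while

slow.consume(stream)           # replay from the start; old events are skipped
assert fast.docs == slow.docs and slow.version == 3
```

Because the stream fixes the order of writes, the replicas in this sketch never coordinate with each other; each one converges to the same state once it has read up to the same version.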

For more details, see the architecture overview page and the clustering page.

Queries

Sensei's execution engine is optimized for performance on very large datasets and supports a rich query feature set:

get/getAll, e.g. a key-value retrieval
full-text search
structured, SQL-like selects
aggregation, e.g. facet counting and group-by
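
To illustrate what these four query types mean, here is a toy Python sketch over an in-memory set of documents. It shows the semantics only and says nothing about how Sensei executes queries; the DOCS data and the function names get, search, select, and facet_counts are hypothetical and are not part of Sensei's API.

```python
from collections import Counter

# Hypothetical documents: unstructured text plus structured metadata.
DOCS = {
    1: {"text": "red sports car in great condition", "color": "red", "year": 2008},
    2: {"text": "blue family car with low mileage", "color": "blue", "year": 2005},
    3: {"text": "red pickup truck", "color": "red", "year": 1999},
}


def get(doc_id):
    """Key-value retrieval: fetch one document by id."""
    return DOCS.get(doc_id)


def search(term):
    """Full-text search: ids of documents whose text contains the word."""
    return [i for i, d in DOCS.items() if term.lower() in d["text"].lower().split()]


def select(predicate):
    """Structured select: ids of documents whose metadata satisfies the predicate."""
    return [i for i, d in DOCS.items() if predicate(d)]


def facet_counts(doc_ids, field_name):
    """Aggregation: count the values of one metadata field across the hits."""
    return Counter(DOCS[i][field_name] for i in doc_ids)


car_hits = search("car")
recent = select(lambda d: d["year"] > 2000)
colors_of_cars = facet_counts(car_hits, "color")
```

In this sketch, facet counting is just a group-by over the metadata of whatever the text search or select returned, which is the kind of combination faceted navigation depends on.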

Along with a REST/JSON API, Sensei supports a SQL-like query language called BQL. Here's an example BQL query you can run against Sensei:

What Sensei is NOT

Some features Sensei does not support in comparison to other data systems:

Sensei is not relational. Like many other NoSQL systems, data is denormalized and JOIN operations are not supported.
Sensei is not transactional. We provide durability and eventual consistency guarantees, but we do not support a full transactional insert model (e.g. rollback).

Next play

Some future work we have in mind for Sensei:

Relevance toolkit
Support for aggregation and field collapsing
Support for nested document structures
Dynamic schema
Online data-rebalancing
Data import/export
Inter-cluster Map-Reduce support

Get involved!

To learn more and help drive Sensei forward, check out the following:

SenseiDB project page
Source code
LinkedIn group
Mailing list
IRC: irc.webchat.org, channel #senseidb