Engineers at LinkedIn create and integrate some amazing technologies every day. Hiring people with a wide range of experiences also introduces many new and different technologies, allowing LinkedIn engineers to continually test which tools can improve the site and their productivity.
- Zoie is a high-throughput, real-time search indexing system built on the Lucene search library. Real-time indexing lets users search almost instantly for content that is updated in real time. The index is persisted to disk, so after a system crash it does not need to be rebuilt from scratch.
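The two properties described above — new documents become searchable immediately, and the index survives a restart — can be illustrated with a toy inverted index. This is only a sketch of the idea; Zoie's actual implementation pairs in-memory and on-disk Lucene indexes, and the class and file format below are invented for illustration.

```python
import json
import os

class RealtimeIndex:
    """Toy real-time index: new documents are searchable immediately,
    and postings are persisted so a restart does not rebuild from scratch.
    (Illustrative only -- not Zoie's or Lucene's actual design.)"""

    def __init__(self, path):
        self.path = path
        self.postings = {}            # term -> set of doc ids
        if os.path.exists(path):      # recover the persisted index after a crash
            with open(path) as f:
                self.postings = {t: set(ids) for t, ids in json.load(f).items()}

    def add(self, doc_id, text):
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(doc_id)
        self._persist()               # write-through so the index survives crashes

    def search(self, term):
        return sorted(self.postings.get(term.lower(), set()))

    def _persist(self):
        with open(self.path, "w") as f:
            json.dump({t: sorted(ids) for t, ids in self.postings.items()}, f)
```

A document added via `add` is visible to `search` on the very next call, and re-opening the same path recovers the index without re-indexing any documents.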
- Bobo is a high-performance faceted search library based on Lucene. With this library, you can easily transform a Lucene text search application into one that supports faceted search. Bobo provides FacetHandlers that take care of most typical facet types and can be easily extended to handle more complex or customized facet types.
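At its core, faceted search means counting how often each value of a field occurs among the documents that matched a query. The sketch below shows that counting step in plain Python; the function name, the `docs` shape, and the field names are all illustrative, not Bobo's actual FacetHandler API.

```python
from collections import Counter

def facet_counts(hits, docs, facet_fields):
    """Count facet values over a set of search hits.

    `hits` is the list of matching doc ids, `docs` maps doc id to a
    {field: value} dict. Returns per-field value counts -- the numbers
    a faceted UI shows next to each filter option."""
    counts = {field: Counter() for field in facet_fields}
    for doc_id in hits:
        for field in facet_fields:
            value = docs[doc_id].get(field)
            if value is not None:
                counts[field][value] += 1
    return counts

# Hypothetical documents with two facet fields.
docs = {
    1: {"industry": "software", "region": "US"},
    2: {"industry": "software", "region": "EU"},
    3: {"industry": "finance",  "region": "US"},
}
```

Given all three documents as hits, the counts would read "industry: software (2), finance (1); region: US (2), EU (1)" — exactly the kind of drill-down summary a faceted search UI displays.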
- Hadoop is the backbone of LinkedIn's offline data cycle, used to compute everything from collaborative filtering recommendations to People You May Know and skill pages. LinkedIn has a large investment in Hadoop, with thousands of machines and with map/reduce jobs and Pig scripts written by engineers throughout the company. We have a custom workflow scheduler used to manage large production jobs consisting of dozens of map/reduce steps.
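The map/reduce model behind those jobs can be sketched in a few lines of plain Python: a map phase emits key/value pairs, the framework shuffles them into per-key groups, and a reduce phase folds each group into a result. This models the programming contract only, not Hadoop's distributed execution; the classic word count fills in the mapper and reducer.

```python
from collections import defaultdict

def map_phase(records, mapper):
    """Apply the mapper to every input record, emitting (key, value) pairs."""
    for record in records:
        yield from mapper(record)

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups, reducer):
    """Fold each key's values into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}

# The classic word count expressed in this model.
def word_mapper(line):
    for word in line.split():
        yield word.lower(), 1

def sum_reducer(word, counts):
    return sum(counts)
```

A production pipeline chains dozens of such map/reduce steps, which is exactly what the workflow scheduler mentioned above coordinates.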
Storage at Scale
- Voldemort is one of LinkedIn's many home-grown projects: a distributed key-value storage system. Voldemort supports low-latency, horizontally scalable data access through a simple API. We've been running it in production at LinkedIn since 2008, and it is currently distributed across data centers, serving tens of thousands of requests per second. Voldemort was one of the first open-source NoSQL systems.
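The "simple API" and horizontal scalability go together: clients see only get/put/delete, while keys are hashed to partitions behind the scenes. The toy store below illustrates that shape with in-process dicts standing in for nodes; the class name and routing scheme are invented for illustration and much simpler than Voldemort's consistent hashing and replication.

```python
import hashlib

class KeyValueStore:
    """Toy partitioned key-value store with a get/put/delete API.

    Each key is deterministically routed to one partition by hashing,
    so adding partitions spreads load horizontally. (Illustrative only --
    real systems like Voldemort add replication and versioning.)"""

    def __init__(self, num_partitions=4):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition(self, key):
        # Deterministic routing: the same key always lands on the same partition.
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._partition(key)][key] = value

    def get(self, key):
        return self.partitions[self._partition(key)].get(key)

    def delete(self, key):
        self.partitions[self._partition(key)].pop(key, None)
```

Because routing is a pure function of the key, any client computes the same partition independently — no central lookup sits on the request path, which is what keeps latency low at tens of thousands of requests per second.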
- In Espresso, we’ve created a highly available, flexible, elastic, scalable, and operable data service. Unlike a key-value store, Espresso provides timeline consistency and transactional guarantees within partitions, supporting many applications that exhibit hierarchical and bipartite graph data patterns. It supports document-style schemas with typed fields, secondary indexes, multiple serialization formats such as JSON, Avro, and Protocol Buffers, and easy access through a simple RESTful API. Schemas can be updated on the fly with a simple service call. Infrastructure concerns such as adding capacity, partitioning, replication, tuning, caching, and failover are all handled by the service, freeing you to focus on the data model, schema, and business logic.
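Timeline consistency within a partition can be pictured as a per-partition change log: every write gets a monotonically increasing sequence number, and any replica that replays the log in order converges to the same state. The sketch below illustrates that idea only; the class and log format are invented, not Espresso's implementation.

```python
class DocumentPartition:
    """Toy partition with timeline consistency: each write is assigned
    the next sequence number and appended to an ordered change log, so
    all readers and replicas observe the same timeline of updates.
    (Illustrative only -- not Espresso's actual design.)"""

    def __init__(self):
        self.seq = 0
        self.docs = {}
        self.log = []   # ordered change log, usable for replication

    def put(self, doc_id, document):
        self.seq += 1
        self.docs[doc_id] = (self.seq, document)
        self.log.append((self.seq, doc_id, document))
        return self.seq

    def get(self, doc_id):
        entry = self.docs.get(doc_id)
        return entry[1] if entry else None

def replay(log):
    """A replica applies the log in sequence order and converges to the
    same state as the primary -- later writes to a doc win."""
    replica = {}
    for seq, doc_id, document in sorted(log):
        replica[doc_id] = document
    return replica
```

Because writes within a partition are totally ordered, an application updating a member's profile document never sees updates applied out of order — the guarantee a plain key-value store does not give.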