
LinkedIn’s GraphQL journey for integrations and partnerships: How we accelerated development by 90%

Co-authors: Mimi Chen, Calvin Lei, and Amit Yadav


Background

LinkedIn’s mission is to connect the world’s professionals to make them more productive and successful. One way we advance this mission is by partnering with other organizations to deliver world-class integrations. We are developing a platform-as-a-service (PaaS) that provides exploratory access, insights, and conflation with LinkedIn’s Economic Graph to enable product integrations with strategic partners and customers. Our goal is to build a best-in-class API platform that is easy to use, efficient, and easy for our customers to operate. However, we found that crafting and externalizing an API to fit each specific use case is time-consuming. In this blog post, we walk through our adoption of GraphQL and how it sped up our API development in support of our customers.

GraphQL is a query language for APIs and a server-side runtime for executing queries against a type system that describes the available data. GraphQL is not tied to any specific database or storage engine; instead, it is backed by existing server-side code and data. Its key strength is flexible querying: clients ask for exactly the data they need, which removes the ties to rigidly defined APIs. In the following sections, we discuss how GraphQL enables us to significantly accelerate our API development.

Problem statement: Where we were

Every customer looking to integrate with LinkedIn has a unique set of product concerns, mostly related to data and retrieval patterns. Before we adopted GraphQL, this meant building a new REST API for each use case. The cost of creating and operating all of these externalized APIs was a barrier that only highly scaled or high-ROI use cases could overcome.

Prior to GraphQL, we employed two major strategies in our APIs to lower the cost of meeting the requirements of different use cases:

  1. Define different APIs only when necessary: For use cases with input scenarios that could not reuse an existing API (e.g., search by email, search by schoolId), we defined new APIs.
  2. Support projections: For use cases that needed different fields in the response, even with the same input, we added projection support.
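To make the projection strategy concrete, here is a minimal sketch, assuming a response record represented as a Java map and a caller-supplied set of field names (similar in spirit to a `fields` query parameter). The `Projection` class and its method are hypothetical illustrations, not LinkedIn's REST framework.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of REST-style projection: one endpoint serves several
// use cases by returning only the fields each caller asks for.
class Projection {
    // Keep only the requested top-level fields, preserving the source order.
    static Map<String, Object> project(Map<String, Object> source, Set<String> fields) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : source.entrySet()) {
            if (fields.contains(entry.getKey())) {
                projected.put(entry.getKey(), entry.getValue());
            }
        }
        return projected;
    }
}
```

For example, projecting a member record onto `firstName` and `headline` drops every other field, so two callers with the same input can receive differently shaped responses from one API.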

These strategies helped. However, the whole process required a long development cycle, especially for strategy #1. When we needed to define a new API or a variant of an existing API, we had to handle the REST API definition, request input processing, and response collection. After accounting for rigorous technical and business reviews and testing, it could take up to two quarters of development to ship a new product feature or onboard a new use case.

Figure: Long development cycle

Leveraging REST projections made our APIs more reusable but came at a cost: it significantly increased the complexity of monitoring and operations. Splitting monitoring metrics by projection set for the same API is challenging in itself, and projections also make it harder to evolve API products and increase the time needed to debug an issue.
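The monitoring cost can be illustrated with a small sketch, assuming each request's metrics are keyed by the API name plus its canonicalized projection set. The `ProjectionMetrics` class and the key format are hypothetical; the point is only that every distinct projection combination becomes a separate metric series to track.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: per-projection request counting for one API.
class ProjectionMetrics {
    private final Map<String, Integer> requestCounts = new HashMap<>();

    // Canonicalize the projection (sorted, comma-joined) so that {a,b} and
    // {b,a} land in the same series.
    static String seriesKey(String api, Set<String> projection) {
        return api + "[" + String.join(",", new TreeSet<>(projection)) + "]";
    }

    void record(String api, Set<String> projection) {
        requestCounts.merge(seriesKey(api, projection), 1, Integer::sum);
    }

    // Number of distinct series: grows with every new projection combination.
    int seriesCount() {
        return requestCounts.size();
    }
}
```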

What made the long onboarding time problematic was demand, which was a fortunate problem to have. Customers were lining up, and our team was unable to keep up with demand, let alone clear the backlog. We needed to scale without simply throwing more engineers at the problem.

Solution: Where we are

At a high level, GraphQL completes the story. Users can explore the service’s capabilities through its type system, and a new use case is created by defining a new GraphQL query. Because the GraphQL service automatically executes any new query that is valid against its type system, no additional engineering effort is needed. In this section, we focus on how we used GraphQL on the service side to address our onboarding capacity problems. We also briefly discuss the gaps we see in GraphQL’s current offering and how we extended it for our use cases.
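The claim that a new query needs no new engineering work rests on generic execution. The following toy executor, a hypothetical sketch rather than graphql-java, walks any selection the client sends over nested maps and lists and returns only the selected fields. It assumes a selection is a map from field name to sub-selection, with an empty map marking a scalar field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical generic executor: one method serves every query shape.
class GenericExecutor {
    @SuppressWarnings("unchecked")
    static Object execute(Map<String, Object> selection, Object source) {
        if (source instanceof List<?> items) {
            // Apply the same selection to every element of a list field.
            return items.stream().map(item -> execute(selection, item)).collect(Collectors.toList());
        }
        Map<String, Object> data = (Map<String, Object>) source;
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : selection.entrySet()) {
            Map<String, Object> subSelection = (Map<String, Object>) field.getValue();
            Object value = data.get(field.getKey());
            if (subSelection.isEmpty() || value == null) {
                result.put(field.getKey(), value); // scalar (or missing) field
            } else {
                result.put(field.getKey(), execute(subSelection, value)); // nested object
            }
        }
        return result;
    }
}
```

Two clients with different needs send different selections to the same `execute` method; neither requires a new endpoint.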

Principles

When we designed our solution, we used the following guiding principles:

  • Follow industry-supported GraphQL best practices
  • Keep things simple and generic, with no additional manual processes for query execution
  • Reduce toil in API definition and response wiring, increasing onboarding velocity and capacity
  • Include full feature coverage for existing use cases, providing a migration path

Gaps

One of the most common use cases within LinkedIn is search, such as finding people by their attributes or finding jobs that match certain criteria. Within LinkedIn, we chose to have our GraphQL API follow the OpenCRUD specification, which handles SQL-like queries very well. For search queries, however, it falls short, because search differs in both how queries are defined and the shape of the response:

  • Search semantics are not accurately expressed in type systems (e.g., full-text, typeahead);
  • Search results include search-specific metadata, like matched fields, but relevance features are not well captured in a type system;
  • Search operators, like Lucene’s “should” and “must,” as well as facets, are not well supported in OpenCRUD.
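To show why “should” and “must” do not map cleanly onto filter-style AND/OR, here is a simplified sketch of Lucene-style boolean semantics. It is not Lucene's API: documents are assumed to be maps from field names to term sets, and the score is just the count of matching “should” clauses. The key difference from a plain filter is that “should” clauses are optional once a “must” clause exists, yet still change the ranking.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Simplified, hypothetical model of boolean search clauses.
class BooleanSearch {
    // A clause matches when the document's field contains the given term.
    record Clause(String field, String term) {
        boolean matches(Map<String, Set<String>> doc) {
            return doc.getOrDefault(field, Set.of()).contains(term);
        }
    }

    // Returns -1 when the document does not match; otherwise the number of
    // matching "should" clauses, used here as a crude relevance score.
    static int score(Map<String, Set<String>> doc, List<Clause> must, List<Clause> should) {
        for (Clause clause : must) {
            if (!clause.matches(doc)) {
                return -1; // a failed "must" clause excludes the document
            }
        }
        int shouldHits = (int) should.stream().filter(c -> c.matches(doc)).count();
        // Without any "must" clause, at least one "should" clause has to match
        // (so an empty query matches nothing).
        if (must.isEmpty() && shouldHits == 0) {
            return -1;
        }
        return shouldHits;
    }
}
```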

These gaps in the type system for search use cases also make it difficult to execute search GraphQL queries generically.

Proposed solution

We now describe the main parts of the solution, starting with the type system and the query engine.

Type system
The type system is the heart of GraphQL. It captures the GraphQL service’s capabilities, such as supported retrieval patterns, input conditions, nodes, and edges between nodes. Currently, none of the GraphQL type systems that support search in the industry meets all of our required capabilities, such as rich filtering and search query specification. We decided to extend the OpenCRUD type system with a search-related type, “Search Entity,” to carry the rich search response, such as facets or matched-terms information. Following existing practice at LinkedIn, we wanted separate search verticals to provide searchability over different datasets and entities. Alongside the Search Entity type, we define a rich input structure that captures search semantics, relevance models, and search-specific operators. An excerpt from our Economic Graph type system is included to demonstrate the modeling for search entities; the following example is the “Search Entity” for the MemberSearch type.
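As an illustration of that input/response split, here is a hypothetical Java model of what a search entity such as MemberSearch might carry. The field names (`must`, `should`, `facets`, `relevanceModel`), the `SearchMode` enum, and the set of facetable fields are our own assumptions, not LinkedIn's schema. The input bundles search semantics, operators, facets, and a relevance model; the response bundles hits with matched fields plus facet counts.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical search semantics the input can request.
enum SearchMode { FULL_TEXT, TYPEAHEAD }

record TermClause(String field, String value) {}

// Hypothetical rich search input for a member search vertical.
record MemberSearchInput(
        String keywords,
        SearchMode mode,
        List<TermClause> must,      // every clause has to match
        List<TermClause> should,    // optional clauses that boost relevance
        List<String> facets,        // fields to aggregate counts over
        String relevanceModel) {    // named ranking model to apply

    // Assumed set of facetable fields for this vertical.
    static final Set<String> FACETABLE = Set.of("industry", "location", "school");

    MemberSearchInput {
        must = List.copyOf(must);
        should = List.copyOf(should);
        facets = List.copyOf(facets);
        for (String facet : facets) {
            if (!FACETABLE.contains(facet)) {
                throw new IllegalArgumentException("Field is not facetable: " + facet);
            }
        }
    }
}

// Hypothetical search-specific response metadata.
record MatchedHit(String entityUrn, List<String> matchedFields, double score) {}

record MemberSearchResponse(List<MatchedHit> hits, Map<String, Map<String, Long>> facetCounts) {}
```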

Other companies have taken different approaches to defining search capabilities in their GraphQL type systems. After comparing them, we believe our type system provides a better user experience, because it follows GraphQL syntax, provides richer query capabilities, and supports faceted queries.

Feature          | LinkedIn          | GitHub            | Shopify           | Yelp              | GitLab
Facets           | Yes               | No                | No                | No                | No
Query format     | GraphQL-oriented  | Free-form         | Free-form         | GraphQL-oriented  | GraphQL-oriented
Query syntax     | GraphQL           | GitHub syntax     | Customized        | GraphQL           | GraphQL
Query capability | Multi-conditional | Multi-conditional | Multi-conditional | Single-term       | Single-term

Query engine
The following high-level architecture shows how queries against the expanded type system are executed. Search types are treated as a distinct execution node, which holds the relationship to the other nodes we fetch data from.

Figure: GraphQL query engine architecture

We leverage the graphql-java engine to handle execution concerns. With our enhancements and architecture, the engine provides generic request orchestration and response stitching. As a result, a new query against the type system the engine serves works automatically, with no extra configuration. To handle search with GraphQL, we extended the engine so that downstream calls to search go through a search data fetcher.
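A minimal sketch of the wiring idea follows, assuming a toy registry keyed by `Type.field`; in the real engine, graphql-java's `DataFetcher` interface plays the fetcher role. `FieldFetcher`, `SearchBackend`, and `FetcherRegistry` are hypothetical names. The sketch only shows that plugging search into a generic engine means registering one more fetcher that delegates downstream.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for an engine's per-field data fetcher.
interface FieldFetcher {
    Object fetch(Map<String, Object> arguments);
}

// Hypothetical downstream search client; a real one would make a remote call.
interface SearchBackend {
    List<String> search(String vertical, Map<String, Object> arguments);
}

class FetcherRegistry {
    private final Map<String, FieldFetcher> fetchers = new HashMap<>();

    void register(String type, String field, FieldFetcher fetcher) {
        fetchers.put(type + "." + field, fetcher);
    }

    // The engine resolves every field the same way: look up and invoke.
    Object resolve(String type, String field, Map<String, Object> arguments) {
        FieldFetcher fetcher = fetchers.get(type + "." + field);
        if (fetcher == null) {
            throw new IllegalStateException("No fetcher for " + type + "." + field);
        }
        return fetcher.fetch(arguments);
    }

    // Wire a root search field to a downstream vertical through the backend.
    void registerSearch(String field, String vertical, SearchBackend backend) {
        register("Query", field, args -> backend.search(vertical, args));
    }
}
```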

Having built well-structured schemas (like MemberSearch) for search input and search response types in the type system, we could easily define a generic data fetcher function that delegates to downstream search verticals. Keywords in the type system (e.g., matchTerm, Facet, or relevanceModel) determine which search vertical a request is routed to and how the query is rewritten. After we filled in the gaps for Search Entity execution, we were able to stitch together the GraphQL engine and our type system to provide the dynamic search capability we needed.
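The routing and rewriting step might look like the following sketch. Everything specific here is an assumption for illustration: the mapping from `MemberSearch` and `JobSearch` to verticals named `people` and `jobs`, the argument shapes, and the choice of Lucene classic query syntax (a `+` prefix for a required clause, no prefix for an optional one) as the downstream query language.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical generic search fetcher logic: route, then rewrite.
class SearchRewriter {
    record DownstreamRequest(String vertical, String query, List<String> facets, String relevanceModel) {}

    // Assumed mapping from search types in the schema to search verticals.
    static final Map<String, String> VERTICALS = Map.of(
            "MemberSearch", "people",
            "JobSearch", "jobs");

    @SuppressWarnings("unchecked")
    static DownstreamRequest rewrite(String searchType, Map<String, Object> args) {
        // (1) Route: the search type decides which vertical serves the request.
        String vertical = VERTICALS.get(searchType);
        if (vertical == null) {
            throw new IllegalArgumentException("Unknown search type: " + searchType);
        }
        // (2) Rewrite: matchTerm clauses become Lucene-style query clauses.
        List<String> clauses = new ArrayList<>();
        for (Map<String, String> t : (List<Map<String, String>>) args.getOrDefault("must", List.of())) {
            clauses.add("+" + t.get("field") + ":" + t.get("matchTerm")); // required
        }
        for (Map<String, String> t : (List<Map<String, String>>) args.getOrDefault("should", List.of())) {
            clauses.add(t.get("field") + ":" + t.get("matchTerm")); // optional, boosts score
        }
        List<String> facets = (List<String>) args.getOrDefault("facets", List.of());
        String model = (String) args.getOrDefault("relevanceModel", "default");
        return new DownstreamRequest(vertical, String.join(" ", clauses), facets, model);
    }
}
```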

Figure: LinkedIn GraphQL query diagram

Foundation
Along with feature completeness, we continue to strengthen our foundational support, including type system generation tools for search, to further improve the whole development process. A search type system needs hints about the searchable fields, the facetable fields, and the searched entity, among other details, so we defined a search index spec: a config file that provides these hints for generating the relevance search type system. An automated generation tool parses the search index spec and generates the type system. We included an example of a search index spec. With this tooling, we further reduced the time to initiate a new GraphQL service with a new type system from three weeks to a couple of days.
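A hypothetical sketch of spec-driven generation follows. The spec format (one `key: value, value` line per hint), the keys `entity`, `searchable`, and `facetable`, and the generated type names are all assumptions made for illustration; the sketch emits GraphQL SDL as a string.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical generator: search index spec in, GraphQL SDL out.
class SearchTypeGenerator {
    // Parse "key: a, b, c" lines into an ordered map of hint lists.
    static Map<String, List<String>> parseSpec(String spec) {
        Map<String, List<String>> hints = new LinkedHashMap<>();
        for (String line : spec.strip().split("\n")) {
            String[] kv = line.split(":", 2);
            List<String> values = Arrays.stream(kv[1].split(","))
                    .map(String::strip)
                    .filter(v -> !v.isEmpty())
                    .collect(Collectors.toList());
            hints.put(kv[0].strip(), values);
        }
        return hints;
    }

    // Emit enums for searchable and facetable fields plus a search input type.
    static String generate(String spec) {
        Map<String, List<String>> hints = parseSpec(spec);
        String entity = hints.get("entity").get(0);
        StringBuilder sdl = new StringBuilder();
        sdl.append("enum ").append(entity).append("SearchableField { ")
           .append(String.join(" ", hints.get("searchable"))).append(" }\n");
        sdl.append("enum ").append(entity).append("FacetField { ")
           .append(String.join(" ", hints.get("facetable"))).append(" }\n");
        sdl.append("input ").append(entity).append("SearchInput { ")
           .append("keywords: String ")
           .append("fields: [").append(entity).append("SearchableField!] ")
           .append("facets: [").append(entity).append("FacetField!] }\n");
        return sdl.toString();
    }
}
```

Adding a facetable field then means editing one line of the spec and regenerating, rather than hand-editing the schema.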

User experience
GraphQL services typically return HTTP 200 even when a response contains partial errors, which makes them hard to debug. We adopted and extended our internal query monitoring tool to analyze GraphQL error responses and determine whether an error originated on the server side, and we use that signal for service monitoring. Better monitoring and debuggability will help reduce future maintenance effort.
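One way to recover a useful signal from 200 responses is to classify the `errors` array in the response body. The sketch below assumes the response has already been parsed into Java maps and that each error carries a `classification` entry under `extensions` (graphql-java's default error serialization includes one, but treat the exact values as assumptions). Counting `InvalidSyntax` and `ValidationError` as client errors, and everything else as server errors, is a hypothetical policy.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical classifier for monitoring GraphQL responses.
class GraphQLErrorClassifier {
    enum Outcome { SUCCESS, CLIENT_ERROR, SERVER_ERROR }

    // Classifications treated as caller mistakes; anything else (for example
    // a failing data fetcher) counts against the service.
    static final Set<String> CLIENT_CLASSIFICATIONS = Set.of("InvalidSyntax", "ValidationError");

    @SuppressWarnings("unchecked")
    static Outcome classify(Map<String, Object> response) {
        List<Map<String, Object>> errors =
                (List<Map<String, Object>>) response.getOrDefault("errors", List.of());
        if (errors.isEmpty()) {
            return Outcome.SUCCESS;
        }
        for (Map<String, Object> error : errors) {
            Map<String, Object> ext =
                    (Map<String, Object>) error.getOrDefault("extensions", Map.of());
            Object classification = ext.get("classification");
            // Missing or unknown classifications are treated as server-side.
            if (!(classification instanceof String name) || !CLIENT_CLASSIFICATIONS.contains(name)) {
                return Outcome.SERVER_ERROR;
            }
        }
        return Outcome.CLIENT_ERROR;
    }
}
```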

Improvements on development velocity

With the benefits of GraphQL and our enhanced foundational support, we reduced the total development effort for use cases covered by the type system from 2-3 quarters to 2 days, and we eliminated manual development of new data APIs and their request/response handling. Onboarding a new use case no longer relies on manual engineering effort. Instead, users can explore the capabilities available from our system, register GraphQL queries for their own use cases, and use a query directly once it is approved.
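The register, approve, then use flow could be sketched as a small registry of persisted queries. This is a hypothetical illustration, not LinkedIn's workflow: queries are keyed by the hex-encoded SHA-256 of their text, start as pending, and can only be looked up for execution after approval. It uses `java.util.HexFormat`, which requires Java 17 or later.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.Map;

// Hypothetical self-service registry: register, approve, then execute.
class QueryRegistry {
    enum Status { PENDING, APPROVED }

    private final Map<String, String> queries = new HashMap<>();
    private final Map<String, Status> status = new HashMap<>();

    // Content-addressed ID: the same query text always gets the same ID.
    static String idOf(String query) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(query.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    String register(String query) {
        String id = idOf(query);
        queries.putIfAbsent(id, query);
        status.putIfAbsent(id, Status.PENDING);
        return id;
    }

    void approve(String id) {
        if (!queries.containsKey(id)) {
            throw new IllegalArgumentException("Unknown query " + id);
        }
        status.put(id, Status.APPROVED);
    }

    // Returns the query text to execute, or throws if it is not approved yet.
    String lookupForExecution(String id) {
        if (status.get(id) != Status.APPROVED) {
            throw new IllegalStateException("Query not approved: " + id);
        }
        return queries.get(id);
    }
}
```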

In the past few quarters, we’ve onboarded two new use cases into the new system. For both use cases, we reduced the overall onboarding time to 2 days.

Future work: Where we will be

We have been operationalizing the searchable Economic Graph by onboarding new use cases. Next on our roadmap is applying the same concept to our PaaS to enable search over the conflated Economic Graph. We also plan to implement schema delegation in our ecosystem to make query resolution more efficient. All of these enhancements are stepping stones toward our goal of building a best-in-class API platform that is easy to use, efficient, and easy to operate. Stay tuned for future updates.

Acknowledgements

The success of this project would not have been possible without support from many people across many teams at LinkedIn. We cannot possibly thank everyone, but we would like to make a special callout to Alex Song, Jingwen Mo, Areg Nersisyan, and Lin Shen. In addition, we would like to thank Juan Grande and Diego Buthay for sharing their expertise in the search domain. We would also like to thank Gopal Holla, Prakhar Sharma, Min Chen, Vickey Yeh, and Arun Ponniah Sethuramalingam, who consulted on how the type system should be expanded and how we could leverage existing work from our infra teams. Last but not least, we would like to thank Ankit Gupta and Aarthi Jayaram for sponsoring this project.