Client Zero Protocol: Taking intelligent risks to go beyond the traditional SDLC control framework

Alex Sartel

Senior Staff Risk & Compliance Engineer @ LinkedIn | PADI Scuba Diver Instructor

April 15, 2022

Introduction

Traditional IT processes supporting change management rely on a clear separation of the environments involved in the software development lifecycle (SDLC). Developments and Unit Tests are performed in the “Development” environments, Quality Assurance and User Tests are conducted in the User Acceptance Testing (UAT) environments, and the “Production” environment is the sacred place where only real-life transactions and activities are performed.

While this approach efficiently mitigates the IT risks related to the change management process, it’s based on how companies historically maintained and operated their information systems (fewer releases, a central Enterprise Resource Planning (ERP) system, fewer interfaced applications, etc.). Today, companies are moving toward more fluid and granular models, based on interconnected services rather than “end-to-end” applications, making testing in UAT environments less relevant in terms of risk coverage and operability. As the traditional waterfall model for the development lifecycle falls out of use, we see the emergence of DevOps models that push against the canons of traditional IT General Control (ITGC) frameworks.

In this blog, we’ll discuss LinkedIn’s approach to Client Zero Protocol, which we call “testing in production.” Because Client One would be the first customer using a feature, Client Zero is the “test” client that imitates the flow in conditions similar to production before the feature is released to the first client.

In line with our vision to create economic opportunity for every member of the global workforce, we continue to explore solutions that allow our systems to adapt to the constant changes in the needs of our members and customers. In this post, we focus on the compliance aspects of Client Zero Protocol and detail the organizational and technical controls that address regulatory requirements while protecting our members' and customers’ trust.

Going beyond the traditional control frameworks

Anyone who has ever interacted with the compliance aspects of the software development lifecycle has designed, owned, or operated “Change Management” IT General Controls to address the following risks:

Unauthorized changes are implemented into the production environment which do not respond to a legitimate need and alter the expected behavior of the system.
Unexpected behaviors and processing errors may be introduced by changes that were not thoroughly tested prior to being implemented into the production environment.
Unauthorized access to production data and functions is granted and may be used to process inappropriate transactions in production; this includes but is not limited to lack of segregation of duties between the developers and the application users, and violation of confidentiality and privacy of the production data.

Traditionally, these risks are addressed by a series of organizational, preventative, and detective controls:

Production environments are segregated from non-production environments in terms of code, data, and access.
Changes are authorized and tested by authorized people prior to being promoted to production.
Only production administrators (who are not developers) can promote changes to production.
Mechanisms are in place (either system enforced or detective review) to provide assurance that only duly approved and tested changes are implemented in production.
Access to the production environment is in line with users’ job and business function; this also implies that segregation of duties between the developers and the application users is implemented.

Instead of trying to adapt our procedures and processes to fit the traditional control framework, we analyzed the foundations of the traditional controls to isolate attributes that, if implemented, would effectively address the risks.

The term “production” is at the center of the risks we want to address. Our first step was to better define this term. The “production environment” is a physical/logical entity: a combination of services, traffic, data, compute resources, and network zones that power the experience for real users. By contrast, “in production” is a state: a set of features, APIs, and data that are effectively exposed to members and customers. Systems in the production state can either be all the systems in the production environment or a subset of them. It is theoretically possible for systems in different states to cohabit in the same environment, as long as systems in production state can be differentiated from systems not in production state.

Defining these terms helps us understand that the risks mentioned above actually relate to ensuring the sanctity of the systems in production state. Segregating the production environments from the non-production environments is therefore a means, not a goal.

Figure 1 - Traditional approach v. Client Zero Protocol

The new objective is to achieve the segregation of systems in production state versus systems not in production state while in the same environment, instead of focusing on segregation of environments. This can be done through the implementation of the following primary attributes:

Segregation of the code – the code used for systems not in production state may not be used by systems in production state.
Segregation of the data – data generated and processed by systems not in production state do not interfere with data generated and processed by systems in production state.
Segregation of users – access for systems and data not in production state may not be used to access system and data in production state. Reciprocally, access for systems and data in production state may not be used to access systems and data not in production state.

In addition to the primary attributes listed above, which would effectively preserve the sanctity of the system in production state, we believe it is important to limit the possibility of honest mistakes and add redundancy in addressing the risks through the implementation of the following secondary attributes:

Segregation of UXs – UXs should clearly state when a transaction is processed through systems not in production state, versus transactions processed through systems in production state.
Monitoring and impact assessment – Transactions processed through systems not in production state should be monitored and analyzed to ensure they don’t impact transactions processed through systems in production state.

Guidelines for implementation

To apply the protocol attributes described above, we considered two approaches: a short-term solution that relies on the implementation of additional layers of controls, and a long-term solution that relies on a refactoring of the systems to natively embed the protocol attributes. While the short-term solution is simpler and quicker to implement, the long-term solution will offer more comprehensive and efficient risk coverage. This framework is generic and not specific or limited to any one application. The feasibility of implementing the framework will need to be determined for each applicable system in the environment.

Short-term solution

In the short-term protocol, Client Zero admins, who are the users involved in the creation and processing of transactions and entities related to Client Zero, will use existing permissions, and the systems will not be able to natively segregate Client Zero transactions from real-life transactions. In this protocol, we rely on an additional layer of controls to effectively implement the protocol attributes. From an IT General Controls perspective, this proposition allows us to carve out the code under consideration from the general change population through code flagging. However, as Client Zero admins are still included in the general user population, specific user monitoring controls (preventative and detective) need to be implemented to detect whether Client Zero admins interacted with production data and to correct this data, as needed. Rigorous downstream monitoring activities are implemented to ensure appropriate accounting of Client Zero transactions in the production data, which minimizes the risk of a misstatement.

While this solution can be quickly implemented and doesn’t require any major changes to the systems, it relies heavily on manual detective controls, the scope of acceptable activities is limited, and the potential for mistakes is higher. Projects subject to Client Zero protocol must therefore be clearly bounded in terms of timeframe and scope, and precise and frequent activity monitoring is crucial. The short-term solution also can’t guarantee privacy and confidentiality of production data.

Since there is no native segregation of data in the short-term solution, the Client Zero protocol will strictly be limited to production configuration, traffic ramp, permission, cross-domain connection, or other operation-related configurations. The features subject to Client Zero protocol will have already been thoroughly tested in non-production environments before being promoted to the production environment.

Addressing the protocol attributes
In the short-term solution, the protocol attributes are addressed through a combination of controls and tooling.

Segregation of the code (primary attribute)
The code subject to Client Zero protocol is logically isolated from the code supporting systems in production state and is only executed for Client Zero transactions. This is achieved using “flagging” tools. The flagged code is only executed for predetermined transactions based on specified criteria. These criteria are defined and managed by the individual project teams based on the protocol use cases and are reported in the Client Zero protocol documentation of the projects, which are then assessed for appropriateness before proceeding with the protocol. Client Zero admins ensure that:

The code under Client Zero protocol has been extensively tested in non-production environments.
The coverage of the code under Client Zero protocol through flag is complete.
The flags for the code under Client Zero protocol are effectively configured.
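As an illustration only, the flagging idea can be sketched in a few lines of Python. The account IDs, the `Transaction` type, and the pricing functions are hypothetical stand-ins, not LinkedIn's actual tooling; the point is that the flagged path runs only when the flag is on and the transaction matches the documented criteria.

```python
from dataclasses import dataclass

# Hypothetical criteria: accounts reserved for the Client Zero protocol.
CLIENT_ZERO_ACCOUNT_IDS = frozenset({"cz-0001", "cz-0002"})


@dataclass(frozen=True)
class Transaction:
    transaction_id: str
    account_id: str
    amount: float


def is_client_zero(txn: Transaction) -> bool:
    """Return True when the transaction matches the documented criteria."""
    return txn.account_id in CLIENT_ZERO_ACCOUNT_IDS


def price_in_production(txn: Transaction) -> float:
    """Existing behavior, used for every real-life transaction."""
    return round(txn.amount, 2)


def price_under_client_zero(txn: Transaction) -> float:
    """Flagged behavior under test (an assumed 10% discount, for illustration)."""
    return round(txn.amount * 0.9, 2)


def price(txn: Transaction, flag_enabled: bool) -> float:
    """Run the flagged code only for Client Zero transactions while the flag is on."""
    if flag_enabled and is_client_zero(txn):
        return price_under_client_zero(txn)
    return price_in_production(txn)
```

In this sketch a production transaction always takes the existing path, whatever the flag value, which is the property the coverage and configuration checks above are meant to confirm.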

Segregation of users (primary attribute)
Client Zero admins are users whose job function does not require them to access, create, process, or delete data from systems in production state. In the absence of native segregation, controls are implemented to carve out Client Zero admins from the general population of users. This includes a segregation in the processing of user requests and a review of Client Zero admin activities that are detailed and frequent enough to capture and quickly address instances where Client Zero admins interacted with production data.

Note 1: Although Client Zero admins are separated from the general population of users, approval of their access is performed by the functional owner of the system (like regular users).

Note 2: Client Zero admins are not granted System Administrator access.
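A detective review of this kind could look roughly like the sketch below. The activity log format, the user names, and the object IDs are assumptions made for the example; in practice the list of Client Zero objects would come from the project's protocol documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActivityRecord:
    user: str
    action: str
    object_id: str


def find_production_interactions(activity_log, client_zero_admins, client_zero_object_ids):
    """Return the records where a Client Zero admin touched a non-Client Zero object.

    activity_log: iterable of ActivityRecord (hypothetical audit log extract)
    client_zero_admins: set of user names carved out as Client Zero admins
    client_zero_object_ids: set of object IDs documented as part of the protocol
    """
    return [
        record
        for record in activity_log
        if record.user in client_zero_admins
        and record.object_id not in client_zero_object_ids
    ]
```

Any record returned here would be an instance to investigate and, if needed, correct; activity by regular users is out of scope for this particular review.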

Segregation of Data (primary attribute)
The short-term solution does not include a native segregation of the Client Zero transactions from the production data. Controls are implemented to review the Client Zero transactions to ensure their impacts are quickly identified, tracked, and nullified. This tracking is based on documentation produced to support the Client Zero protocol for each specific project. Documentation includes a detailed script for the Client Zero protocol and a complete mapping of the transactions, entities, or related objects impacted by the Client Zero protocol (including in downstream systems, if applicable).

Segregation of UXs (secondary attribute)
The short-term solution does not include a native segregation of user interface between interaction with Client Zero transactions and interaction with production data. As with the Segregation of Data attribute, controls are implemented to review the Client Zero transactions to ensure their impacts are quickly identified, tracked, and nullified.

Monitoring and Impact assessment (secondary attribute)
The short-term solution does not include a native segregation of the Client Zero transaction from the production data. In this context, it is crucial that Client Zero transactions and their dependencies are identifiable and reportable through a pre-identified list of queries/reports. After Client Zero protocol is completed, Client Zero transactions are canceled/nullified. These transactions do not have any impact on the financial statements.
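To make the nullification step concrete, here is a minimal sketch that assumes a simple ledger of entries and a documented set of Client Zero transaction IDs. The `LedgerEntry` type, the `rev-` prefix, and the amounts are all hypothetical; real systems would use their own cancellation mechanisms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    entry_id: str
    transaction_id: str
    amount_cents: int


def nullify_client_zero(ledger, client_zero_txn_ids):
    """Return a new ledger with one reversing entry per Client Zero entry.

    Entries that already have a reversal are skipped, so running the
    function twice does not reverse anything a second time.
    """
    existing_ids = {entry.entry_id for entry in ledger}
    reversals = [
        LedgerEntry(f"rev-{entry.entry_id}", entry.transaction_id, -entry.amount_cents)
        for entry in ledger
        if entry.transaction_id in client_zero_txn_ids
        and not entry.entry_id.startswith("rev-")
        and f"rev-{entry.entry_id}" not in existing_ids
    ]
    return list(ledger) + reversals


def net_impact_cents(ledger, client_zero_txn_ids):
    """Sum of all amounts attributable to the listed Client Zero transactions."""
    return sum(
        entry.amount_cents
        for entry in ledger
        if entry.transaction_id in client_zero_txn_ids
    )
```

A net impact of zero for the documented Client Zero transactions is the kind of evidence a pre-identified report could provide once the protocol is completed.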

Long-term solution

In the long-term protocol, the code subject to Client Zero protocol, the data created and processed through Client Zero transactions, and the users involved in the execution of the protocol (the Client Zero Admins) are natively segregated from production code, data, and user bases through system enforced mechanisms. From an IT General Controls perspective, this proposition allows us to carve out code subject to the protocol from the general change population through code flagging and to identify and segregate Client Zero transactions at the data level. Client Zero admins also are carved out from the general user population and are not able to interfere with production data. High-level downstream monitoring activities are implemented to ensure appropriate accounting of Client Zero transactions in the production data, further minimizing the risk of a misstatement.

With this long-term solution, which relies on native system features, privacy and confidentiality of production data are supported, mistakes are less likely to happen, and there is a larger scope for acceptable activities (assuming unit testing was successfully completed in non-production environments). However, it’s important to note that major changes to the systems are required to natively support the Client Zero Protocol and there is a larger implementation effort.

The implementation of the Client Zero Protocol long-term strategy could require significant implementation efforts. It is possible to use an iterative approach for implementing the protocol. Assuming the Client Zero Protocol short-term strategy is already implemented, the organization can progressively implement long-term features that could be leveraged to make the implementation more efficient and ultimately would replace short-term strategy related controls.

Addressing the protocol attributes
In the long-term solution, the protocol attributes are addressed through the segregation of the systems subject to Client Zero protocol from the systems in production state.

Segregation of the code (primary attribute)
The code subject to Client Zero protocol is logically isolated from the code supporting systems in production state and is only executed for Client Zero transactions. This is achieved using “flagging” tools. The flagged code is only executed for predetermined transactions based on specified criteria. These criteria are defined, managed, and periodically reassessed by a central team in charge of engineering processes and tooling, with support from the application teams. The Client Zero admins ensure that:

The code under Client Zero protocol has gone through unit testing in non-production environments.
The coverage of the code under Client Zero protocol through flag is complete.
The flags for the code under Client Zero protocol are effectively configured.

Note: Other techniques can be considered for code segregation, such as the creation of ghost/dark instances to which no production traffic is routed. Such techniques could be preferred when a fine-grained segregation of code is not the optimal approach or is altogether impossible.

Segregation of Data (primary attribute)
The long-term solution involves a native segregation of the Client Zero transaction from the production data. To achieve the segregation of data, the data schemas are amended to add a specific immutable “client_zero” field. This additional field is available for all schemas in the systems subject to the Client Zero protocol.

The systems’ logic is further updated to flag Client Zero transactions, related entities, or objects as client_zero = TRUE, at the time of creation and processing.
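One way to picture an immutable `client_zero` field is a frozen Python dataclass, as in the sketch below. The `Order` and `Invoice` types are hypothetical; only the `client_zero` field name comes from the protocol. The flag is set at creation time and derived objects inherit it.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Order:
    order_id: str
    amount_cents: int
    client_zero: bool = False  # set once, at creation time


@dataclass(frozen=True)
class Invoice:
    invoice_id: str
    order_id: str
    amount_cents: int
    client_zero: bool  # always inherited from the originating order


def invoice_for(order: Order) -> Invoice:
    """Create a related object that carries the same client_zero flag."""
    return Invoice(
        invoice_id=f"inv-{order.order_id}",
        order_id=order.order_id,
        amount_cents=order.amount_cents,
        client_zero=order.client_zero,
    )


def try_to_unflag(order: Order) -> bool:
    """Attempt to change the flag after creation; returns True only if it worked."""
    try:
        order.client_zero = False
    except FrozenInstanceError:
        return False
    return True
```

Here the immutability is enforced in application code; an actual implementation would more likely enforce it at the schema or storage layer, which this sketch does not attempt to show.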

Segregation of users (primary attribute)
The Client Zero admins are users whose job function does not require them to access, create, process, or delete data from systems in production state. To carve out the Client Zero admins from the general population of users, the systems subject to Client Zero protocol offer Client Zero specific access, which mirrors the production access, with the difference that users granted Client Zero access are only able to interact with (create, view, process) transactions and objects flagged client_zero = TRUE. Likewise, users granted production access are only able to interact with transactions and objects flagged client_zero = FALSE.

To uphold a complete segregation of users, it is discouraged for a user to have both production and Client Zero access.

Note: Since the access used for Client Zero admins is a distinct subset of the system access, it is appropriate to leverage the existing Access Management control environment to support access provisioning, deprovisioning, and review.
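The access rule described in this section reduces to a simple check on the `client_zero` flag. The sketch below assumes two hypothetical access types and records represented as dictionaries; it is meant to show the rule, not a real authorization system.

```python
from enum import Enum


class AccessType(Enum):
    PRODUCTION = "production"
    CLIENT_ZERO = "client_zero"


def can_interact(access: AccessType, record_is_client_zero: bool) -> bool:
    """Client Zero access sees only flagged records; production access only unflagged ones."""
    return (access is AccessType.CLIENT_ZERO) == record_is_client_zero


def visible_records(access: AccessType, records):
    """Filter a list of dict records (each with a 'client_zero' key) by access type."""
    return [r for r in records if can_interact(access, r["client_zero"])]


def has_conflicting_grants(grants) -> bool:
    """Flag users holding both access types, which the protocol discourages."""
    return AccessType.PRODUCTION in grants and AccessType.CLIENT_ZERO in grants
```

A periodic access review could use a check like `has_conflicting_grants` to list users who hold both types of access.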

Segregation of UXs (secondary attribute)
The long-term solution includes a native segregation of user interface between interaction with Client Zero transactions and interaction with production data. Like the Segregation of Users attribute, the Segregation of UX is based on the client_zero flag. The systems read the client_zero flag and automatically and explicitly alert the users if they are interacting (creating, displaying, processing) with a Client Zero transaction or object.

The alerting mechanisms in the UX could consist of a “Client Zero protocol” banner or any other viewable UI artifact.
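As a rough sketch of such a banner, assume a transaction is rendered as text from a dictionary carrying the `client_zero` flag. The banner wording and the record layout are made up for the example.

```python
CLIENT_ZERO_BANNER = "[CLIENT ZERO PROTOCOL] This is not a production transaction."


def render_transaction(txn: dict) -> str:
    """Render a transaction summary, prefixed by a banner when client_zero is set.

    txn is a hypothetical record with 'id', 'amount', and 'client_zero' keys.
    """
    body = f"Transaction {txn['id']}: {txn['amount']}"
    if txn["client_zero"]:
        return f"{CLIENT_ZERO_BANNER}\n{body}"
    return body
```

Because the banner is driven by the flag on the record rather than by the user's role, the same rendering code can serve both populations of users.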

Monitoring and Impact assessment (secondary attribute)
The long-term solution implies a native segregation of the Client Zero transactions from the production data. This native segregation allows us to clearly identify and report Client Zero transactions, entities, and objects. Canned reports are created to support the reporting of data related to Client Zero protocol, leveraging the client_zero flag. The canned reports also allow us to identify objects created by Client Zero transactions in downstream systems.

Client Zero transactions within the perimeter of Client Zero-ready systems are not required to be canceled/nullified after the Client Zero protocol is completed, since these transactions do not, by design, have any impact on the financial statements. Nevertheless, a failsafe protocol is implemented to systematically cancel/nullify all data flagged client_zero = TRUE, should they impact systems outside of the perimeter of Client Zero-ready systems.

Note: If downstream systems are not ready for Client Zero protocol, then Client Zero transactions in those systems are systematically canceled/nullified.
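A canned report and a failsafe cancellation can both be expressed as queries on the `client_zero` flag. The sketch below uses an in-memory SQLite database; the table layout, the system names, and the list of Client Zero-ready systems are assumptions made for illustration.

```python
import sqlite3

# Hypothetical perimeter of systems that natively support the protocol.
CLIENT_ZERO_READY_SYSTEMS = ("billing",)


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with a client_zero column and sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE transactions ("
        " id TEXT PRIMARY KEY,"
        " system TEXT NOT NULL,"
        " amount_cents INTEGER NOT NULL,"
        " client_zero INTEGER NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'open')"
    )
    conn.executemany(
        "INSERT INTO transactions (id, system, amount_cents, client_zero)"
        " VALUES (?, ?, ?, ?)",
        [
            ("t1", "billing", 100, 1),
            ("t2", "ledger", 100, 1),
            ("t3", "billing", 250, 0),
        ],
    )
    return conn


def client_zero_report(conn: sqlite3.Connection):
    """Canned report: every row flagged client_zero, across all systems."""
    return conn.execute(
        "SELECT id, system, status FROM transactions"
        " WHERE client_zero = 1 ORDER BY id"
    ).fetchall()


def cancel_outside_perimeter(conn: sqlite3.Connection, ready_systems) -> int:
    """Failsafe: cancel flagged rows that landed in systems not ready for the protocol."""
    placeholders = ", ".join("?" for _ in ready_systems)
    cursor = conn.execute(
        "UPDATE transactions SET status = 'cancelled'"
        f" WHERE client_zero = 1 AND system NOT IN ({placeholders})",
        tuple(ready_systems),
    )
    return cursor.rowcount
```

In this example the flagged row in the `ledger` system is cancelled because that system is outside the assumed perimeter, while flagged rows in `billing` and all production rows are left untouched.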

System readiness

For a system to be ready for Client Zero protocol, the following features are required:

The system allows the code under Client Zero protocol to be segregated from the code in production.
The system natively flags the data generated under Client Zero protocol at the database level through the implementation of the client_zero field in all the schemas.
The system natively segregates access to the Client Zero transactions from the production data through the implementation of Client Zero specific access.
The system natively segregates the transactions and informs the user of their nature through the implementation of UI artifacts.
Canned reports and programs will be available to identify, report, and cancel all data generated under Client Zero protocol as needed.
The system documentation is updated to reflect its readiness for Client Zero Protocol. This includes updated, approved, and tested functional and technical specifications, interfaces designs, reports, roles, and authorizations.
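The readiness criteria above can be tracked as a simple checklist. The sketch below is one hypothetical way to record them per system; the field names paraphrase the list and are not an official schema.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReadinessChecklist:
    code_segregation: bool
    client_zero_field_in_all_schemas: bool
    client_zero_specific_access: bool
    ui_artifacts: bool
    canned_reports_and_cancellation: bool
    documentation_updated: bool


def missing_features(checklist: ReadinessChecklist):
    """Return the names of the criteria that are not yet met."""
    return [f.name for f in fields(checklist) if not getattr(checklist, f.name)]


def is_client_zero_ready(checklist: ReadinessChecklist) -> bool:
    """A system is ready only when every criterion is met."""
    return not missing_features(checklist)
```

A system that meets only some of the criteria would stay under the short-term controls for the missing attributes until the remaining features are delivered.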

Conclusion

Ensuring the systems in production operate as intended is the primary objective of any IT control framework. To achieve this objective, companies often choose the traditional path of segregating the production environments from the non-production environments. An alternative path consists of understanding the nuance between a system in production state and a system deployed to a production environment and focusing the efforts on segregating the systems in production states from the systems not in production state, regardless of the environment where they are deployed.

At LinkedIn, we believe that enabling the latter approach will be key in scaling up companies’ IT environments in a context where complexity, deployments, and interconnections between systems are continuously and exponentially increasing. Investing in systems that support Client Zero protocol and tooling will allow for more effective, comprehensive, and realistic testing scenarios, which in turn will result in a better member experience.

Acknowledgments

It takes a village to build a risk framework that impacts so many teams across LinkedIn and beyond. We’re thankful to teams across Engineering, Finance, and Compliance organizations that we have worked closely with.

A big note of thanks to (in alphabetical order):

Engineering Systems and Compliance team: Malai Lakshmanan, Melissa Lam, Priya Bharbhari, Salona Uthappa.

LinkedIn Engineering partners: Ashima Atul, Brook Molla, Gaurav Chadha, Jacek Suliga, Jared Green, Michael Zhang, Nishant Garg, Sajid Topiwala, Scott Holmes, Siddharth Agarwal, and Venkat Ganesan, for helping us bring a pragmatic view, representing the engineering teams across the organization, as well as for their enthusiastic and continuous input and contributions towards the Client Zero Protocol.

LinkedIn financial compliance partners: Can Cao, Hang Lee, Katherine Toguchi, Lisa Sato, Matthew Diwata, Thibaut Smouts, and Thomas Haley, for being the voice of the financial compliance functions, and for their insightful feedback and contributions.

Leadership: We would also like to thank Sabry Tozin from the leadership team, our executive sponsor, for the continued support and investment in LinkedIn compliance programs.

Topics: A/B Testing/Experimentation Code
