Open sourcing Kube2Hadoop: Secure access to HDFS from Kubernetes
June 10, 2020
LinkedIn AI has been traditionally Hadoop/YARN based, and we operate one of the world’s largest Hadoop data lakes, with over 4,500 users and 500PB of data. In the last few years, Kubernetes has also become very popular at LinkedIn for Artificial Intelligence (AI) workloads. Adoption at the company started as a proof of concept for Jupyter notebooks, and it has now become a key piece of our model training and model serving infrastructure.
By default, there is a gap between the security models of Kubernetes and Hadoop. Specifically, Hadoop uses Kerberos, a three-party protocol built on symmetric key cryptography, to ensure any clients accessing the cluster are who they claim to be. In order to avoid frequent authentication checks against a Kerberos server, Delegation Tokens, a lightweight two-party authentication method, were introduced to complement Kerberos authentication. The Hadoop delegation token by default has a lifespan of one day and can be renewed for up to seven days. Kubernetes, on the other hand, uses a certificate-based approach for authentication, and does not expose the owner of a job in any of its public-facing APIs. Therefore, it is not possible to securely determine the authorized user from within the pod using the native Kubernetes API and then use that username to fetch the Hadoop delegation token for HDFS access.
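To make the token lifetime concrete, here is a minimal Python sketch of the arithmetic, assuming the default values quoted above (one day per renewal, seven days maximum). The function name and the constants are illustrative and are not part of Hadoop's or Kube2Hadoop's API.

```python
from datetime import datetime, timedelta

# Defaults quoted in the text: each renewal is good for one day, and a
# token cannot live longer than seven days after it was issued.
RENEW_INTERVAL = timedelta(days=1)
MAX_LIFETIME = timedelta(days=7)


def expiry_after_renewal(issued_at: datetime, renewed_at: datetime) -> datetime:
    """Return the new expiry time if the token is renewed at `renewed_at`."""
    hard_limit = issued_at + MAX_LIFETIME
    if renewed_at >= hard_limit:
        raise ValueError("token is past its maximum lifetime; fetch a new one")
    # A renewal extends the token by one interval, capped at the hard limit.
    return min(renewed_at + RENEW_INTERVAL, hard_limit)


issued = datetime(2020, 6, 10, 9, 0)
# Renewed 20 hours after issue: valid for one more day.
print(expiry_after_renewal(issued, issued + timedelta(hours=20)))
# Renewed late on day seven: capped at the seven-day limit.
print(expiry_after_renewal(issued, issued + timedelta(days=6, hours=20)))
```

The cap is why a job that runs longer than the maximum lifetime cannot rely on renewal alone and eventually needs a freshly issued token.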
To allow Kubernetes workloads to securely access HDFS, we built Kube2Hadoop, a scalable and secure integration with HDFS Kerberos. This enables AI modelers at LinkedIn to use HDFS data in Kubernetes pods with access control through a user account or a headless account. Headless accounts are often used to represent a virtual team whose members work on projects that share the same data. The data acquired can then be used in their model exploration and training with Kubeflow components such as the tf-operator and mpi-operator. In this blog post, we will describe the design and authentication model of Kube2Hadoop.
Since the introduction of Hadoop to the open source community, HDFS has been a widely adopted distributed file system in the industry for its scalability and robustness. With the growing popularity of running model training on Kubernetes, it is natural for many people to leverage the massive amount of data that already exists in HDFS. We think that Kube2Hadoop will benefit both the Kubernetes and Hadoop communities. And today, we are also pleased to announce that we are open sourcing this solution! You can find the source code available in our GitHub repository.
How does Kube2Hadoop work?
To ensure secure HDFS access for AI workloads running on Kubernetes at LinkedIn, we built a Kubernetes-native solution: Kube2Hadoop. It consists of three parts:
- Hadoop Token Service, for fetching delegation tokens, deployed as a Kubernetes Deployment;
- Kube2Hadoop Init Container in each worker pod as a client for sending requests to fetch a delegation token from Hadoop Token Service;
- IDDecorator (see further below), for writing an authenticated user-ID, deployed as a Kubernetes Admission Controller.
The following diagram shows an overview of what a typical workflow would look like for the user:
- User performs a login to a Hadoop Gateway machine using their own credentials.
- User receives a certificate from a client authentication service.
- Using the obtained certificate, the user submits a job on the gateway to the Kubernetes (K8s) cluster.
- The K8s API Server authenticates the user with the certificate and launches containers as requested. The Kube2Hadoop init containers are attached to each of the worker containers that require HDFS access. The init container then sends a request to the Hadoop Token Service for the delegation token.
- The Token Service, which acts as a Hadoop superuser (contains a superuser keytab), proxies as the user to fetch the delegation token.
- The returned token is mounted locally in the container.
- Once the training starts for the workers, they can seamlessly access HDFS using the fetched token.
- The Hadoop Token Service puts a watch on the status of each job to cancel the token when the job finishes and renew the token for long-running jobs.
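The last step of the workflow (cancel on completion, renew for long-running jobs) can be sketched as a small decision function. This is an illustrative Python sketch, not the Token Service's actual code: the state names, the two-hour renewal margin, and the `TrackedToken` type are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Action(Enum):
    NONE = "none"
    RENEW = "renew"
    CANCEL = "cancel"


@dataclass
class TrackedToken:
    job_name: str
    expires_at: datetime


# Hypothetical terminal job states and renewal margin.
FINISHED_STATES = frozenset({"Succeeded", "Failed"})
RENEW_MARGIN = timedelta(hours=2)


def next_action(token: TrackedToken, job_state: str, now: datetime) -> Action:
    """Decide what a watcher should do with a token on each status update."""
    if job_state in FINISHED_STATES:
        # The job is done, so the token should not outlive it.
        return Action.CANCEL
    if token.expires_at - now <= RENEW_MARGIN:
        # The job is still running and the token is close to expiry.
        return Action.RENEW
    return Action.NONE


now = datetime(2020, 6, 10, 12, 0)
token = TrackedToken("tfjob-demo", expires_at=now + timedelta(hours=1))
print(next_action(token, "Running", now))    # close to expiry while running
print(next_action(token, "Succeeded", now))  # finished job
```

Checking for a finished job before checking expiry means a completed job never triggers a pointless renewal.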
Kube2Hadoop authentication workflow
Now let’s take a closer look at the authentication mechanism of Kube2Hadoop. When a user submits a job to the Kubernetes API server, the authentication is done via a client authentication service. For example, in the aforementioned workflow, the user acquires a cert that contains their user-ID as principal from the gateway machine, and submits a job using that cert. The authenticated user-ID is persisted in the Kubernetes API server (this is not natively supported by Kubernetes; we will discuss it in detail in the IDDecorator section), hence we know the authenticated owner of a deployment. To run as a headless account, a user can submit a deployment YAML file and add an annotation in the format: doAs: <headless account>. When fetching a delegation token, the Token Service can perform an LDAP lookup for headless accounts to determine whether the user has the authority to fetch the delegation token on behalf of the account.
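The doAs decision described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the membership check is passed in as a plain function (`is_member`, hypothetical) standing in for the LDAP lookup, and `resolve_hdfs_account` is a name invented for this example.

```python
from typing import Callable, Mapping

DO_AS_ANNOTATION = "doAs"  # annotation key named in the text above


def resolve_hdfs_account(
    authenticated_user: str,
    annotations: Mapping[str, str],
    is_member: Callable[[str, str], bool],
) -> str:
    """Return the account a delegation token should be fetched for."""
    headless = annotations.get(DO_AS_ANNOTATION)
    if not headless:
        # No doAs annotation: the job runs as the authenticated submitter.
        return authenticated_user
    if not is_member(authenticated_user, headless):
        raise PermissionError(
            f"{authenticated_user} may not act as headless account {headless}"
        )
    return headless


# Stand-in for the LDAP lookup: headless account -> set of member user-IDs.
directory = {"ads-modeling": {"alice", "bob"}}


def lookup(user: str, account: str) -> bool:
    return user in directory.get(account, set())


print(resolve_hdfs_account("alice", {}, lookup))
print(resolve_hdfs_account("alice", {"doAs": "ads-modeling"}, lookup))
```

A submitter who is not a member of the requested headless account gets a `PermissionError` instead of a token.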
Upon the Kubernetes job start, the init container in each worker container fetches the delegation token by making a call to the Token Service. The Hadoop Token Service validates the caller pod by extracting the IP address of the init container and comparing it against the IP address registered in the Kubernetes API server. The IP address check is to make sure that no pod in Kubernetes can impersonate other pods to get their delegation token. The Token Service can then extract the username from the Kubernetes API server call and fetch the delegation token for that user.
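A minimal Python sketch of the IP address check follows. It assumes the request identifies the calling pod by namespace and name (the source does not say how the pod is identified), and it takes the API server lookup as an injected function, `lookup_pod_ip`, which is hypothetical; in a real cluster that value would come from the pod's `status.podIP` field.

```python
import ipaddress
from typing import Callable, Optional


def caller_is_pod(
    remote_addr: str,
    namespace: str,
    pod_name: str,
    lookup_pod_ip: Callable[[str, str], Optional[str]],
) -> bool:
    """Check that the request really comes from the pod it claims to be."""
    registered = lookup_pod_ip(namespace, pod_name)
    if registered is None:
        # Unknown pod, or a pod that has no IP assigned yet.
        return False
    try:
        # Compare parsed addresses rather than raw strings.
        return ipaddress.ip_address(remote_addr) == ipaddress.ip_address(registered)
    except ValueError:
        # A malformed address on either side is treated as a mismatch.
        return False


# Stand-in for the Kubernetes API server's view of pod IPs.
registered_ips = {("training", "worker-0"): "10.0.0.5"}


def fake_lookup(namespace: str, name: str) -> Optional[str]:
    return registered_ips.get((namespace, name))


print(caller_is_pod("10.0.0.5", "training", "worker-0", fake_lookup))
print(caller_is_pod("10.0.0.9", "training", "worker-0", fake_lookup))
```

Only after this check passes would the service go on to read the username recorded for that pod and request a token for it.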
Kube2Hadoop authentication mechanism with key metadata
As mentioned earlier, the pod IP address and user-ID are key to the authentication mechanism. IDDecorator leverages Kubernetes’ Admission Controller to persist an immutable user-ID in a pod annotation. When a user submits a job, the job init container requests a delegation token from the token service. The token service verifies the topology of the requesting pod from the primary Kubernetes model. Then it verifies whether:
- The submitter is a valid HDFS user.
- The user belongs to the optional headless group, if one is provided.
The easiest way to pass the user information to the token service is via a pod annotation. However, ensuring that the job submitter is putting their real LDAP username in the job annotation metadata is trickier.
The following is an example workflow for a TFJob:
IDDecorator typical workflow
- User submits a TFJob from Hadoop gateway.
- The Kubernetes API Server sends the TFJob AdmissionRequest to the IDDecorator mutating admissions webhook.
- IDDecorator decorates the AdmissionRequest with the submitter’s username.
- IDDecorator sends the decorated TFJob to the tf-operator.
- TFJob operator propagates the username metadata onto all the pods and sends decorated pod submission requests to the API Server.
- API Server sends the pod submission requests to the IDDecorator.
- IDDecorator overrides all pod-submission usernames except for the ones being overridden by Kubernetes system accounts.
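The decoration step can be sketched as a function that turns an incoming AdmissionReview into a response carrying a JSON Patch. This Python sketch is illustrative only and is not IDDecorator's code: the annotation key and the allowlist of system accounts are hypothetical, and a real webhook would also need an HTTPS server and TLS setup around this function.

```python
import base64
import json

# Hypothetical annotation key; the key Kube2Hadoop actually uses may differ.
USERNAME_ANNOTATION = "example.com/username"

# Hypothetical allowlist of system accounts whose pod submissions keep the
# username already propagated from the parent job.
TRUSTED_SYSTEM_ACCOUNTS = frozenset({
    "system:serviceaccount:kube-system:deployment-controller",
    "system:serviceaccount:kubeflow:tf-job-operator",
})


def _escape(segment: str) -> str:
    """Escape one JSON Pointer path segment (RFC 6901)."""
    return segment.replace("~", "~0").replace("/", "~1")


def build_response(review: dict) -> dict:
    """Return an AdmissionReview that stamps the submitter's username."""
    request = review["request"]
    submitter = request["userInfo"]["username"]
    response = {"uid": request["uid"], "allowed": True}

    if submitter not in TRUSTED_SYSTEM_ACCOUNTS:
        metadata = request["object"].get("metadata", {})
        if metadata.get("annotations"):
            # JSON Patch "add" replaces the member if it already exists,
            # so a username supplied by the submitter is overwritten.
            patch = [{
                "op": "add",
                "path": "/metadata/annotations/" + _escape(USERNAME_ANNOTATION),
                "value": submitter,
            }]
        else:
            # No annotations map yet: create it with the single entry.
            patch = [{
                "op": "add",
                "path": "/metadata/annotations",
                "value": {USERNAME_ANNOTATION: submitter},
            }]
        response["patchType"] = "JSONPatch"
        response["patch"] = base64.b64encode(json.dumps(patch).encode()).decode()

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


forged = {
    "request": {
        "uid": "1234",
        "userInfo": {"username": "mallory"},
        "object": {"metadata": {"annotations": {USERNAME_ANNOTATION: "victim"}}},
    }
}
reply = build_response(forged)
print(json.loads(base64.b64decode(reply["response"]["patch"])))
```

The username is taken from `request.userInfo`, which the API server fills in after authenticating the caller, and never from the submitted object. That is what makes a forged annotation harmless in the threat models that follow.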
Let’s consider the following threat models of adversary attacks.
Threat model 1: Attacker creates a deployment with a fake username
Threat model for fake username in deployment
- Attacker submits Deployment with fake username in its job annotation to the API Server.
- IDDecorator decorates the deployment with the submitter’s username, overwriting what the attacker has submitted.
- Deployment controller sends decorated pod submission request to the API Server.
- API Server sends pod AdmissionRequest to IDDecorator.
In the above scenario, the IDDecorator passes through the username field in the job annotation at the deployment controller. When creating the pods, it ignores the username annotation provided through the deployment, and overwrites it with the submitter’s actual username to prevent pods from being created with a fake annotation.
Threat model 2: Attacker creates a bare pod with a fake username
Threat model for fake username in pod
- Attacker submits a pod with fake username to the API Server.
- API Server sends pod admission request to IDDecorator.
- Pod gets decorated with the submitter's username, overwriting fake username.
IDDecorator overrides all pod-submission usernames except for the ones being overridden by Kubernetes system accounts. This prevents an attacker from creating a pod with a fake username annotation.
Threat model 3: Attacker compromises Kubernetes admin
In the unfortunate scenario of an adversary compromising the account of a Kubernetes admin, we want to protect our data from being obtained by the attacker as much as possible. A Kubernetes admin can obtain access to any running container. Therefore, we recommend separating the Hadoop Token Service, which contains a superuser keytab, out of the Kubernetes platform so that the attacker can’t have unlimited and untraceable access to HDFS. We also suggest blocklisting user/group accounts in Kube2Hadoop that have superuser access to HDFS. A Kubernetes admin impersonating an HDFS superuser can get access to data belonging to multiple HDFS accounts. Blocklisting superusers from using Kube2Hadoop forces an attacker to impersonate user accounts one at a time (and leave audit trails of requesting delegation tokens for those users).
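The blocklisting idea can be illustrated with a few lines of Python. The account and group names below are examples only (they are common Hadoop conventions, not values taken from Kube2Hadoop), and `may_request_token` is a hypothetical helper rather than part of the project.

```python
from typing import Iterable

# Illustrative blocklists; a real deployment would load these from config.
BLOCKED_USERS = frozenset({"hdfs"})
BLOCKED_GROUPS = frozenset({"supergroup"})


def may_request_token(user: str, groups: Iterable[str]) -> bool:
    """Refuse delegation tokens for accounts with HDFS superuser access."""
    if user in BLOCKED_USERS:
        return False
    # Deny if the user belongs to any blocklisted group.
    return BLOCKED_GROUPS.isdisjoint(groups)


print(may_request_token("alice", ["ads-modeling"]))
print(may_request_token("hdfs", []))
print(may_request_token("bob", ["supergroup"]))
```

Running this check before any token is fetched means a request on behalf of a superuser is rejected even when the caller's identity has otherwise been validated.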
We’ve also considered other solutions for accessing HDFS from Kubernetes. The most straightforward way would be to have the user fetch the delegation token before submitting the job and attach the delegation token as a Kubernetes Secret. However, since a namespace is not exclusive to a single user, any user within that namespace could access all the secrets without specific role-based access control (RBAC) rules, thus failing the security requirement. Adding fine-grained RBAC rules for each of our thousands of users across multiple namespaces, however, would greatly increase orchestration complexity. There is also extra intricacy with this approach; for example, the delegation token is ephemeral and the secret needs to be purged after the job finishes, which requires a container lifecycle hook. Kube2Hadoop is a superior solution to the Kubernetes Secret approach due to its cleaner access control to HDFS, its ability to automatically renew tokens, and its ease of managing the token’s life cycle.
We’d like to thank Keqiu Hu, Zhe Zhang, Tu Tran, Vasanth Rajamani, Huiji Gao, Sumit Sanghrajka and Yiming Ma for amazing management support of this project. A big shout-out to Cong Gu, Abin Shahab, Chen Qiang, Shan Valleru, Tengfei Mu and Ronak Nathani for their extensive contributions to the design and implementation of this project. Without all their early brainstorming sessions, we wouldn’t have been able to arrive at today’s solution. Many thanks to Bee-Chung Chen for his awesome technical guidance and support. Special thanks to our awesome early end user Mingzhou Zhou from the AI Foundation Team for helping with integration and providing feedback for improvement. A huge thank you to Tim Lam for the security review and suggestions.