Securing your Kubeflow deployment with Kyverno policies

We all know Kubernetes is awesome, and Kubeflow makes Kubernetes cool for machine learning (ML) teams. With Kubeflow, data scientists and ML engineers can share infrastructure and accelerate the delivery of ML models, while minimizing costs.

When multiple actors interact with shared infrastructure, security becomes a priority for IT. There is always “that one guy” who will try to circumvent security just to show how clever they are, right?

Especially in a platform like Kubeflow, where users can create and delete resources as they wish, a system that gives fine-grained control over what they can and cannot do is critical.

In this post, we will explore how we achieved that using Kyverno.

What is the Current State of Security in Kubeflow?

Currently, Kubeflow restricts users’ access to sensitive resources using RBAC authorization rules. With RBAC rules, each user is confined to their dedicated namespace and, therefore, cannot access or modify sensitive system resources and data.

But even permission to manage resources in a single namespace can prove disastrous for the security of the whole cluster.

Examples of malicious usage (part 1)

Let’s wear our black hats and get our hands dirty, shall we?

In the following example, we will demonstrate how a user can completely PWN a Kubernetes node in 3 simple steps.

What we will need:

  • Access to a Kubeflow deployment with the `kubeflow-admin` role
  • That’s all

Step 1: Create a Notebook

Log in to the Kubeflow dashboard and create a simple Notebook.

Step 2: Create a malicious Pod

Using the Notebook, we create a new YAML file, `pwn.yaml`, that describes a privileged Pod sharing the node’s network namespace.

We open a new terminal and apply this resource:

jovyan@pwner-0:~$ kubectl apply -f pwn.yaml
pod/pwn configured

Step 3: Gain access

Now we simply open a terminal in the Pod we just created:

jovyan@pwner-0:~$ kubectl exec -it pwn -- bash
[root@node /]#

And voilà! Root access on the node.
We can “rm -rf /*” to punish that pesky IT admin we don’t like and ruin their weekend (please don’t …).

Summary

The previous example shows that the cluster is far from secure: we can gain full root access on a node in a matter of minutes, without any obstacles.

We can do better!

Available Options

So, we need to beef up the cluster’s security. Let’s explore the available options.

There are four major security solutions for Kubernetes:

  • Pod Security Policy (PSP)
  • Pod Security Admission (PSA)
  • Kyverno
  • Gatekeeper/OPA

Pod Security Policy (PSP)

Pod Security Policies are the original built-in Kubernetes mechanism for fine-grained authorization of Pod creation. With PSPs, the admin can prevent the creation of Pods based on a set of conditions, e.g., disallowing privileged containers (see the official documentation for all the possible rules: https://kubernetes.io/docs/concepts/security/pod-security-policy). We could use PSPs to improve the security of our cluster. Still, they come with some significant drawbacks:

  • The policies apply only to Pods and not to other types of resources, e.g., Ingresses.
  • The policies are granted to users and service accounts through RBAC rules rather than attached directly to Pods. This indirection overcomplicates large deployments where many system service accounts come into play.
  • Pod Security Policies were deprecated in Kubernetes v1.21 and removed in v1.25. We don’t want to invest in a technology that has an expiration date.
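For comparison, here is a rough sketch of a PSP that blocks privileged containers and host namespaces. It assumes a pre-v1.25 cluster with the PodSecurityPolicy admission plugin enabled, and the object names are illustrative. Note the second object: the PSP has no effect until a user or service account is granted `use` on it through RBAC, which is exactly the indirection criticized above.

```yaml
# Sketch only: policy/v1beta1 PodSecurityPolicy was removed in Kubernetes v1.25
# and requires the PodSecurityPolicy admission plugin to be enabled.
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: no-privileged-no-host-ns    # illustrative name
spec:
  privileged: false                 # reject privileged containers
  hostNetwork: false                # reject sharing the host namespaces
  hostIPC: false
  hostPID: false
  # The four strategy fields below are mandatory in a PSP spec
  seLinux:
    rule: RunAsAny
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
    - "*"
---
# A PSP only applies to requests whose user or service account may "use" it;
# this ClusterRole grants that permission for the policy above.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: psp-no-privileged-no-host-ns   # illustrative name
rules:
  - apiGroups: ["policy"]
    resources: ["podsecuritypolicies"]
    resourceNames: ["no-privileged-no-host-ns"]
    verbs: ["use"]
```

On top of this, a RoleBinding or ClusterRoleBinding is still needed to attach the ClusterRole to the relevant service accounts, one more moving part per workload.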

Pod Security Admission (PSA)

Pod Security Admission (PSA) is the successor to PSPs. As with PSPs, the admin can restrict the creation of Pods, with the difference that the rules are applied per namespace. Still, there are some drawbacks:

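  • Again, it only validates Pods.
  • Not very mature.
  • Not very configurable: it offers only the three predefined Pod Security Standards levels (privileged, baseline, restricted), plus a limited set of exemptions.
  • Requires the `PodSecurity` feature gate on older Kubernetes versions (it is enabled by default from v1.23), or manually installing the admission webhook.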
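To illustrate the per-namespace model, here is a minimal sketch: PSA is configured with labels on the namespace itself. The namespace name `kubeflow-user` is borrowed from the error output later in this post and is only an example.

```yaml
# Sketch: PSA is configured through labels on the namespace.
# "kubeflow-user" is an example namespace name.
apiVersion: v1
kind: Namespace
metadata:
  name: kubeflow-user
  labels:
    # Reject Pods that violate the "baseline" Pod Security Standard
    # (which forbids, among others, privileged containers and host namespaces)
    pod-security.kubernetes.io/enforce: baseline
    # Evaluate against the latest version of the standard
    pod-security.kubernetes.io/enforce-version: latest
    # Only warn (do not reject) for violations of the stricter "restricted" level
    pod-security.kubernetes.io/warn: restricted
```

The same labels can be added to an existing namespace with `kubectl label namespace`. This is also where the limited configurability shows: you pick one of the predefined levels per namespace and mode, rather than writing your own rules.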

Kyverno

Kyverno is a policy engine designed specifically for Kubernetes from the ground up. It is an open-source project developed by Nirmata and donated to the CNCF, where it is currently an incubating project. Kyverno works by using a dynamic admission controller to:

  • Validate resources
  • Mutate resources
  • Generate resources

Policies are defined in YAML, as Kubernetes resources. This looks promising (you read the title, you know where this is going :-)) …

Gatekeeper/OPA

OPA (Open Policy Agent) is a graduated CNCF project. It is a general-purpose policy engine (not Kubernetes-specific) with its own high-level language for defining policies. Gatekeeper is a dynamic admission controller that uses OPA as the policy engine for Kubernetes resources. Using native Kubernetes CRDs, you can define policies that:

  • Validate resources
  • Mutate resources

OPA’s high-level policy language, Rego, makes it a very versatile tool, but also a little hard to configure.
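To give a feel for the extra configuration, here is a sketch of a Gatekeeper policy that rejects Pods with `hostNetwork: true`. It assumes Gatekeeper v3 with the `templates.gatekeeper.sh/v1` API, and the template and constraint names are made up for this example. The logic is written in OPA’s Rego language inside a ConstraintTemplate, and a separate Constraint object applies it to Pods.

```yaml
# Sketch for Gatekeeper v3 (templates.gatekeeper.sh/v1); names are illustrative.
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8sdisallowhostnetwork        # must be the lowercase form of the kind below
spec:
  crd:
    spec:
      names:
        kind: K8sDisallowHostNetwork
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8sdisallowhostnetwork

        # Report a violation when the incoming Pod sets hostNetwork: true
        violation[{"msg": msg}] {
          input.review.object.spec.hostNetwork
          msg := "Sharing the host network namespace is disallowed"
        }
---
# The Constraint instantiates the template and scopes it to Pods
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowHostNetwork
metadata:
  name: disallow-host-network
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
```

Two objects and a second language for a single check is the trade-off: very flexible, but more to learn than a purely YAML-based policy.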

Summary

The following table shows a summary of the pros and cons of each solution:

Why we chose Kyverno

We chose Kyverno as the policy engine for our cluster because it strikes a balance between versatility and ease of use. Due to the fluid nature of a Kubeflow deployment, we want every administrator to be able to understand and manage the security policies and customize them to their needs.

Setting up Kyverno

Step 1: Install

Kyverno provides two installation methods: a Helm chart and plain YAML manifests.

We chose the simpler YAML installation. To install Kyverno, run:

user@localhost:~$ kubectl create -f https://raw.githubusercontent.com/kyverno/kyverno/release-1.7/config/release/install.yaml

To verify that everything is up and running, run:

user@localhost:~$ kubectl get pods -n kyverno
NAME                       READY   STATUS    RESTARTS   AGE
kyverno-5cc856f997-hxxbq   1/1     Running   0          28h

Step 2: Configure policies

Now to the fun part!

We will leverage the capabilities of Kyverno policies to restrict malicious users’ ability to take over our cluster.

Kyverno provides a library of sample policies that we can use to prevent common attack vectors (https://kyverno.io/policies).

In this example, we are going to configure 2 policies:

1) Prevent users from deploying Pods that share the host namespaces.

The first policy that we will apply prevents users from creating a Pod that shares namespaces with the host.

More specifically, a user cannot create a Pod that has hostPID, hostNetwork, or hostIPC set to true.
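For illustration, a Pod like the following sketch would be rejected by this policy. The Pod name and the `busybox` image are hypothetical; any one of the three host fields set to `true` is enough to trigger the rule.

```yaml
# Illustrative Pod that a disallow-host-namespaces policy would reject.
# Name and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: host-ns-example
spec:
  hostNetwork: true   # shares the node's network namespace
  hostPID: true       # sees every process running on the node
  hostIPC: true       # shares the node's IPC namespace
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
```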

Create a new `disallow-host-namespaces.yaml` file with the following content:

And apply it:

user@localhost:~$ kubectl apply -f disallow-host-namespaces.yaml
clusterpolicy.kyverno.io/disallow-host-namespaces created

(source: https://kyverno.io/policies/pod-security/baseline/disallow-host-namespaces/disallow-host-namespaces/)
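For reference, here is a minimal sketch of such a policy, modeled on the Kyverno library version linked above and assuming Kyverno 1.7 syntax. The library version also carries descriptive annotations, omitted here, so treat the linked source as authoritative.

```yaml
# Minimal sketch modeled on the Kyverno library policy (Kyverno 1.7 syntax).
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-host-namespaces
spec:
  # "enforce" rejects non-compliant requests; "audit" only reports them
  validationFailureAction: enforce
  # Also evaluate resources that already exist in the cluster
  background: true
  rules:
    - name: host-namespaces
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Sharing the host namespaces is disallowed. The fields spec.hostNetwork,
          spec.hostIPC, and spec.hostPID must be unset or set to `false`.
        pattern:
          spec:
            # =(field) is an equality anchor: if the field is present it must
            # match the value; if it is absent, the check passes
            =(hostPID): "false"
            =(hostIPC): "false"
            =(hostNetwork): "false"
```

As far as we know, the library policies default to `audit`; switching to `enforce` is what makes Kyverno block the request instead of merely reporting it. Enforcing cluster-wide can also affect system components that legitimately need host namespaces, and Kyverno’s `exclude` clauses are one way to carve those out.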

2) Prevent users from creating Pods with privileged containers.

Now we will create a policy that prevents users from creating Pods that contain privileged containers.
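For illustration, a container spec like the one in the following sketch is what this policy targets. Again, the Pod name and image are hypothetical.

```yaml
# Illustrative Pod with a privileged container; name and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: privileged-example
spec:
  containers:
    - name: main
      image: busybox
      command: ["sleep", "3600"]
      securityContext:
        # Runs with all capabilities and access to the host's devices
        privileged: true
```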

Create a new `disallow-privileged-containers.yaml` file with the following content:

And apply it:

user@localhost:~$ kubectl apply -f disallow-privileged-containers.yaml
clusterpolicy.kyverno.io/disallow-privileged-containers created

(source: https://kyverno.io/policies/pod-security/baseline/disallow-privileged-containers/disallow-privileged-containers/)
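A minimal sketch of this second policy, under the same assumptions as before (modeled on the linked library policy, Kyverno 1.7 syntax, annotations omitted):

```yaml
# Minimal sketch modeled on the Kyverno library policy (Kyverno 1.7 syntax).
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: disallow-privileged-containers
spec:
  validationFailureAction: enforce   # block, don't just report
  background: true
  rules:
    - name: privileged-containers
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: >-
          Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged
          and spec.initContainers[*].securityContext.privileged must be unset or set to `false`.
        pattern:
          spec:
            # Optional lists are wrapped in =() so Pods without them still pass
            =(ephemeralContainers):
              - =(securityContext):
                  =(privileged): "false"
            =(initContainers):
              - =(securityContext):
                  =(privileged): "false"
            # Every regular container must leave privileged unset or false
            containers:
              - =(securityContext):
                  =(privileged): "false"
```

A list pattern like `containers:` is checked against each element, so a single privileged container anywhere in the Pod is enough to fail the rule.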

Finally, verify that the policies are ready, for example with `kubectl get clusterpolicy`.

Now we are ready!

Examples of malicious usage (part 2)

Let’s revisit the previous example where the user gained access to our node by creating a privileged Pod.

Open your Notebook and try to create the same Pod as we created before:

jovyan@pwner-0:~$ kubectl apply -f pwn.yaml
Error from server: error when creating "pwn.yaml": admission webhook "validate.kyverno.svc-fail" denied the request:
resource Pod/kubeflow-user/pwn was blocked due to the following policies
disallow-host-namespaces:
  host-namespaces: 'validation error: Sharing the host namespaces is disallowed. The fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to `false`. . Rule host-namespaces failed at path /spec/hostNetwork/'
disallow-privileged-containers:
  privileged-containers: 'validation error: Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged must be unset or set to `false`. . Rule privileged-containers failed at path /spec/containers/0/securityContext/privileged/'

Something different happened this time! The privileged Pod was not allowed and `kubectl` returned a validation error explaining what was wrong with the manifest.

Conclusions

Kubeflow is a very versatile platform, but from a security perspective, it needs improvement. In this post, we saw an example of how we use Kyverno in Arrikto Enterprise Kubeflow to improve the security of our cluster in a multi-user environment.

Stay tuned for part 2, where we explore some caveats of Kyverno and the configuration we needed to make the integration as seamless as possible.

Originally published at https://www.arrikto.com on September 22, 2022.
