Certified Kubernetes Security Specialist (CKS) Notes


https://www.cncf.io/certification/cks/
Exam
Outline
- https://github.com/cncf/curriculum/blob/master/CKS_Curriculum%20v1.31.pdf
Curriculum
Exam objectives that outline the knowledge, skills, and abilities that a Certified Kubernetes Security Specialist (CKS) can be expected to demonstrate.
Cluster Setup (10%)
- Use Network security policies to restrict cluster level access
- Use CIS benchmarks to review the security configuration of Kubernetes components (etcd, kubelet, kubedns, kubeapi)
- Properly set up Ingress objects with security control
- Protect node metadata and endpoints
  - Kubernetes Documentation > Tasks > Administer a Cluster > Securing a Cluster

```yaml
# all pods in the namespace cannot access the metadata endpoint
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cloud-metadata-deny
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32
```

- Minimize use of, and access to, GUI elements
- Verify platform binaries before deploying
  - Kubernetes Documentation > Tasks > Install Tools > Install and Set Up kubectl on Linux
  - Note: check step 2 - validate binary
Cluster Hardening (15%)
- Restrict access to the Kubernetes API
- Use Role Based Access Controls to minimize exposure
- Exercise caution in using service accounts, e.g. disable defaults, minimize permissions on newly created ones
- Update Kubernetes frequently
System Hardening (15%)
- Minimize host OS footprint (reduce attack surface)
  - Remove unnecessary packages
  - Identify and address open ports
  - Shut down any unnecessary services
- Minimize IAM roles
- Minimize external access to the network
- Appropriately use kernel hardening tools such as AppArmor, seccomp
Minimize Microservice Vulnerabilities (20%)
- Set up appropriate OS-level security domains, e.g. using PSP, OPA, security contexts
- Manage Kubernetes secrets
- Use container runtime sandboxes in multi-tenant environments (e.g. gVisor, Kata Containers)
- Implement pod-to-pod encryption by use of mTLS
Supply Chain Security (20%)
- Minimize base image footprint
  - Remove exploitable and non-essential software
  - Use multi-stage Dockerfiles to keep software compilation out of runtime images
  - Never bake any secrets into your images
  - Image scanning
- Secure your supply chain: whitelist allowed image registries, sign and validate images
- Use static analysis of user workloads (e.g. Kubernetes resources, Dockerfiles)
  - Secure base images
  - Remove unnecessary packages
  - Stop containers from using elevated privileges
- Scan images for known vulnerabilities
Monitoring, Logging and Runtime Security (20%)
- Perform behavioral analytics of syscall, process, and file activities at the host and container level to detect malicious activities
- Detect threats within physical infrastructure, apps, networks, data, users, and workloads
- Detect all phases of attack regardless of where it occurs and how it spreads
- Perform deep analytical investigation and identification of bad actors within the environment
- Ensure immutability of containers at runtime
  - readOnlyRootFilesystem: mounts the container's root filesystem as read-only
- Use Audit Logs to monitor access
Changes
- https://kodekloud.com/blog/cks-exam-updates-2024-your-complete-guide-to-certification-with-kodekloud/
- https://training.linuxfoundation.org/cks-program-changes/
Software / Environment
As of 11/2024
- Kubernetes version: 1.31
- Ubuntu 20.04
- Terminal
- Bash
- Tools available
  - `vim` - text/code editor
  - `tmux` - terminal multiplexer
  - `jq` - working with JSON format
  - `yq` - working with YAML format
  - `firefox` - web browser for accessing the K8s docs
  - `base64` - tool to convert to and from base64
  - `kubectl` - Kubernetes CLI client
  - more typical Linux tools like `grep`, `wc`, ...
- 3rd-party tools to know
  - `tracee`
  - OPA Gatekeeper
  - `kube-bench`
  - `syft`
  - `grype`
  - `kube-linter`
  - `kubesec`
  - `trivy`
  - `falco`
Exam Environment Setup
Terminal Shortcuts/Aliases
The following are useful terminal aliases/shortcuts to use during the exam.
Add the following to the end of ~/.bashrc file:
```sh
alias k='kubectl'                                      # <-- Most general and useful shortcut!
alias kd='kubectl delete --force --grace-period=0'     # <-- Fast deletion of resources
alias kc='kubectl create'                              # <-- Create a resource
alias kc-dry='kubectl create --dry-run=client -o yaml' # <-- Create a YAML template of a resource
alias kr='kubectl run'                                 # <-- Run/create a resource (typically a pod)
alias kr-dry='kubectl run --dry-run=client -o yaml'    # <-- Create a YAML template of a resource
# If kc-dry and kr-dry do not work for you, add the following variable instead
export do="--dry-run=client -o yaml"                   # <-- Create the YAML template (usage: $do)
```
The following are some example usages:
```sh
k get nodes -o wide
kc deployment my-dep --image=nginx --replicas=3
kr-dry my-pod --image=nginx --command -- sleep 36000
kr-dry my-pod --image=busybox -- "/bin/sh" "-c" "sleep 36000"
kr my-pod --image=busybox $do -- "/bin/sh" "-c" "sleep 36000"
```
Terminal Command Completion
The following is useful so that you can use the TAB key to auto-complete a command, allowing you to not always have to remember the exact keyword or spelling.
Type the following into the terminal:
- `kubectl completion bash >> ~/.bashrc` - `kubectl` command completion
- `kubeadm completion bash >> ~/.bashrc` - `kubeadm` command completion
- `exec $SHELL` - Reload the shell to enable all added completion
VIM
The exam will have VIM or nano terminal text editor tools available. If you are using
VIM ensure that you create a ~/.vimrc file and add the following:
set ts=2 " <-- tabstop - how many spaces is \t worth
set sw=2 " <-- shiftwidth - how many spaces is indentation
set et " <-- expandtab - Use spaces, never \t values
set mouse=a " <-- Enable mouse support
Or simply:
set ts=2 sw=2 et mouse=a
Also know the VIM basics below. It may be a good idea to take a quick VIM course.
- `vim my-file.yaml` - If the file exists, open it, else create it for editing
- `:w` - Save
- `:x` - Save and exit
- `:q` - Exit
- `:q!` - Exit without saving
- `i` - Insert mode, regular text editor mode
- `v` - Visual mode for selection
- `ESC` - Normal mode
Pasting Text Into VIM
Oftentimes you will want to paste text or code from the Kubernetes documentation into a VIM terminal. If you simply do that, the tabs will do funky things.
Do the following inside VIM before pasting your copied text:
- In NORMAL mode, type `:set paste`
- Now enter INSERT mode
- You should see -- INSERT (paste) -- at the bottom of the screen
- Paste the text
  - You can right-click with the mouse and select Paste, or use CTRL + SHIFT + V
tmux
tmux will allow you to use multiple terminal windows in one (aka terminal multiplexing).
Make sure you know the basics for tmux usage:
- `tmux` - Start and enter tmux
- `CTRL + b "` - Split the window vertically (line is horizontal)
- `CTRL + b %` - Split the window horizontally (line is vertical)
- `CTRL + b <ARROW KEY>` - Switch between window panes
- `CTRL + b (hold) <ARROW KEY>` - Resize the current window pane
- `CTRL + b z` - Toggle a pane to full screen (good for looking at a full document)
- `CTRL + d` or `exit` - Close a window pane
Mouse Support
If you want to be able to click and select within tmux and tmux panes, you can also enable mouse support. This can be useful.
These steps must be done outside of tmux:
- Create a `~/.tmux.conf` file and edit it: `vim ~/.tmux.conf`
- Add the configuration, save, and exit the file: `set -g mouse on`
- Reload the tmux configuration: `tmux source ~/.tmux.conf`
Preparation
Study Resources
- Official Kubernetes Documentation
- KodeKloud CKS Course
- The Kubernetes Book - Nigel Poulton
- CKS Study Guide
- killer.sh labs
Practice
Fundamentals
- You should already have CKA level knowledge
- Linux Kernel Namespaces isolate containers
- PID Namespace: Isolates processes
- Mount Namespace: Restricts access to mounts or root filesystem
- Network Namespace: Only access certain network devices. Firewall and routing rules
- User Namespace: A different set of UIDs is used. Example: user (UID 0) inside one namespace can be different from user (UID 0) inside another namespace
- cgroups restrict resource usage of processes
- RAM/Disk/CPU
- Using cgroups and linux kernel namespaces, we can create containers
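To make this concrete, here is a minimal sketch using `unshare` (a util-linux tool; this demo is an illustration only, not exam material):

```sh
# Create new PID and mount namespaces; --mount-proc remounts /proc so the
# inner shell sees an isolated process table with itself as PID 1.
sudo unshare --pid --fork --mount-proc /bin/sh -c 'ps aux'
```

Inside the new namespace, `ps aux` lists only the shell and `ps` itself, which is the same isolation mechanism container runtimes build on.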
Understand the Kubernetes Attack Surface
- Kubernetes is a complex system with many components. Each component has its own vulnerabilities and attack vectors.
- The attack surface can be reduced by:
- Using network policies to restrict traffic between pods
- Using RBAC to restrict access to the kube-api server
- Using admission controllers to enforce security policies
- Using pod security standards to enforce security policies
- Using best practices to secure the underlying infrastructure
- Using securityContext to enforce security policies for pods
The 4 C’s of Cloud-Native Security
- Cloud: Security of the cloud infrastructure
- Cluster: Security of the cluster itself
- Container: Security of the containers themselves
- Code: Security of the code itself
1 Cluster Setup
CIS Benchmark
What is a security benchmark?
- A security benchmark is a set of standard benchmarks that define a state of optimized security for a given system (servers, network devices, etc.)
- CIS (Center for Internet Security) provides standardized benchmarks (in the form of downloadable files) that one can use to implement security on their system.
- CIS provides benchmarks for public clouds (Azure, AWS, GCP, etc.), operating systems (Linux, Windows, MacOS), network devices (Cisco, Juniper, HP, etc.), mobile devices (Android and Apple), desktop and server software (such as Kubernetes)
- View more info here
- You must register at the CIS website to download benchmarks
- Each benchmark provides a description of a vulnerability, as well as a path to resolution.
- CIS-CAT is a tool you can run on a system to generate recommendations for a given system. There are two versions available for download, CIS-CAT Lite and CIS-CAT Pro. The Lite version only includes benchmarks for Windows 10, MacOS, Ubuntu, and desktop software (Google Chrome, etc.). The Pro version includes all benchmarks.
- CIS Benchmarks for Kubernetes
- Register at the CIS website and download the CIS Benchmarks for kubernetes
- Includes security benchmarks for master and worker nodes
KubeBench
- KubeBench is an alternative to CIS-CAT Pro to run benchmarks against a Kubernetes cluster.
- KubeBench is open source and maintained by Aqua Security
- KubeBench can be deployed as a Docker container or a pod. It can also be invoked directly from the binaries or compiled from source.
- Once run, kube-bench will scan the cluster to identify whether best practices have been implemented. It will output a report specifying which benchmarks have passed/failed, and it will tell you how to fix any failed benchmarks.
- You can view the report by tailing the pod logs of the kube-bench pod.
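A hedged example of running kube-bench as a Kubernetes Job (the manifest URL is assumed from the Aqua Security repo and may change; verify it before relying on it):

```sh
# Deploy kube-bench as a Job in the cluster
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
# Once the job completes, read the report from the pod logs
kubectl logs job/kube-bench
```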
Cluster Upgrades
- The controller-manager and kube-scheduler can be one minor revision behind the API server.
- For example, if the API server is at version 1.10, controller-manager and kube-scheduler can be at 1.9 or 1.10
- The kubelet and kube-proxy can be up to 2 minor revisions behind the API server
- kubectl can be x+1 or x-1 minor revisions from the kube API server
- You can upgrade the cluster one minor version at a time
Upgrade Process
- Drain and cordon the node before upgrading it
kubectl drain <node name> --ignore-daemonsets
- Upgrade the master node first.
- Upgrade worker nodes after the master node.
Upgrading with Kubeadm
- If the cluster was created with kubeadm, you can use kubeadm to upgrade it.
- The upgrade process with kubeadm:
```sh
# Increase the minor version in the apt repository file for kubernetes:
sudo vi /etc/apt/sources.list.d/kubernetes.list

# Determine which version to upgrade to
sudo apt update
sudo apt-cache madison kubeadm

# Upgrade kubeadm first
sudo apt-mark unhold kubeadm && \
sudo apt-get update && sudo apt-get install -y kubeadm='1.31.x-*' && \
sudo apt-mark hold kubeadm

# Verify the version of kubeadm
kubeadm version

# Check the kubeadm upgrade plan
sudo kubeadm upgrade plan

# Apply the upgrade plan
sudo kubeadm upgrade apply v1.31.x

# Upgrade the nodes
sudo kubeadm upgrade node

# Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl && \
sudo apt-get update && sudo apt-get install -y kubelet='1.31.x-*' kubectl='1.31.x-*' && \
sudo apt-mark hold kubelet kubectl

# Restart the kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```
Network Policies
Overview
- Kubernetes Network Policies allow you to control the flow of traffic to and from pods. They define rules that specify:
  - What traffic is allowed to reach a set of pods.
  - What traffic a set of pods can send out.
- Pods can communicate with each other by default. Network Policies allow you to restrict this communication.
- Network Policies operate at Layer 3 and Layer 4 (IP and TCP/UDP). They do not cover Layer 7 (application layer).
- Network Policies are additive. To grant more permissions for network communication, simply create another network policy with more fine-grained rules.
- Network Policies are implemented by the network plugin. The network plugin must support NetworkPolicy for the policies to take effect.
- Network Policies are namespace-scoped. They apply to pods in the same namespace.
- For example, the following policy denies all ingress traffic to pods in 'secure-namespace':

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: secure-namespace
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```
Say we now want to grant the 'frontend' pods with label 'tier: frontend' in the 'app' namespace access to the 'backend' pods in 'secure-namespace'. We can do that by creating another Network Policy like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-pods
  namespace: secure-namespace
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: app
      podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 3000
```
Key Concepts
- Namespace Scope: Network policies are applied at the namespace level.
- Selector-Based Rules:
- Pod Selector: Select pods the policy applies to.
- Namespace Selector: Select pods based on their namespace.
- Traffic Direction:
- Ingress: Traffic coming into the pod.
- Egress: Traffic leaving the pod.
- Default Behavior:
- Pods are non-isolated by default (accept all traffic).
- A pod becomes isolated when a network policy matches it.
Common Fields in a Network Policy
- podSelector: Specifies the pods the policy applies to.
- ingress/egress: Lists rules for ingress or egress traffic.
- from/to: Specifies allowed sources/destinations (can use IP blocks, pod selectors, or namespace selectors).
- ports: Specifies allowed ports and protocols.
Example Network Policies
Allow All Ingress Traffic
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-all-ingress
namespace: default
spec:
podSelector: {}
ingress:
- {}
```
Deny All Ingress and Egress Traffic
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
ingress: []
egress: []
```
Allow Specific Ingress from a Namespace
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-namespace-ingress
namespace: default
spec:
podSelector:
matchLabels:
app: my-app
ingress:
- from:
- namespaceSelector:
matchLabels:
team: frontend
```
Allow Egress to a Specific IP
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-specific-ip
namespace: default
spec:
podSelector:
matchLabels:
app: my-app
egress:
- to:
- ipBlock:
cidr: 192.168.1.0/24
ports:
- protocol: TCP
port: 8080
```
Cilium Network Policy
- Cilium Network Policies provide more granularity, flexibility, and features than traditional Kubernetes Network Policies
- Cilium Network Policies operate up to layer 7 of the OSI model. Traditional Network Policies only operate up to layer 4.
- Cilium Network Policies perform well because they use eBPF
- Hubble allows you to watch traffic going to and from pods
- You can add Cilium to the cluster by:
  - Deploying with Helm
  - Running `cilium install` after installing the Cilium CLI tool
Cilium Network Policy Structure
- Cilium Network Policies are defined in YAML files
- The structure is similar to Kubernetes Network Policies
Layer 3 Rules
- Endpoints Based - Apply the policy to pods based on Kubernetes label selectors
- Services Based - Apply the policy based on kubernetes services, controlling traffic based on service names rather than individual pods
- Entities Based - Cilium has pre-defined entities like cluster, host, and world. This type of policy uses these entities to determine what traffic the policy is applied to.
  - Cluster - Represents all Kubernetes endpoints. Example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-cluster-resources
spec:
  endpointSelector: {}
  egress:
  - toEntities:
    - cluster
```

  - World - Represents any external traffic, but not cluster traffic. Example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-external-resources
spec:
  endpointSelector: {}
  egress:
  - toEntities:
    - world
```

  - Host - Represents the local Kubernetes node
  - Remote-node - Represents traffic from a remote node
  - All - Represents all endpoints both internal and external to the cluster. Example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-all
spec:
  endpointSelector: {}
  egress:
  - toEntities:
    - all
```
- Node Based - Apply the policy based on nodes in the cluster
- IP/CIDR Based - Apply the policy based on IP addresses or CIDR blocks
Layer 4 Rules
- If no layer 4 rules are defined, all traffic is allowed for layer 4
- Example:
apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy metadata: name: allow-external-80 spec: endpointSelector: matchLabels: run: curl egress: - toPorts: - ports: - port: "80" protocol: TCP
Layer 7 Rules
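No example was included here; below is a minimal sketch of an HTTP-aware (L7) rule using the Cilium policy format (the `app: backend` label, policy name, and path are placeholders):

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-only
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      # Layer 7 rules: only GET requests to /public/* are allowed
      rules:
        http:
        - method: "GET"
          path: "/public/.*"
```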
Deny Policies
- You can create deny policies to explicitly block traffic
- Deny policies take precedence over allow policies
- ingressDeny example:

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: deny-ingress-80-for-backend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingressDeny:
  - fromEntities:
    - all
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
```

- egressDeny example:

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "deny-egress"
spec:
  endpointSelector:
    matchLabels:
      app: random-pod
  egress:
  - toEntities:
    - all
  egressDeny:
  - toEndpoints:
    - matchLabels:
        app: server
```
Examples
Default Deny All
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: default-deny-all
spec:
endpointSelector: {}
ingress:
- fromEntities:
- world
Kubernetes Ingress
What is Ingress?
- Ingress is an API object that manages external access to services in a Kubernetes cluster, typically HTTP and HTTPS.
- Provides:
- Load balancing
- SSL termination
- Name-based virtual hosting
Why Use Ingress?
- To consolidate multiple service endpoints behind a single, externally accessible URL.
- Reduce the need for creating individual LoadBalancers or NodePort services.
Key Components of Ingress
- Ingress Controller
  - Software that watches for Ingress resources and implements the rules.
  - Popular Ingress controllers:
    - ingress-nginx
    - Traefik
    - HAProxy
    - Istio Gateway
  - Must be installed separately in the cluster.
- Ingress Resource
  - The Kubernetes object that defines how requests should be routed to services.
Ingress Resource Configuration
- As of Kubernetes 1.20, you can create an ingress using kubectl:
kubectl create ingress <name> --rule="host/path=service:port"
Basic Structure
```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: example-ingress
spec:
rules:
- host: example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: example-service
port:
number: 80
```
Ingress with TLS
- By default, the NGINX ingress controller serves a self-signed ("fake") certificate for HTTPS. To view it, first determine the HTTPS port of the ingress controller service:

```sh
kubeadmin@kube-controlplane:~$ k get svc -n ingress-nginx
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             NodePort    10.103.169.156   <none>        80:31818/TCP,443:30506/TCP   38m
ingress-nginx-controller-admission   ClusterIP   10.103.26.228    <none>        443/TCP                      38m
```

- The HTTPS port is 30506 in this case. To view the self-signed certificate, we can use curl:

```sh
λ notes $ curl https://13.68.211.113:30506/service1 -k -v
* (304) (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate  <<<<<<<<<<<<<<<<
*  start date: Dec 20 14:23:08 2024 GMT
*  expire date: Dec 20 14:23:08 2025 GMT
*  issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://13.68.211.113:30506/service1
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: 13.68.211.113:30506]
* [HTTP/2] [1] [:path: /service1]
* [HTTP/2] [1] [user-agent: Mozilla/5.0 Gecko]
* [HTTP/2] [1] [accept: */*]
> GET /service1 HTTP/2
> Host: 13.68.211.113:30506
> User-Agent: Mozilla/5.0 Gecko
> Accept: */*
```

- To configure an ingress resource to use TLS (HTTPS), we first need to create a certificate:

```sh
# create a new 2048-bit RSA private key and associated cert
openssl req -nodes -new -x509 -keyout my.key -out my.crt -subj "/CN=mysite.com"
```

- Next, create a secret for the TLS cert:

```sh
kubectl create secret tls mycert --cert=my.crt --key=my.key -n my-namespace
```

- Create the ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - example.com
    secretName: mycert
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: secure-service
            port:
              number: 80
```
Annotations
- Extend the functionality of Ingress controllers.
- Common examples (specific to nginx):
- nginx.ingress.kubernetes.io/rewrite-target: Rewrite request paths.
- nginx.ingress.kubernetes.io/ssl-redirect: Force SSL.
- nginx.ingress.kubernetes.io/proxy-body-size: Limit request size.
Protecting Node Metadata and Endpoints
Protecting Endpoints
- Kubernetes clusters expose information on various ports:
| Port Range | Purpose |
| ---------- | ------- |
| 6443 | kube-api |
| 2379 - 2380 | etcd |
| 10250 | kubelet api |
| 10259 | kube-scheduler |
| 10257 | kube-controller-manager |
- Many of these ports are configurable. For example, to change the port that kube-api listens on, just modify `--secure-port` in the kube-api manifest.
- Set up firewall rules to minimize the attack surface (a sketch follows below)
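A hedged `ufw` sketch for a control-plane node (the admin subnet is a placeholder; adjust ports to the table above):

```sh
# Allow kube-api access only from an admin subnet
sudo ufw allow from 10.0.0.0/24 to any port 6443 proto tcp
# Block the unauthenticated read-only kubelet port
sudo ufw deny 10255/tcp
sudo ufw enable
```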
Securing Node Metadata
- A lot of information can be obtained from node metadata:
  - Node name
  - Node state
  - Annotations
  - System info
  - etc.
- Why secure node metadata?
  - If node metadata is tampered with, pods may be assigned to the wrong nodes, which has security implications to consider
  - You can determine the version of the kubelet and other Kubernetes components from node metadata
  - If an attacker can modify node metadata, they could taint all the nodes, making all nodes unschedulable
- Protection strategies:
  - Use RBAC to control who has access to modify node metadata
  - Node isolation using labels and node selectors
  - Audit logs to determine who is accessing the cluster, and respond accordingly
  - Update node operating systems regularly
  - Update cluster components regularly
- Cloud providers such as Amazon and Azure often expose node information via metadata endpoints on the node. These endpoints are important to protect.
- This endpoint can be accessed at 169.254.169.254 on nodes in both Azure and AWS. An example for Azure:

```sh
curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq
```

- Node metadata endpoints can be prevented from being accessed by pods by creating network policies:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress-metadata-server
  namespace: a12
spec:
  policyTypes:
  - Egress
  podSelector: {}
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32
```
Verify Kubernetes Binaries
- The SHA sum of a file changes if the content within the file is changed
- You can download the binaries from GitHub using wget. Example:

```sh
wget -O /opt/kubernetes.tar.gz https://dl.k8s.io/v1.31.1/kubernetes.tar.gz
```

- To validate that a binary downloaded from the internet has not been modified, check the hash:

```sh
echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
```
Securing etcd
- etcd is a distributed key-value store that Kubernetes uses to store configuration data
- etcd by default listens on port 2379/tcp
Play with etcd
Step 1: Create the Base Binaries Directory
```sh
mkdir /root/binaries
cd /root/binaries
```
Step 2: Download and Copy the ETCD Binaries to Path
```sh
wget https://github.com/etcd-io/etcd/releases/download/v3.5.18/etcd-v3.5.18-linux-amd64.tar.gz
tar -xzvf etcd-v3.5.18-linux-amd64.tar.gz
cd /root/binaries/etcd-v3.5.18-linux-amd64/
cp etcd etcdctl /usr/local/bin/
```
Step 3: Start etcd
```sh
cd /tmp
etcd
```
Step 4: Verification - Store and Fetch Data from etcd
```sh
etcdctl put key1 "value1"
```
```sh
etcdctl get key1
```
Encrypting data in transit in etcd
- etcd supports TLS encryption for data in transit
- By default, etcd packaged with kubeadm is configured to use TLS encryption
- One can capture packets from etcd using tcpdump:
```sh
root@controlplane00:/var/lib/etcd/member# tcpdump -i lo -X port 2379
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:10:01.691453 IP localhost.2379 > localhost.42040: Flags [P.], seq 235868994:235869033, ack 3277609642, win 640, options [nop,nop,TS val 1280288044 ecr 1280288042], length 39
	0x0000:  4500 005b 35e4 4000 4006 06b7 7f00 0001  E..[5.@.@.......
	0x0010:  7f00 0001 094b a438 0e0f 1342 c35c 5aaa  .....K.8...B.\Z.
	0x0020:  8018 0280 fe4f 0000 0101 080a 4c4f a52c  .....O......LO.,
	0x0030:  4c4f a52a 1703 0300 2289 00d8 5dcc 7b88  LO.*...."...].{.
	0x0040:  6f7a 290f 536b 0fd0 f7d9 1fb4 f83f 4aab  oz).Sk.......?J.
	0x0050:  a6e7 0af8 0835 e597 a93d 4d               .....5...=M
16:10:01.691479 IP localhost.42040 > localhost.2379: Flags [.], ack 39, win 14819, options [nop,nop,TS val 1280288044 ecr 1280288044], length 0
	0x0000:  4500 0034 7174 4000 4006 cb4d 7f00 0001  E..4qt@.@..M....
	0x0010:  7f00 0001 a438 094b c35c 5aaa 0e0f 1369  .....8.K.\Z....i
	0x0020:  8010 39e3 fe28 0000 0101 080a 4c4f a52c  ..9..(......LO.,
	0x0030:  4c4f a52c                                LO.,
16:10:01.691611 IP localhost.2379 > localhost.42040: Flags [P.], seq 39:1222, ack 1, win 640, options [nop,nop,TS val 1280288044 ecr 1280288044], length 1183
	0x0000:  4500 04d3 35e5 4000 4006 023e 7f00 0001  E...5.@.@..>....
	0x0010:  7f00 0001 094b a438 0e0f 1369 c35c 5aaa  .....K.8...i.\Z.
	0x0020:  8018 0280 02c8 0000 0101 080a 4c4f a52c  ............LO.,
	0x0030:  4c4f a52c 1703 0304 9ac0 c579 d4ed 808c  LO.,.......y....
..... redacted
```

- The traffic captured in the output above is encrypted.
Encrypting data at rest in etcd
- By default, the API server stores plain-text representations of resources in etcd, with no at-rest encryption.
- etcd stores data in the /var/lib/etcd/member directory. When the database is not encrypted, one can easily grep the contents of this directory, looking for secrets:

```sh
root@controlplane00:/var/lib/etcd/member# ls -lisa
total 16
639000 4 drwx------ 4 root root 4096 Mar 21 10:53 .
385187 4 drwx------ 3 root root 4096 Mar 21 10:52 ..
639002 4 drwx------ 2 root root 4096 Mar 21 14:43 snap
638820 4 drwx------ 2 root root 4096 Mar 21 11:59 wal
root@controlplane00:/var/lib/etcd/member# grep -R test-secret .
grep: ./wal/00000000000000ac-0000000000a9340b.wal: binary file matches
grep: ./wal/00000000000000a8-0000000000a721c1.wal: binary file matches
grep: ./wal/00000000000000aa-0000000000a83f1e.wal: binary file matches
grep: ./wal/00000000000000a9-0000000000a7b97e.wal: binary file matches
grep: ./wal/00000000000000ab-0000000000a8d8a7.wal: binary file matches
grep: ./snap/db: binary file matches
```
- The kube-apiserver process accepts an argument `--encryption-provider-config` that specifies a path to a configuration file. The contents of that file, if you specify one, control how Kubernetes API data is encrypted in etcd.
- If you are running the kube-apiserver without the `--encryption-provider-config` command line argument, you do not have encryption at rest enabled. If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies the `identity` provider as the first encryption provider in the list, then you do not have at-rest encryption enabled (the default identity provider does not provide any confidentiality protection).
- If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies a provider other than `identity` as the first encryption provider in the list, then you already have at-rest encryption enabled. However, that check does not tell you whether a previous migration to encrypted storage has succeeded.
- Example EncryptionConfiguration:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
      - pandas.awesome.bears.example # a custom resource API
    providers:
      # This configuration does not provide data confidentiality. The first
      # configured provider is specifying the "identity" mechanism, which
      # stores resources as plain text.
      # - identity: {} # plain text, in other words NO encryption
      - aesgcm:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - secretbox:
          keys:
            - name: key1
              secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
  - resources:
      - events
    providers:
      - identity: {} # do not encrypt Events even though *.* is specified below
  - resources:
      - '*.apps' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key2
              secret: c2VjcmV0IGlzIHNlY3VyZSwgb3IgaXMgaXQ/Cg==
  - resources:
      - '*.*' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key3
              secret: c2VjcmV0IGlzIHNlY3VyZSwgSSB0aGluaw==
```
- Each resources array item is a separate config and contains a complete configuration. The resources.resources field is an array of Kubernetes resource names (resource or resource.group) that should be encrypted, like Secrets, ConfigMaps, or other resources.
- https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
- After enabling encryption in etcd, any resources that you created prior to enabling encryption will not be encrypted. For example, you can re-encrypt existing secrets by running:

```sh
kubectl get secrets -A -o yaml | kubectl replace -f -
```
- Example of getting a secret in etcd:

```sh
root@controlplane00:/etc/kubernetes/pki# ETCDCTL_API=3 etcdctl --cacert=./etcd/ca.crt --cert=./apiserver-etcd-client.crt --key=./apiserver-etcd-client.key get /registry/secrets/default/mysecret
/registry/secrets/default/mysecret
k8s:enc:aescbc:v1:key1:ܨt>;8ܑ%TUIodEs*lsHGwjeF8S!Aqaj\Pq;9Ⱥ7dJe{B2=|p4#'BuCxUY,*IuFM
wxx@
2Q0e5UzH^^)rX_H%GUɈ-XqC.˽pC `kBW>K12 n
```

- The path to a resource in the etcd database is '/registry/<resource type>/<namespace>/<object name>' (as seen in the example above)
Securing kube-apiserver
- Kube-apiserver acts as the gateway for all resources in kubernetes. Kube-apiserver is the only component in kubernetes that communicates with etcd
- kube-apiserver authenticates to etcd using TLS client certificates.
- Kube-apiserver should encrypt data before it is stored in etcd
- kube-apiserver should only listen on an HTTPS endpoint. There was an option to host kube-apiserver on an HTTP endpoint, but this option has been deprecated as of 1.10 and removed in 1.22
- kube-apiserver should have auditing enabled
Authentication
- One can authenticate to the KubeAPI server using certificates or a kubeconfig file
Access Controls
- After a request is authenticated, it is authorized. Authorization is the process of determining what actions a user can perform.
- Multiple authorization modules are supported:
- AlwaysAllow - Allows all requests
- AlwaysDeny - Blocks all requests
- RBAC - Role-based access control for requests. This is the default authorization module in kubernetes
- Node - Authorizes kubelets to access the kube-api server
2 Cluster Hardening
Securing Access to the KubeAPI Server
- A request to the KubeAPI server goes through 4 stages before it is processed by KubeAPI:
- Authentication
- Validates the identity of the caller by inspecting client certificates or tokens
- Authorization
- The authorization stage verifies that the identity found in the first stage can access the verb and resource in the request
- Admission Controllers
- Admission Control verifies that the request is well-formed and potentially modifies it before proceeding
- Validation
- This stage ensures that the request is valid.
- You can determine the endpoint for the kube-api server by running `kubectl cluster-info`
- The kube-api server is also exposed via a service named 'kubernetes' in the default namespace:

```yaml
kubeadmin@kube-controlplane:~$ k get svc kubernetes -n default -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-11-11T10:57:42Z"
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "234"
  uid: 768d1a22-91ff-4ab3-8cd7-b86340fc319a
spec:
  clusterIP: 10.96.0.1
  clusterIPs:
  - 10.96.0.1
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```

- The endpoint of the kube-api server is also exposed to pods via environment variables:

```sh
kubeadmin@kube-controlplane:~$ k exec -it other -- /bin/sh -c 'env | grep -i kube'
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
KUBERNETES_SERVICE_HOST=10.96.0.1
```
Authentication
- There are two types of accounts that would need access to a cluster: Humans and Machines. There is no such thing as a ‘user account’ primitive in Kubernetes.
User accounts
- Developers, cluster admins, etc.
Service Accounts
- Service Accounts are created and managed by the Kubernetes API and can be used for machine authentication
- To create a service account: `kubectl create serviceaccount <account name>`
- Service accounts are namespaced
- Prior to Kubernetes 1.24, when a service account was created, a token was created automatically and stored as a secret object.
- You can use the base64-encoded token to communicate with the kube-api server:

```sh
curl https://172.16.0.1:6443/api --insecure --header "Authorization: Bearer <token value>"
```

- You can grant service accounts permission to the cluster itself by binding them to a role with a rolebinding. If a pod needs access to the cluster where it is hosted, you configure the automountServiceAccountToken boolean parameter on the pod and assign it a service account that has the appropriate permissions to the cluster. The token will be mounted to the pod's file system, where the value can then be accessed by the pod. The secret is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
- A service account named 'default' is automatically created in every namespace
- As of Kubernetes 1.22, tokens are automatically mounted into pods by an admission controller as a projected volume.
  - https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md
- As of Kubernetes 1.24, when you create a service account, a secret is no longer created automatically for the token. Now you must run `kubectl create token <service account name>` to create the token.
  - https://github.com/kubernetes/enhancements/issues/2799
- One can also specify a token duration: `kubectl create token <service-account-name> --duration=100h`
TLS Certificates
- Server certificates are used to communicate with clients
- Client certificates are used to communicate with servers
- Server components used in Kubernetes and their certificates:
- kube-api server: apiserver.crt, apiserver.key
- etcd-server: etcdserver.crt, etcdserver.key
- kubelet: kubelet.crt, kubelet.key
- Client components used in kubernetes and their certificates:
- user certificates
- kube-scheduler: scheduler.crt, scheduler.key
- kube-controller-manager: controller-manager.crt, controller-manager.key
- kube-proxy: kubeproxy.crt, kubeproxy.key
- To generate a self-signed certificate:

```sh
openssl req -nodes -new -x509 -keyout my.key -out my.crt -subj "/CN=mysite.com"
```

- To generate certificates, you can use openssl:
  - Create a new private key: `openssl genrsa -out my.key 2048`
  - Create a new certificate signing request (CSR): `openssl req -new -key my.key -out my.csr -subj "/CN=ryan"`
  - Sign the CSR and generate the certificate yourself, or create a signing request with kube-api:
    - Sign and generate: `openssl x509 -req -in my.csr -signkey my.key -out my.crt`
    - Create a CertificateSigningRequest with kube-api (a sketch of the object follows this list):

```sh
# extract the base64 encoded value of the CSR:
cat my.csr | base64 | tr -d '\n'
# create a CertificateSigningRequest object with kube-api and provide the
# base64 encoded value .... see the docs
```

- kubeadm will automatically generate certificates for clusters that it creates
  - kubeadm generates certificates in the /etc/kubernetes/pki/ directory
- To view the details of a certificate, use openssl: `openssl x509 -in <path to crt> -text -noout`
- Once you have a private key, you can sign it using the CertificateSigningRequest object. The controller manager is responsible for signing these requests. You can then use the signed certificate values to authenticate to the kube-api server by placing the signed key, certificate, and CA in a kubeconfig file (~/.kube/config)
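A sketch of the CertificateSigningRequest object described above, following the structure in the Kubernetes docs (the name and expiration are placeholders):

```yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ryan
spec:
  # base64-encoded CSR produced in the previous step
  request: <base64-encoded my.csr>
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400  # one day
  usages:
  - client auth
```

After creating it, approve and fetch the signed certificate with `kubectl certificate approve ryan` and `kubectl get csr ryan -o jsonpath='{.status.certificate}' | base64 -d`.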
kubelet Security
- By default, requests to the kubelet API are not authenticated. These requests are bound to an 'unauthenticated users' group. This behavior can be changed by setting the `--anonymous-auth` flag to `false` in the kubelet config
- kubelet ports
  - port 10250 on the machine running a kubelet process serves an API that allows full access
  - port 10255 on the machine running a kubelet process serves an unauthenticated, read-only API
- kubelet supports 2 authentication mechanisms: bearer tokens and certificate-based authentication
- You can find the location of the kubelet config file by looking at the process: `ps aux | grep -i kubelet`
Authorization
Roles and ClusterRoles
- Roles and ClusterRoles define what a user or service account can do within a cluster
- The Kubernetes primitive `role` is namespaced; `clusterrole` is not
Role Bindings and Cluster Role Bindings
- `rolebinding` and `clusterrolebinding` link a user or service account to a role (see the sketch below)
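A minimal sketch tying these primitives together (the namespace, role, and service account names are placeholders):

```sh
# Grant a service account read-only access to pods in the dev namespace
kubectl create role pod-reader --verb=get,list,watch --resource=pods -n dev
kubectl create rolebinding read-pods --role=pod-reader \
  --serviceaccount=dev:my-sa -n dev
# Verify the binding took effect
kubectl auth can-i list pods -n dev --as=system:serviceaccount:dev:my-sa
```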
3 System Hardening
Principle of Least Privilege
- Ensure that people or bots only have access to what is needed, and nothing else.
Limit access to nodes
Managing Local Users and Groups
- Commands to be aware of: `id`, `who`, `last`, `groups`, `useradd`, `userdel`, `usermod`, `groupdel`
- Files to be aware of: `/etc/passwd`, `/etc/shadow`, `/etc/group`
- Disable logins for users by setting their login shell to `/bin/nologin` (examples below)
- Remove users from groups they do not need to belong to
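Hedged examples of the commands above (user and group names are placeholders):

```sh
sudo usermod -s /usr/sbin/nologin appuser  # disable interactive login for a user
sudo gpasswd -d appuser docker             # remove appuser from the docker group
sudo userdel -r olduser                    # delete an account and its home directory
```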
Securing SSH
- Set the following in `/etc/ssh/sshd_config`, then restart sshd (`systemctl restart sshd`):

```
PermitRootLogin no
PasswordAuthentication no
```
Using sudo
- The `/etc/sudoers` file controls and configures the behavior of the `sudo` command. Each entry follows a structured syntax. Below is a breakdown of the fields and their meanings:
# Example Lines
# ----------------------------------
# User/Group Host=Command(s)
admin ALL=(ALL) NOPASSWD: ALL
%developers ALL=(ALL) ALL
john ALL=(ALL:ALL) /usr/bin/apt-get
# Field Breakdown
admin ALL=(ALL) NOPASSWD: ALL
| | | | |
| | | | +---> Command(s): Commands the user/group can execute.
| | | +------------> Options: Modifiers like `NOPASSWD` (no password required).
| | +--------------------> Runas: User/Group the command can be run as.
| +------------------------> Host: On which machine this rule applies (`ALL` for any).
+-----------------------------------------> User/Group: The user or group this rule applies to.
# Examples Explained
1. Allow `admin` to execute any command without a password:
admin ALL=(ALL) NOPASSWD: ALL
Remove Unnecessary Packages
- This one is self-explanatory. Don’t have unnecessary software installed on your nodes.
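Hedged Ubuntu examples (the package name is a placeholder):

```sh
apt list --installed | grep <package>                # find what is installed
sudo apt remove --purge <package>                    # remove a package and its config
systemctl list-units --type=service --state=running  # spot unneeded services
```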
Restrict Kernel Modules
- Kernel modules are ways of extending the kernel to enable it to understand new hardware. They are like device drivers.
- `modprobe` allows you to load a kernel module
- `lsmod` allows you to view all loaded modules
- You can blacklist modules by adding a new entry to `/etc/modprobe.d/blacklist.conf`
  - The entry should be in the format `blacklist <module name>`
  - Example: `echo "blacklist sctp" >> /etc/modprobe.d/blacklist.conf`
- You may need to reboot the system after disabling or blacklisting kernel modules
Disable Open Ports
- Use `netstat -tunlp` or `ss -tunlp` to list listening ports on a system
- Stop the service associated with the open port, or disable access with a firewall
- Common firewalls you can use are `iptables` or `ufw`
  - Run `ufw status` to list the current status of the UFW firewall
  - Allow all traffic outbound: `ufw default allow outgoing`
  - Deny all incoming: `ufw default deny incoming`
  - Allow SSH from 172.16.154.24: `ufw allow from 172.16.154.24 to any port 22 proto tcp`
Tracing Syscalls
- There are several ways to trace syscalls in Linux.
strace
- `strace` is included with most Linux distributions.
- To use `strace`, simply add it before the binary that you are running: `strace touch /tmp/test`
- You can also attach `strace` to a running process: `strace -p <PID>`
AquaSec Tracee
- `tracee` is an open source tool created by AquaSec
- Uses eBPF (extended Berkeley Packet Filter) to trace syscalls on a system. eBPF runs programs directly within the kernel space without loading any kernel modules. As a result, tools that use eBPF are more efficient and typically use fewer resources.
- `tracee` can be run using the binaries or as a container
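A hedged example of running tracee as a container; the flags and image tag follow the Aqua docs at the time of writing and should be verified before use:

```sh
docker run --rm -it --pid=host --privileged \
  -v /etc/os-release:/etc/os-release-host:ro \
  aquasec/tracee:latest
```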
Restricting Access to syscalls with seccomp
- `seccomp` can be used to restrict a process' access to syscalls. It allows access to the most commonly used syscalls, while restricting access to syscalls that can be considered dangerous.
- To see if `seccomp` is enabled: `grep -i seccomp /boot/config-$(uname -r)`
- `seccomp` can operate in 1 of 3 modes:
  - mode 0: disabled
  - mode 1: strict (blocks nearly all syscalls, except for 4)
  - mode 2: selectively filters syscalls
- To see which mode a process is currently running in: `grep -i seccomp /proc/1/status`, where '1' is the PID of the process
- `seccomp` profiles
  - Kubernetes provides a default `seccomp` profile that can be either restrictive or permissive, depending on your configuration
  - You can create custom profiles to fine-tune `seccomp` and which syscalls it blocks or allows within containers
  - Example `seccomp` profile resembling mode 1:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "archMap": [
    {
      "architecture": "SCMP_ARCH_X86_64",
      "subArchitectures": []
    }
  ],
  "syscalls": [
    {
      "names": ["read", "write", "exit", "sigreturn"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

- To apply a `seccomp` profile to a pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # this path is relative to the default seccomp profile location
      # (/var/lib/kubelet/seccomp)
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false
```
Restrict access to file systems
AppArmor
- AppArmor can be used to limit a container's access to resources on the host. Why do we need AppArmor if we have traditional discretionary access controls (file system permissions, etc.)? With discretionary access control, a running process inherits the permissions of the user who started it, which is likely more than the process needs. AppArmor is a mandatory access control implementation that allows one to implement fine-grained controls over what a process can access or do on a system.
- AppArmor runs as a daemon on Linux systems. You can check its status using systemctl: `systemctl status apparmor`
  - If `apparmor-utils` is installed, you can also use `aa-status`
- To use AppArmor, the kernel module must also be loaded. To check the status: `cat /sys/module/apparmor/parameters/enabled` (Y = loaded)
- AppArmor profiles define what a process can and cannot do and are stored in /etc/apparmor.d/. Profiles need to be copied to every worker node and loaded.
- Every profile needs to be loaded into AppArmor before it can take effect
  - To view loaded profiles, run `aa-status`
- To load a profile: `apparmor_parser -r -W /path/to/profile`
  - If `apparmor-utils` is installed, you can also use `aa-enforce` to load a profile
- Profiles are loaded in 'enforce' mode by default. To change the mode to 'complain': `apparmor_parser -C /path/to/profile`
  - If `apparmor-utils` is installed, you can also use `aa-complain` to change the mode
- To view loaded AppArmor profiles:

```sh
kubeadmin@kube-controlplane:~$ sudo cat /sys/kernel/security/apparmor/profiles
cri-containerd.apparmor.d (enforce)
wpcom (unconfined)
wike (unconfined)
vpnns (unconfined)
vivaldi-bin (unconfined)
virtiofsd (unconfined)
rsyslogd (enforce)
vdens (unconfined)
uwsgi-core (unconfined)
/usr/sbin/chronyd (enforce)
/usr/lib/snapd/snap-confine (enforce)
/usr/lib/snapd/snap-confine//mount-namespace-capture-helper (enforce)
tcpdump (enforce)
man_groff (enforce)
man_filter (enforce)
....
```

or:

```sh
root@controlplane00:/etc/apparmor.d# aa-status
apparmor module is loaded.
33 profiles are loaded.
12 profiles are in enforce mode.
   /home/rtn/tools/test.sh
   /usr/bin/man
   /usr/lib/NetworkManager/nm-dhcp-client.action
   /usr/lib/NetworkManager/nm-dhcp-helper
   /usr/lib/connman/scripts/dhclient-script
   /usr/sbin/chronyd
   /{,usr/}sbin/dhclient
   lsb_release
   man_filter
   man_groff
   nvidia_modprobe
   nvidia_modprobe//kmod
21 profiles are in complain mode.
   avahi-daemon
   dnsmasq
   dnsmasq//libvirt_leaseshelper
   identd
   klogd
   mdnsd
   nmbd
   nscd
   php-fpm
   ping
   samba-bgqd
   samba-dcerpcd
   samba-rpcd
   samba-rpcd-classic
   samba-rpcd-spoolss
   smbd
   smbldap-useradd
   smbldap-useradd///etc/init.d/nscd
   syslog-ng
   syslogd
   traceroute
0 profiles are in kill mode.
0 profiles are in unconfined mode.
4 processes have profiles defined.
2 processes are in enforce mode.
   /usr/sbin/chronyd (704)
   /usr/sbin/chronyd (708)
2 processes are in complain mode.
   /usr/sbin/avahi-daemon (587) avahi-daemon
   /usr/sbin/avahi-daemon (613) avahi-daemon
0 processes are unconfined but have a profile defined.
0 processes are in mixed mode.
0 processes are in kill mode.
```
- AppArmor defines profile modes that determine how the profile behaves:
  - Enforce: action is taken and the application is allowed/blocked from performing defined actions. Events are logged in syslog.
  - Complain: events are logged but no action is taken
  - Unconfined: the application can perform any task and no event is logged
- AppArmor Tools
  - Can be used to generate AppArmor profiles
  - To install: `apt install -y apparmor-utils`
  - Run `aa-genprof` to generate a profile: `aa-genprof ./my-application`
- Before applying an AppArmor profile to a pod, you must ensure the container runtime supports AppArmor. You must also ensure AppArmor is installed on the worker node and that all necessary profiles are loaded.
- To apply an AppArmor profile to a pod, add the following securityContext (K8s 1.30+):

```yaml
securityContext:
  appArmorProfile:
    type: <profile_type>
    localhostProfile: <profile_name>
```

  - <profile_type> can be one of 3 values: `Unconfined`, `RuntimeDefault`, or `Localhost`
    - `Unconfined` means the container is not restricted by AppArmor
    - `RuntimeDefault` means the container will use the default AppArmor profile
    - `Localhost` means the container will use a custom profile
Deep Dive into AppArmor Profiles
AppArmor profiles define security rules for specific applications, specifying what they can and cannot do. These profiles reside in /etc/apparmor.d/ and are loaded into the kernel to enforce security policies.
- Each profile follows this structure:
profile <profile_name> <executable_path> {
<rules>
}
- Example: a profile for nano:
profile nano /usr/bin/nano {
# Allow access to any file
file,
# Deny writing to system directories
deny /etc/* rw,
}
- Types of AppArmor rules:
- File Access Rules:

```
/home/user/data.txt r,  # Read-only access
/etc/passwd rw,         # Read & write access
/tmp/ rw,               # Full access to /tmp
```

- Network Access Rules:

```
network inet tcp,    # Allow TCP connections
network inet udp,    # Allow UDP connections
network inet dgram,  # Allow datagram connections
```

- Capability Rules:

```
deny capability sys_admin,   # Deny sys_admin capability
deny capability sys_ptrace,  # Deny sys_ptrace capability
```
Linux Capabilities in Pods
- For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process’s credentials (usually: effective UID, effective GID, and supplementary group list).
- Starting with Linux 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.
- Capabilities control what a process can do
- Some common capabilities:
  - CAP_SYS_ADMIN
  - CAP_NET_ADMIN
  - CAP_NET_RAW
- To view capabilities:
  - `getcap` - check the capabilities of a binary: `getcap <path to bin>`
  - `getpcaps` - check the capabilities of a process: `getpcaps <pid>`
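A minimal sketch of managing capabilities from a pod spec (the pod and container names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cap-demo
spec:
  containers:
  - name: web
    image: nginx
    securityContext:
      capabilities:
        drop: ["ALL"]              # start from zero capabilities
        add: ["NET_BIND_SERVICE"]  # add back only what is needed
```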
4 Minimize Microservice Vulnerabilities
Pod Security Admission
- Replaced Pod Security Policies
- Pod Security Admission controller enforces pod security standards on pods
- All you need to do to opt into the PSA feature is to add a label with a specific format to a namespace. All pods in that namespace will have to follow the standards declared.
- The label consists of three parts: a prefix, a mode, and a level
- Example: `pod-security.kubernetes.io/enforce=restricted`
- Prefix: `pod-security.kubernetes.io`
- Mode: `enforce`, `audit`, or `warn`
  - Enforce: blocks pods that do not meet the PSS
  - Audit: logs violations to the audit log but does not block pod creation
  - Warn: logs violations on the console but does not block pod creation
- Level: `privileged`, `baseline`, or `restricted`
  - Privileged: fully unrestricted
    - Allowed: everything
  - Baseline: some restrictions
    - Allowed: most things, except sharing host namespaces, hostPath volumes and hostPorts, and privileged pods
  - Restricted: most restrictions
    - Allowed: very little; running as root, using host networking, hostPath volumes, hostPorts, and privileged pods are all disallowed. The pod must be configured with a seccomp profile.
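A hedged example of opting a namespace into Pod Security Admission (the namespace name is a placeholder):

```sh
kubectl label namespace dev pod-security.kubernetes.io/enforce=restricted
# Optionally also warn about baseline violations
kubectl label namespace dev pod-security.kubernetes.io/warn=baseline
```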
Security Contexts
- Security contexts are used to control the security settings of a pod or container
- Security contexts can be defined at the pod level or the container level. Settings defined at the container level will override identical settings defined at the pod level
- Security contexts can be used to:
- Run a pod as a specific user
- Run a pod as a specific group
- Run a pod with specific Linux capabilities
- Run a pod with a read-only root filesystem
- Run a pod with a specific SELinux context
- Run a pod with a specific AppArmor profile
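A minimal sketch combining pod-level and container-level settings (names and IDs are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:          # pod level: applies to all containers
    runAsUser: 1000
    runAsGroup: 3000
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    securityContext:        # container level: overrides the pod level
      runAsUser: 2000
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
```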
- You can view the capabilities of a process by viewing the status file of the process and grepping for capabilities:

```sh
rtn@worker02:~$ cat /proc/self/status | grep -i cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	000001ffffffffff
CapAmb:	0000000000000000
```

These values are encoded in hexadecimal. To decode them, use the capsh command:

```sh
rtn@worker02:~$ sudo capsh --decode=000001ffffffffff
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
```
Admission Controllers
- Admission Controllers are used for automation within a cluster
- Once a request to the KubeAPI server has been authenticated and then authorized, it is intercepted and handled by any applicable Admission Controllers
- Example Admission Controllers:
  - ImagePolicyWebhook
    - You may see this one on the exam.
    - When enabled, the ImagePolicyWebhook admission controller contacts an external service (that you or someone else wrote in whatever language you want; it just needs to accept and respond to HTTP requests).
    - To enable, add 'ImagePolicyWebhook' to the `--enable-admission-plugins` flag of the kube-api server
    - You must also supply an admission control configuration file, which references a kubeconfig-formatted file that tells the API server how to reach the webhook. Then pass the path to this config to the kube-api server with `--admission-control-config-file=<path to config file>`. Note that this path is the path inside the kube-api container, so you must mount this path on the host to the pod as a hostPath mount.
  - AlwaysPullImages
  - DefaultStorageClass
  - EventRateLimit
  - NamespaceExists
  - ... and many more
- Admission Controllers help make Kubernetes modular
- To see which Admission Controllers are enabled:
  - you can grep the kube-api process: `ps aux | grep -i kube-api | grep -i admission`
  - or you can look at the manifest for the kube-api server (if the cluster was provisioned with kubeadm): `grep admission -A10 /etc/kubernetes/manifests/kube-apiserver.yaml`
  - or, if the cluster was provisioned manually, you can look at the systemd unit file for the kube-api server daemon
- There are two types of admission controllers:
- Mutating - can make changes to ‘autocorrect’
- Validating - only validates configuration
- Mutating are invoked first. Validating second.
- The admission controller runs as a webhook server. It can run inside the cluster as a pod or outside the cluster on another server.
- Some admission controllers require a configuration file to be passed to the kube-api server. This file is passed using the `--admission-control-config-file` flag (see the sketch below).
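A hedged sketch of the admission control configuration file for ImagePolicyWebhook, following the structure in the Kubernetes docs (paths and TTL values are placeholders):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: ImagePolicyWebhook
  configuration:
    imagePolicy:
      # kubeconfig-formatted file pointing at the external webhook service
      kubeConfigFile: /etc/kubernetes/image-policy/kubeconfig.yaml
      allowTTL: 50
      denyTTL: 50
      retryBackoff: 500
      defaultAllow: false  # reject images if the webhook is unreachable
```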
Open Policy Agent
- OPA can be used for authorization. However, it is more likely to be used in the admission control phase.
- OPA can be deployed as a daemonset on a node or as a pod
- OPA policies use a language called rego
OPA in Kubernetes
GateKeeper
- Gatekeeper Constraint Framework
  - Gatekeeper is a validating and mutating webhook that enforces CRD-based policies executed by Open Policy Agent, a policy engine for Cloud Native environments hosted by CNCF as a graduated project.
  - The framework helps us implement what, where, and how we want to do something in Kubernetes
    - Example:
      - What: Add labels, etc.
      - Where: kube-system namespace
      - How: When a pod is created
- To run Gatekeeper in Kubernetes, simply apply the manifests provided by OPA
- The pods and other resources are created in the gatekeeper-system namespace
- Constraint Templates
  - Before you can define a constraint, you must first define a ConstraintTemplate, which describes both the Rego that enforces the constraint and the schema of the constraint. The schema of the constraint allows an admin to fine-tune the behavior of a constraint, much like arguments to a function.
  - Here is an example constraint template that requires all labels described by the constraint to be present:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
```
Constraints
- Constraints are then used to inform Gatekeeper that the admin wants a ConstraintTemplate to be enforced, and how. This constraint uses the K8sRequiredLabels constraint template above to make sure the gatekeeper label is defined on all namespaces:

```
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-gk
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["gatekeeper"]
```

- The `match` field supports multiple options: https://open-policy-agent.github.io/gatekeeper/website/docs/howto#the-match-field
- After creating the constraint from the ConstraintTemplate, you can view all violations by describing the constraint:

```
kubectl describe k8srequiredlabels ns-must-have-gk
```
Kubernetes Secrets
- Secrets are used to store sensitive information in Kubernetes
- Secrets are only base64 encoded, not encrypted, when stored in etcd (unless encryption at rest is configured)
- Can be injected into a pod as an env or mounted as a volume
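A minimal sketch showing both consumption styles (the secret name `db-creds` and key names are illustrative):

```
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
stringData:           # stringData accepts plain text; Kubernetes base64-encodes it
  password: S3cret!
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
    - name: app
      image: nginx
      env:
        - name: DB_PASSWORD            # injected as an environment variable
          valueFrom:
            secretKeyRef:
              name: db-creds
              key: password
      volumeMounts:
        - name: creds                  # mounted as files under /etc/creds
          mountPath: /etc/creds
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: db-creds
```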
Encrypting etcd
- By default, the API server stores plain-text representations of resources in etcd, with no at-rest encryption.
- The kube-apiserver process accepts an argument `--encryption-provider-config` that specifies a path to a configuration file. The contents of that file, if you specify one, control how Kubernetes API data is encrypted in etcd.
- If you are running the kube-apiserver without the `--encryption-provider-config` command line argument, you do not have encryption at rest enabled. If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies the `identity` provider as the first encryption provider in the list, then you do not have at-rest encryption enabled (the default `identity` provider does not provide any confidentiality protection).
- If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies a provider other than `identity` as the first encryption provider in the list, then you already have at-rest encryption enabled. However, that check does not tell you whether a previous migration to encrypted storage has succeeded.
- Example EncryptionConfiguration:
```
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
      - pandas.awesome.bears.example # a custom resource API
    providers:
      # This configuration does not provide data confidentiality. The first
      # configured provider is specifying the "identity" mechanism, which
      # stores resources as plain text.
      #
      # - identity: {} # plain text, in other words NO encryption
      - aesgcm:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - secretbox:
          keys:
            - name: key1
              secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
  - resources:
      - events
    providers:
      - identity: {} # do not encrypt Events even though *.* is specified below
  - resources:
      - '*.apps' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key2
              secret: c2VjcmV0IGlzIHNlY3VyZSwgb3IgaXMgaXQ/Cg==
  - resources:
      - '*.*' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key3
              secret: c2VjcmV0IGlzIHNlY3VyZSwgSSB0aGluaw==
```
- Each `resources` array item is a separate config and contains a complete configuration. The `resources.resources` field is an array of Kubernetes resource names (`resource` or `resource.group`) that should be encrypted, such as Secrets, ConfigMaps, or other resources.
- https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
- After enabling encryption in etcd, any secrets that you created prior to enabling encryption will not be encrypted. You can re-encrypt them by running:

```
kubectl get secrets -A -o yaml | kubectl replace -f -
```
- Example of reading a secret directly from etcd:

```
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  get /registry/secrets/three/con1
```

- The path to a resource in the etcd database is `/registry/<resource-type>/<namespace>/<name>` (here: the secret `con1` in the `three` namespace)
Container Sandboxing
- Containers are not contained!
- A container sandbox is a mechanism that provides an additional layer of isolation between the container and the host
- Container sandboxing is implemented via Runtime Class objects in Kubernetes.
- The default container runtime handler is `runc`. However, we can change this to use `runsc` (gVisor) or Kata Containers
- Sandboxing can prevent kernel exploits such as Dirty COW, which allows a user to gain root access to the host
- Dirty COW (CVE-2016-5195) works by exploiting a race condition in the Linux kernel's copy-on-write mechanism
gVisor
- gVisor is a kernel written in Golang that intercepts system calls made by a container
- gVisor is like a ‘syscall proxy’ that sits between the container and the kernel
- Components:
  - Sentry - a user-space application kernel that intercepts and services the container's syscalls
  - Gofer - a file proxy process that mediates the Sentry's access to the host filesystem
- Not all apps will work with gVisor
- gVisor will cause performance degradation in your app due to the overhead of intercepting and handling every syscall
- gVisor uses runsc as the runtime handler
Kata Containers
- Kata inserts each container into its own lightweight virtual machine, giving each its own kernel
- Kata Containers require nested virtualization support, so it may not work with all cloud providers
RuntimeClass
- RuntimeClass is a Kubernetes feature that allows you to specify which container runtime handler to use for a pod
To use a runtime class:

- Create a new RuntimeClass object:

```
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: secure-runtime
handler: runsc
```

- Specify the `runtimeClassName` in the pod definition:

```
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp-1
  labels:
    name: simple-webapp
spec:
  runtimeClassName: secure-runtime
  containers:
    - name: simple-webapp
      image: kodekloud/webapp-delayed-start
      ports:
        - containerPort: 8080
```
Resource Quotas
- Control requests and limits for CPU and memory within a namespace
```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-resource-quota
  namespace: team-a
spec:
  hard:
    pods: "5"
    requests.cpu: "0.5"
    requests.memory: 500Mi
    limits.cpu: "1"
    limits.memory: 1Gi
```

- Quotas can also be scoped, e.g. to pods of a given PriorityClass:

```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-medium
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["medium"]
```
API Priority and Fairness
- https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
- With API Priority and Fairness, you can define which requests to the kube-apiserver are prioritized over others
- To configure API Priority and Fairness, you create `FlowSchema` and `PriorityLevelConfiguration` objects (API group `flowcontrol.apiserver.k8s.io/v1`; the feature is GA as of Kubernetes 1.29)
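A minimal sketch of a PriorityLevelConfiguration, assuming the v1 flow-control API (the name and share value are illustrative):

```
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: custom-priority
spec:
  type: Limited                    # "Exempt" levels bypass flow control entirely
  limited:
    nominalConcurrencyShares: 10   # relative share of the apiserver's concurrency budget
    limitResponse:
      type: Reject                 # reject excess requests instead of queuing them
```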
Pod Priority and Preemption
- With Pod Priority and Preemption, you can ensure that critical pods are running while the cluster is under resource contention by killing lower priority pods
- To implement Pod Priority and Preemption:
  - Create a PriorityClass object (or several):

```
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: false
description: "This priority class should be used for low-priority workloads."
```

  - Assign the priorityClass to a pod via `priorityClassName`:

```
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
    - name: nginx
      image: nginx
      imagePullPolicy: IfNotPresent
  priorityClassName: high-priority
```
Pod to Pod Encryption
- mTLS can be used to encrypt traffic between pods
- Methods of p2p encryption:
  - Service Mesh
    - A service mesh can offload the encryption and decryption of traffic between pods by using a sidecar proxy
    - Examples:
      - Istio
        - Istio uses Envoy as a sidecar proxy to encrypt traffic between pods
      - Linkerd
  - Wireguard
    - Cilium
      - uses eBPF for network security
      - Encryption is transparent to the application
      - Provides flexible encryption options
  - IPSec
    - Calico
5 Supply Chain Security
- Supply chain security is the practice of ensuring that the software and hardware that you use in your environment is secure
- In the context of the CKS exam, supply chain security refers to the security of the software that you use in your Kubernetes environment
Reduce docker image size
- Smaller images are faster to download and deploy
- Smaller images are more secure: less software means a smaller attack surface
- Smaller images are easier to manage
- To reduce the size of a docker image:
  - Use a smaller base image
  - Use specific package/image versions
  - Make the filesystem read-only
  - Don't run the container as root
  - Use multi-stage builds
  - Remove unnecessary files
  - Use a `.dockerignore` file to exclude files and directories from the image
  - Use `COPY` instead of `ADD`
  - Use `alpine` images
  - Use `scratch` images
  - Use `distroless` images
- Example of a multi-stage build:

```
# build container (stage 1)
FROM ubuntu
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y golang-go
COPY app.go .
RUN CGO_ENABLED=0 go build app.go

# app container (stage 2)
# it is better to use a defined tag, rather than 'latest'
FROM alpine:3.12.1
RUN addgroup -S appgroup && adduser -S appuser -G appgroup -h /home/appuser
COPY --from=0 /app /home/appuser/app
# run as a non-root user
USER appuser
CMD ["/home/appuser/app"]
```
Dockerfile best practices: https://docs.docker.com/build/building/best-practices/
- Only certain Dockerfile instructions create new layers in an image: `RUN`, `COPY`, and `ADD` (`FROM` pulls in the base image's layers; instructions like `CMD` only add metadata)
- `dive` and `docker-slim` are two tools you can use to explore the individual layers that make up an image
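For example, dive takes an image reference and opens an interactive, layer-by-layer view of its filesystem (the tag here is illustrative):

```
dive nginx:1.25
```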
Static Analysis
SBOM
- An SBOM is a list of all the software that makes up a container image (or an application, etc.)
- Formats:
  - SPDX
    - The standard format for sharing SBOMs
    - Available in JSON, RDF, and tag/value formats
    - More complex than CycloneDX due to its extensive metadata coverage
    - Comprehensive metadata including license information, origin, and file details
  - CycloneDX
    - A lightweight format focused on security and compliance
    - Available in JSON and XML formats
    - Simpler and more focused on essential SBOM elements
    - Focuses on component details, vulnerabilities, and dependencies
Kubesec
- Used for static analysis of manifests
- https://github.com/controlplaneio/kubesec
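A typical invocation scans a manifest and returns a JSON risk score (the file name is illustrative):

```
kubesec scan pod.yaml
```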
Syft
- Syft is a powerful and easy-to-use open-source tool for generating Software Bill of Materials (SBOMs) for container images and filesystems. It provides detailed visibility into the packages and dependencies in your software, helping you manage vulnerabilities, license compliance, and software supply chain security.
- Syft can export results in SPDX, CycloneDX, JSON, etc.
- To scan an image with syft and export the results to a file in SPDX format:

```
syft scan docker.io/kodekloud/webapp-color:latest -o spdx --file /root/webapp-spdx.sbom
```
Grype
- Grype is a tool (also from Anchore) that can be used to scan an SBOM for vulnerabilities
- To scan an SBOM with Grype:

```
grype /root/webapp-sbom.json -o json --file /root/grype-report.json
```
Kube-linter
- Kube-linter can be used to lint Kubernetes manifests and ensure best practices are being followed
- kube-linter is configurable. You can disable/enable checks and even create your own custom checks
- kube-linter includes recommendations for how to fix failed checks
- https://github.com/stackrox/kube-linter
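Typical usage is pointing it at a manifest file or a directory of manifests (the path is illustrative):

```
kube-linter lint ./manifests/
```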
Scanning Images for Vulnerabilities
trivy
- trivy can be used to scan images, git repos, and filesystems for vulnerabilities
- https://github.com/aquasecurity/trivy
- Example (running trivy via its container image):

```
sudo docker run --rm aquasec/trivy:0.17.2 nginx:1.16-alpine
```
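With a locally installed binary, the equivalent is `trivy image`; filtering by severity is often useful on the exam (flags shown are standard trivy options):

```
trivy image --severity HIGH,CRITICAL nginx:1.16-alpine
```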
6 Monitoring, Logging, and Runtime Security
falco
- Falco is an IDS for Kubernetes workloads
- falco is a cloud native security tool. It provides near real-time threat detection for cloud, container, and Kubernetes workloads by leveraging runtime insights. Falco can monitor events defined via customizable rules from various sources, including the Linux kernel, and enrich them with metadata from the Kubernetes API server, container runtime, and more. Falco supports a wide range of kernel versions, x86_64 and ARM64 architectures, and many different output channels.
- falco uses Sysdig filter syntax to extract information about an event. Filters are configured in the falco rules.yaml or a ConfigMap. They can also be passed via helm values.
  - `/etc/falco/falco.yaml` - the main configuration file for falco
  - `/etc/falco/falco_rules.yaml` - the main rules file for falco
- falco rule files consist of 3 elements defined in YAML:
- rules - a rule is a condition under which an alert should be generated
- macros - a macro is a reusable rule condition. These help keep the rules file clean and easy to read
- lists - a collection of items that can be used in rules and macros
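A minimal sketch of all three elements in a falco rules file (the rule name, list contents, and output text are illustrative):

```
# a list: a reusable collection of items
- list: shell_binaries
  items: [bash, sh, zsh]

# a macro: a reusable condition fragment
- macro: spawned_process
  condition: evt.type = execve and evt.dir = <

# a rule: a condition that triggers an alert, plus the alert output and priority
- rule: shell_in_container
  desc: a shell was spawned inside a container
  condition: spawned_process and container.id != host and proc.name in (shell_binaries)
  output: "shell spawned in container (user=%user.name container=%container.id proc=%proc.cmdline)"
  priority: WARNING
```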
- Some examples of events that falco watches for:
- Reading or writing files at a specific location in the filesystem
- Spawning a shell in a container, such as /bin/bash
- Sending or receiving traffic to/from undesired URLs
- Falco deploys a set of sensors that listen for configured events and conditions
- Each sensor contains a set of rules that map an event to a data source.
- An alert is produced when a rule matches a specific event
- Alerts are then sent to an output channel to record the event
Ensuring Container Immutability
- Containers should be immutable. This means that once a container is created, it should not be changed. If changes are needed, a new container should be created.
- Containers are mutable (changeable) by default. This can lead to security vulnerabilities.
- To ensure container immutability:
- Use a ‘distroless’ container image. These images are minimal and contain only the necessary components to run an application. They do not include a shell.
- Use a ‘read-only’ file system. This prevents changes to the file system. To configure a read-only root filesystem, add the following to the pod spec:

```
spec:
  containers:
    - name: my-container
      image: my-image
      securityContext:
        readOnlyRootFilesystem: true
```
Audit Logs
- Auditing involves recording and tracking all events and actions within the cluster
- Who made a change, when was it changed, and what exactly was changed
- Audit logs provide a chronological record of activities within a cluster
- Entries in the audit log exist in ‘JSON Lines’ format. Note that this is not the same as JSON. Each line in the log is a separate JSON object.
- Audit levels (configured per rule in the audit policy):
  - None - no logging
  - Metadata - logs request metadata (requesting user, timestamp, resource, verb), but not request or response body
  - Request - logs request metadata and request body, but no response body
  - RequestResponse - logs the metadata, request body, and response body
Sample Audit Policy
```
apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
omitStages:
- "RequestReceived"
rules:
# Log pod changes at RequestResponse level
- level: RequestResponse
resources:
- group: ""
# Resource "pods" doesn't match requests to any subresource of pods,
# which is consistent with the RBAC policy.
resources: ["pods"]
# Log "pods/log", "pods/status" at Metadata level
- level: Metadata
resources:
- group: ""
resources: ["pods/log", "pods/status"]
# Don't log requests to a configmap called "controller-leader"
- level: None
resources:
- group: ""
resources: ["configmaps"]
resourceNames: ["controller-leader"]
# Don't log watch requests by the "system:kube-proxy" on endpoints or services
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core API group
resources: ["endpoints", "services"]
# Don't log authenticated requests to certain non-resource URL paths.
- level: None
userGroups: ["system:authenticated"]
nonResourceURLs:
- "/api*" # Wildcard matching.
- "/version"
# Log the request body of configmap changes in kube-system.
- level: Request
resources:
- group: "" # core API group
resources: ["configmaps"]
# This rule only applies to resources in the "kube-system" namespace.
# The empty string "" can be used to select non-namespaced resources.
namespaces: ["kube-system"]
# Log configmap and secret changes in all other namespaces at the Metadata level.
- level: Metadata
resources:
- group: "" # core API group
resources: ["secrets", "configmaps"]
# Log all other resources in core and extensions at the Request level.
- level: Request
resources:
- group: "" # core API group
- group: "extensions" # Version of group should NOT be included.
# A catch-all rule to log all other requests at the Metadata level.
- level: Metadata
# Long-running requests like watches that fall under this rule will not
# generate an audit event in RequestReceived.
omitStages:
- "RequestReceived"
```
- Once the audit policy has been defined, you can apply it to the cluster by passing the `--audit-policy-file` flag to the kube-apiserver
- To use a file-based log backend, you need to pass the following configurations to the kube-apiserver:
  - `--audit-policy-file` - the path to the audit policy file
  - `--audit-log-path` - the path to the audit log file
  - both of these paths need to be mounted into the kube-apiserver pod; the kube-apiserver cannot read these files on the node without a proper volumeMount
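A sketch of what this looks like in a kubeadm cluster's `/etc/kubernetes/manifests/kube-apiserver.yaml`, adapted from the Kubernetes auditing docs (the host paths are illustrative):

```
spec:
  containers:
    - command:
        - kube-apiserver
        # ... other flags ...
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
      volumeMounts:
        - mountPath: /etc/kubernetes/audit-policy.yaml
          name: audit
          readOnly: true
        - mountPath: /var/log/kubernetes/audit/
          name: audit-log
          readOnly: false
  volumes:
    - name: audit
      hostPath:
        path: /etc/kubernetes/audit-policy.yaml
        type: File
    - name: audit-log
      hostPath:
        path: /var/log/kubernetes/audit/
        type: DirectoryOrCreate
```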