Certified Kubernetes Security Specialist (CKS) Notes


https://www.cncf.io/certification/cks/
Exam
Outline
- https://github.com/cncf/curriculum/blob/master/CKS_Curriculum%20v1.31.pdf
Curriculum
Exam objectives that outline the knowledge, skills, and abilities that a Certified Kubernetes Security Specialist (CKS) can be expected to demonstrate.
Cluster Setup (10%)
- Use Network security policies to restrict cluster level access
- Use CIS benchmarks to review the security configuration of Kubernetes components (etcd, kubelet, kubedns, kubeapi)
- Properly set up Ingress objects with security control
- Protect node metadata and endpoints
  - Kubernetes Documentation > Tasks > Administer a Cluster > Securing a Cluster

```yaml
# all pods in the namespace cannot access the metadata endpoint
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: cloud-metadata-deny
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32
```

- Minimize use of, and access to, GUI elements
- Verify platform binaries before deploying
  - Kubernetes Documentation > Tasks > Install Tools > Install and Set Up kubectl on Linux
  - Note: check step 2 - validate binary
Cluster Hardening (15%)
- Restrict access to the Kubernetes API
- Use Role Based Access Controls to minimize exposure
- Exercise caution in using service accounts, e.g. disable defaults, minimize permissions on newly created ones
- Update Kubernetes frequently
System Hardening (15%)
- Minimize host OS footprint (reduce attack surface)
  - Remove unnecessary packages
  - Identify and address open ports
  - Shut down any unnecessary services
- Minimize IAM roles
- Minimize external access to the network
- Appropriately use kernel hardening tools such as AppArmor, seccomp
Minimize Microservice Vulnerabilities (20%)
- Set up appropriate OS-level security domains, e.g. using PSP, OPA, security contexts
- Manage Kubernetes secrets
- Use container runtime sandboxes in multi-tenant environments (e.g. gVisor, Kata Containers)
- Implement pod-to-pod encryption by use of mTLS
Supply Chain Security (20%)
- Minimize base image footprint
  - Remove exploitable and non-essential software
  - Use multi-stage Dockerfiles to keep software compilation out of runtime images
  - Never bake any secrets into your images
  - Image scanning
- Secure your supply chain: whitelist allowed image registries, sign and validate images
- Use static analysis of user workloads (e.g. Kubernetes resources, Dockerfiles)
  - Secure base images
  - Remove unnecessary packages
  - Stop containers from using elevated privileges
- Scan images for known vulnerabilities
Monitoring, Logging and Runtime Security (20%)
- Perform behavioral analytics of syscall, process, and file activities at the host and container level to detect malicious activities
- Detect threats within physical infrastructure, apps, networks, data, users, and workloads
- Detect all phases of attack regardless of where it occurs and how it spreads
- Perform deep analytical investigation and identification of bad actors within the environment
- Ensure immutability of containers at runtime
  - readOnlyRootFilesystem: mounts the container's root filesystem as read-only
- Use Audit Logs to monitor access
Changes
- https://kodekloud.com/blog/cks-exam-updates-2024-your-complete-guide-to-certification-with-kodekloud/
- https://training.linuxfoundation.org/cks-program-changes/
Software / Environment
As of 11/2024
- Kubernetes version: 1.31
- Ubuntu 20.04
- Terminal
- Bash
- Tools available
  - `vim` - text/code editor
  - `tmux` - terminal multiplexer
  - `jq` - working with JSON format
  - `yq` - working with YAML format
  - `firefox` - web browser for accessing the K8s docs
  - `base64` - tool to convert to and from base64
  - `kubectl` - Kubernetes CLI client
  - more typical Linux tools like `grep`, `wc`, ...
- 3rd-party tools to know
  - `tracee`
  - OPA Gatekeeper
  - `kube-bench`
  - `syft`
  - `grype`
  - `kube-linter`
  - `kubesec`
  - `trivy`
  - `falco`
Exam Environment Setup
Terminal Shortcuts/Aliases
The following are useful terminal aliases/shortcuts to use during the exam.
Add the following to the end of ~/.bashrc file:
```sh
alias k='kubectl'                                      # <-- Most general and useful shortcut!
alias kd='kubectl delete --force --grace-period=0'     # <-- Fast deletion of resources
alias kc='kubectl create'                              # <-- Create a resource
alias kc-dry='kubectl create --dry-run=client -o yaml' # <-- Create a YAML template of a resource
alias kr='kubectl run'                                 # <-- Run/create a resource (typically a pod)
alias kr-dry='kubectl run --dry-run=client -o yaml'    # <-- Create a YAML template of a resource
# If kc-dry and kr-dry do not work for you, add the following variable instead
export do="--dry-run=client -o yaml"                   # <-- Create the YAML template (usage: $do)
```
The following are some example usages:
```sh
k get nodes -o wide
kc deployment my-dep --image=nginx --replicas=3
kr-dry my-pod --image=nginx --command -- sleep 36000
kr-dry my-pod --image=busybox -- "/bin/sh" "-c" "sleep 36000"
kr my-pod --image=busybox $do -- "/bin/sh" "-c" "sleep 36000"
```
Terminal Command Completion
The following is useful so that you can use the TAB key to auto-complete a command, allowing you to not always have to remember the exact keyword or spelling.
Type the following into the terminal:
- `kubectl completion bash >> ~/.bashrc` - `kubectl` command completion
- `kubeadm completion bash >> ~/.bashrc` - `kubeadm` command completion
- `exec $SHELL` - Reload the shell to enable all added completion
VIM
The exam will have VIM or nano terminal text editor tools available. If you are using
VIM ensure that you create a ~/.vimrc file and add the following:
set ts=2 " <-- tabstop - how many spaces is \t worth
set sw=2 " <-- shiftwidth - how many spaces is indentation
set et " <-- expandtab - Use spaces, never \t values
set mouse=a " <-- Enable mouse support
Or simply:
set ts=2 sw=2 et mouse=a
Also know the VIM basics below. It may be a good idea to take a quick VIM course.
- `vim my-file.yaml` - If the file exists, open it, else create it for editing
- `:w` - Save
- `:x` - Save and exit
- `:q` - Exit
- `:q!` - Exit without saving
- `i` - Insert mode, regular text editor mode
- `v` - Visual mode for selection
- `ESC` - Normal mode
Pasting Text Into VIM
Oftentimes you will want to paste text or code from the Kubernetes documentation into a VIM terminal. If you simply do that, the tabs will do funky things.
Do the following inside VIM before pasting your copied text:
- In NORMAL mode, type `:set paste`
- Now enter INSERT mode
- You should see -- INSERT (paste) -- at the bottom of the screen
- Paste the text
  - You can right-click with the mouse and select Paste, or use CTRL + SHIFT + V
tmux
tmux will allow you to use multiple terminal windows in one (aka terminal multiplexing).
Make sure you know the basics for tmux usage:
- `tmux` - Start and enter tmux
- `CTRL + b "` - Split the window vertically (line is horizontal)
- `CTRL + b %` - Split the window horizontally (line is vertical)
- `CTRL + b <ARROW KEY>` - Switch between window panes
- `CTRL + b (hold) <ARROW KEY>` - Resize the current window pane
- `CTRL + b z` - Toggle a pane to full screen (good for looking at a full document)
- `CTRL + d` or `exit` - Close a window pane
Mouse Support
If you want to be able to click and select within tmux and tmux panes, you can also enable mouse support. This can be useful.
These steps must be done outside of tmux:
- Create a `~/.tmux.conf` file and edit it: `vim ~/.tmux.conf`
- Add the configuration, save, and exit the file: `set -g mouse on`
- Reload the tmux configuration: `tmux source ~/.tmux.conf`
Preparation
Study Resources
- Official Kubernetes Documentation
- KodeKloud CKS Course
- The Kubernetes Book - Nigel Poulton
- CKS Study Guide
- killer.sh labs
Practice
Fundamentals
- You should already have CKA level knowledge
- Linux Kernel Namespaces isolate containers
- PID Namespace: Isolates processes
- Mount Namespace: Restricts access to mounts or root filesystem
- Network Namespace: Only access certain network devices. Firewall and routing rules
- User Namespace: A different set of UIDs is used. Example: user (UID 0) inside one namespace can be different from user (UID 0) inside another namespace
- cgroups restrict resource usage of processes
- RAM/Disk/CPU
- Using cgroups and linux kernel namespaces, we can create containers
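To make this concrete, here is a minimal sketch using `unshare` (a util-linux tool; this demo is an illustration only, not exam material):

```sh
# Create new PID and mount namespaces; --mount-proc remounts /proc so the
# inner shell sees an isolated process table with itself as PID 1.
sudo unshare --pid --fork --mount-proc /bin/sh -c 'ps aux'
```

Inside the new namespace, `ps aux` lists only the shell and `ps` itself, which is the same isolation mechanism container runtimes build on.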
Understand the Kubernetes Attack Surface
- Kubernetes is a complex system with many components. Each component has its own vulnerabilities and attack vectors.
- The attack surface can be reduced by:
- Using network policies to restrict traffic between pods
- Using RBAC to restrict access to the kube-api server
- Using admission controllers to enforce security policies
- Using pod security standards to enforce security policies
- Using best practices to secure the underlying infrastructure
- Using securityContext to enforce security policies for pods
The 4 C’s of Cloud-Native Security
- Cloud: Security of the cloud infrastructure
- Cluster: Security of the cluster itself
- Container: Security of the containers themselves
- Code: Security of the code itself
1 Cluster Setup
CIS Benchmark
What is a security benchmark?
- A security benchmark is a set of standard benchmarks that define a state of optimized security for a given system (servers, network devices, etc.)
- CIS (Center for Internet Security) provides standardized benchmarks (in the form of downloadable files) that one can use to implement security on their system.
- CIS provides benchmarks for public clouds (Azure, AWS, GCP, etc.), operating systems (Linux, Windows, MacOS), network devices (Cisco, Juniper, HP, etc.), mobile devices (Android and Apple), desktop and server software (such as Kubernetes)
- View more info here
- You must register at the CIS website to download benchmarks
- Each benchmark provides a description of a vulnerability, as well as a path to resolution.
- CIS-CAT is a tool you can run on a system to generate recommendations for a given system. There are two versions available for download, CIS-CAT Lite and CIS-CAT Pro. The Lite version only includes benchmarks for Windows 10, MacOS, Ubuntu, and desktop software (Google Chrome, etc.). The Pro version includes all benchmarks.
- CIS Benchmarks for Kubernetes
- Register at the CIS website and download the CIS Benchmarks for kubernetes
- Includes security benchmarks for master and worker nodes
KubeBench
- KubeBench is an alternative to CIS-CAT Pro to run benchmarks against a Kubernetes cluster.
- KubeBench is open source and maintained by Aqua Security
- KubeBench can be deployed as a Docker container or a pod. It can also be invoked directly from the binaries or compiled from source.
- Once run, kube-bench will scan the cluster to identify whether best practices have been implemented. It will output a report specifying which benchmarks have passed/failed, and it will tell you how to fix any failed benchmarks.
- You can view the report by tailing the pod logs of the kube-bench pod.
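A hedged example of running kube-bench as a Kubernetes Job (the manifest URL is assumed from the Aqua Security repo and may change; verify it before relying on it):

```sh
# Deploy kube-bench as a Job in the cluster
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/main/job.yaml
# Once the job completes, read the report from the pod logs
kubectl logs job/kube-bench
```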
Cluster Upgrades
- The controller-manager and kube-scheduler can be one minor revision behind the API server.
- For example, if the API server is at version 1.10, controller-manager and kube-scheduler can be at 1.9 or 1.10
- The kubelet and kube-proxy can be up to 2 minor revisions behind the API server
- kubectl can be x+1 or x-1 minor revisions from the kube API server
- You can upgrade the cluster one minor version at a time
Upgrade Process
- Drain and cordon the node before upgrading it
kubectl drain <node name> --ignore-daemonsets
- Upgrade the master node first.
- Upgrade worker nodes after the master node.
Upgrading with Kubeadm
- If the cluster was created with kubeadm, you can use kubeadm to upgrade it.
- The upgrade process with kubeadm:
```sh
# Increase the minor version in the apt repository file for kubernetes:
sudo vi /etc/apt/sources.list.d/kubernetes.list

# Determine which version to upgrade to
sudo apt update
sudo apt-cache madison kubeadm

# Upgrade kubeadm first
sudo apt-mark unhold kubeadm && \
sudo apt-get update && sudo apt-get install -y kubeadm='1.31.x-*' && \
sudo apt-mark hold kubeadm

# Verify the version of kubeadm
kubeadm version

# Check the kubeadm upgrade plan
sudo kubeadm upgrade plan

# Apply the upgrade plan
sudo kubeadm upgrade apply v1.31.x

# Upgrade the nodes
sudo kubeadm upgrade node

# Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl && \
sudo apt-get update && sudo apt-get install -y kubelet='1.31.x-*' kubectl='1.31.x-*' && \
sudo apt-mark hold kubelet kubectl

# Restart the kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet
```
Network Policies
Overview
- Kubernetes Network Policies allow you to control the flow of traffic to and from pods. They define rules that specify:
  - What traffic is allowed to reach a set of pods.
  - What traffic a set of pods can send out.
- Pods can communicate with each other by default. Network Policies allow you to restrict this communication.
- Network Policies operate at Layer 3 and Layer 4 (IP and TCP/UDP). They do not cover Layer 7 (application layer).
- Network Policies are additive. To grant more permissions for network communication, simply create another network policy with more fine-grained rules.
- Network Policies are implemented by the network plugin. The network plugin must support NetworkPolicy for the policies to take effect.
- Network Policies are namespace-scoped. They apply to pods in the same namespace.
- For example, the following policy denies all ingress traffic to pods in 'secure-namespace':

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: secure-namespace
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```
Say we now want to grant the 'frontend' pods with label 'tier: frontend' in the 'app' namespace access to the 'backend' pods in 'secure-namespace'. We can do that by creating another Network Policy like this:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-pods
  namespace: secure-namespace
spec:
  podSelector:
    matchLabels:
      tier: backend
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: app
      podSelector:
        matchLabels:
          tier: frontend
    ports:
    - protocol: TCP
      port: 3000
```
Key Concepts
- Namespace Scope: Network policies are applied at the namespace level.
- Selector-Based Rules:
- Pod Selector: Select pods the policy applies to.
- Namespace Selector: Select pods based on their namespace.
- Traffic Direction:
- Ingress: Traffic coming into the pod.
- Egress: Traffic leaving the pod.
- Default Behavior:
- Pods are non-isolated by default (accept all traffic).
- A pod becomes isolated when a network policy matches it.
Common Fields in a Network Policy
- podSelector: Specifies the pods the policy applies to.
- ingress/egress: Lists rules for ingress or egress traffic.
- from/to: Specifies allowed sources/destinations (can use IP blocks, pod selectors, or namespace selectors).
- ports: Specifies allowed ports and protocols.
Example Network Policies
Allow All Ingress Traffic
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-all-ingress
namespace: default
spec:
podSelector: {}
ingress:
- {}
```
Deny All Ingress and Egress Traffic
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-all
  namespace: default
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
ingress: []
egress: []
```
Allow Specific Ingress from a Namespace
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-namespace-ingress
namespace: default
spec:
podSelector:
matchLabels:
app: my-app
ingress:
- from:
- namespaceSelector:
matchLabels:
team: frontend
```
Allow Egress to a Specific IP
```
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-egress-specific-ip
namespace: default
spec:
podSelector:
matchLabels:
app: my-app
egress:
- to:
- ipBlock:
cidr: 192.168.1.0/24
ports:
- protocol: TCP
port: 8080
```
Cilium Network Policy
- Cilium Network Policies provide more granularity, flexibility, and features than traditional Kubernetes Network Policies
- Cilium Network Policies operate up to layer 7 of the OSI model. Traditional Network Policies only operate up to layer 4.
- Cilium Network Policies perform well because they use eBPF
- Hubble allows you to watch traffic going to and from pods
- You can add Cilium to the cluster by:
  - Deploying with Helm
  - Running `cilium install` after installing the Cilium CLI tool
Cilium Network Policy Structure
- Cilium Network Policies are defined in YAML files
- The structure is similar to Kubernetes Network Policies
Layer 3 Rules
- Endpoints Based - Apply the policy to pods based on Kubernetes label selectors
- Services Based - Apply the policy based on kubernetes services, controlling traffic based on service names rather than individual pods
- Entities Based - Cilium has pre-defined entities like cluster, host, and world. This type of policy uses these entities to determine what traffic the policy is applied to.
  - Cluster - Represents all Kubernetes endpoints. Example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-cluster-resources
spec:
  endpointSelector: {}
  egress:
  - toEntities:
    - cluster
```

  - World - Represents any external traffic, but not cluster traffic. Example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-external-resources
spec:
  endpointSelector: {}
  egress:
  - toEntities:
    - world
```

  - Host - Represents the local Kubernetes node
  - Remote-node - Represents traffic from a remote node
  - All - Represents all endpoints both internal and external to the cluster. Example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-egress-to-all
spec:
  endpointSelector: {}
  egress:
  - toEntities:
    - all
```
- Node Based - Apply the policy based on nodes in the cluster
- IP/CIDR Based - Apply the policy based on IP addresses or CIDR blocks
Layer 4 Rules
- If no layer 4 rules are defined, all traffic is allowed for layer 4
- Example:
apiVersion: "cilium.io/v2" kind: CiliumNetworkPolicy metadata: name: allow-external-80 spec: endpointSelector: matchLabels: run: curl egress: - toPorts: - ports: - port: "80" protocol: TCP
Layer 7 Rules
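No example was included here; below is a minimal sketch of an HTTP-aware (L7) rule using the Cilium policy format (the `app: backend` label, policy name, and path are placeholders):

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: allow-get-only
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingress:
  - toPorts:
    - ports:
      - port: "80"
        protocol: TCP
      # Layer 7 rules: only GET requests to /public/* are allowed
      rules:
        http:
        - method: "GET"
          path: "/public/.*"
```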
Deny Policies
- You can create deny policies to explicitly block traffic
- Deny policies take precedence over allow policies
- ingressDeny example:

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: deny-ingress-80-for-backend
spec:
  endpointSelector:
    matchLabels:
      app: backend
  ingressDeny:
  - fromEntities:
    - all
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
```

- egressDeny example:

```yaml
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "deny-egress"
spec:
  endpointSelector:
    matchLabels:
      app: random-pod
  egress:
  - toEntities:
    - all
  egressDeny:
  - toEndpoints:
    - matchLabels:
        app: server
```
Examples
Default Deny All
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
name: default-deny-all
spec:
endpointSelector: {}
ingress:
- fromEntities:
- world
Kubernetes Ingress
What is Ingress?
- Ingress is an API object that manages external access to services in a Kubernetes cluster, typically HTTP and HTTPS.
- Provides:
- Load balancing
- SSL termination
- Name-based virtual hosting
Why Use Ingress?
- To consolidate multiple service endpoints behind a single, externally accessible URL.
- Reduce the need for creating individual LoadBalancers or NodePort services.
Key Components of Ingress
- Ingress Controller
  - Software that watches for Ingress resources and implements the rules.
  - Popular Ingress controllers:
    - ingress-nginx
    - Traefik
    - HAProxy
    - Istio Gateway
  - Must be installed separately in the cluster.
- Ingress Resource
  - The Kubernetes object that defines how requests should be routed to services.
Ingress Resource Configuration
- As of Kubernetes 1.20, you can create an ingress using kubectl:
kubectl create ingress <name> --rule="host/path=service:port"
Basic Structure
```
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: example-ingress
spec:
rules:
- host: example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: example-service
port:
number: 80
```
Ingress with TLS
- By default, the NGINX ingress controller serves a self-signed ("fake") certificate for HTTPS. To view it, first determine the HTTPS port of the ingress controller service:

```sh
kubeadmin@kube-controlplane:~$ k get svc -n ingress-nginx
NAME                                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
ingress-nginx-controller             NodePort    10.103.169.156   <none>        80:31818/TCP,443:30506/TCP   38m
ingress-nginx-controller-admission   ClusterIP   10.103.26.228    <none>        443/TCP                      38m
```

- The HTTPS port is 30506 in this case. To view the self-signed certificate, we can use curl:

```sh
λ notes $ curl https://13.68.211.113:30506/service1 -k -v
* (304) (OUT), TLS handshake, Finished (20):
} [52 bytes data]
* SSL connection using TLSv1.3 / AEAD-AES256-GCM-SHA384 / [blank] / UNDEF
* ALPN: server accepted h2
* Server certificate:
*  subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate  <<<<<<<<<<<<<<<<
*  start date: Dec 20 14:23:08 2024 GMT
*  expire date: Dec 20 14:23:08 2025 GMT
*  issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* using HTTP/2
* [HTTP/2] [1] OPENED stream for https://13.68.211.113:30506/service1
* [HTTP/2] [1] [:method: GET]
* [HTTP/2] [1] [:scheme: https]
* [HTTP/2] [1] [:authority: 13.68.211.113:30506]
* [HTTP/2] [1] [:path: /service1]
* [HTTP/2] [1] [user-agent: Mozilla/5.0 Gecko]
* [HTTP/2] [1] [accept: */*]
> GET /service1 HTTP/2
> Host: 13.68.211.113:30506
> User-Agent: Mozilla/5.0 Gecko
> Accept: */*
```

- To configure an ingress resource to use TLS (HTTPS), we first need to create a certificate:

```sh
# create a new 2048-bit RSA private key and associated cert
openssl req -nodes -new -x509 -keyout my.key -out my.crt -subj "/CN=mysite.com"
```

- Next, create a secret for the TLS cert:

```sh
kubectl create secret tls mycert --cert=my.crt --key=my.key -n my-namespace
```

- Create the ingress:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: secure-ingress
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  tls:
  - hosts:
    - example.com
    secretName: mycert
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: secure-service
            port:
              number: 80
```
Annotations
- Extend the functionality of Ingress controllers.
- Common examples (specific to nginx):
- nginx.ingress.kubernetes.io/rewrite-target: Rewrite request paths.
- nginx.ingress.kubernetes.io/ssl-redirect: Force SSL.
- nginx.ingress.kubernetes.io/proxy-body-size: Limit request size.
Protecting Node Metadata and Endpoints
Protecting Endpoints
- Kubernetes clusters expose information on various ports:
| Port Range | Purpose |
| ---------- | ------- |
| 6443 | kube-api |
| 2379 - 2380 | etcd |
| 10250 | kubelet api |
| 10259 | kube-scheduler |
| 10257 | kube-controller-manager |
- Many of these ports are configurable. For example, to change the port that kube-api listens on, just modify `--secure-port` in the kube-api manifest.
- Set up firewall rules to minimize the attack surface (a sketch follows below)
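A hedged `ufw` sketch for a control-plane node (the admin subnet is a placeholder; adjust ports to the table above):

```sh
# Allow kube-api access only from an admin subnet
sudo ufw allow from 10.0.0.0/24 to any port 6443 proto tcp
# Block the unauthenticated read-only kubelet port
sudo ufw deny 10255/tcp
sudo ufw enable
```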
Securing Node Metadata
- A lot of information can be obtained from node metadata:
  - Node name
  - Node state
  - Annotations
  - System info
  - etc.
- Why secure node metadata?
  - If node metadata is tampered with, pods may be assigned to the wrong nodes, which has security implications to consider
  - You can determine the version of the kubelet and other Kubernetes components from node metadata
  - If an attacker can modify node metadata, they could taint all the nodes, making all nodes unschedulable
- Protection strategies:
  - Use RBAC to control who has access to modify node metadata
  - Node isolation using labels and node selectors
  - Audit logs to determine who is accessing the cluster, and respond accordingly
  - Update node operating systems regularly
  - Update cluster components regularly
- Cloud providers such as Amazon and Azure often expose node information via metadata endpoints on the node. These endpoints are important to protect.
- This endpoint can be accessed at 169.254.169.254 on nodes in both Azure and AWS. An example for Azure:

```sh
curl -s -H Metadata:true --noproxy "*" "http://169.254.169.254/metadata/instance?api-version=2021-02-01" | jq
```

- Node metadata endpoints can be prevented from being accessed by pods by creating network policies:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress-metadata-server
  namespace: a12
spec:
  policyTypes:
  - Egress
  podSelector: {}
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32
```
Verify Kubernetes Binaries
- The SHA sum of a file changes if the content within the file is changed
- You can download the binaries from GitHub using wget. Example:

```sh
wget -O /opt/kubernetes.tar.gz https://dl.k8s.io/v1.31.1/kubernetes.tar.gz
```

- To validate that a binary downloaded from the internet has not been modified, check the hash:

```sh
echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
```
Securing etcd
- etcd is a distributed key-value store that Kubernetes uses to store configuration data
- etcd by default listens on port 2379/tcp
Play with etcd
Step 1: Create the Base Binaries Directory
```sh
mkdir /root/binaries
cd /root/binaries
```
Step 2: Download and Copy the ETCD Binaries to Path
```sh
wget https://github.com/etcd-io/etcd/releases/download/v3.5.18/etcd-v3.5.18-linux-amd64.tar.gz
tar -xzvf etcd-v3.5.18-linux-amd64.tar.gz
cd /root/binaries/etcd-v3.5.18-linux-amd64/
cp etcd etcdctl /usr/local/bin/
```
Step 3: Start etcd
```sh
cd /tmp
etcd
```
Step 4: Verification - Store and Fetch Data from etcd
```sh
etcdctl put key1 "value1"
```
```sh
etcdctl get key1
```
Encrypting data in transit in etcd
- etcd supports TLS encryption for data in transit
- By default, etcd packaged with kubeadm is configured to use TLS encryption
- One can capture packets from etcd using tcpdump:
```sh
root@controlplane00:/var/lib/etcd/member# tcpdump -i lo -X port 2379
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode
listening on lo, link-type EN10MB (Ethernet), snapshot length 262144 bytes
16:10:01.691453 IP localhost.2379 > localhost.42040: Flags [P.], seq 235868994:235869033, ack 3277609642, win 640, options [nop,nop,TS val 1280288044 ecr 1280288042], length 39
	0x0000:  4500 005b 35e4 4000 4006 06b7 7f00 0001  E..[5.@.@.......
	0x0010:  7f00 0001 094b a438 0e0f 1342 c35c 5aaa  .....K.8...B.\Z.
	0x0020:  8018 0280 fe4f 0000 0101 080a 4c4f a52c  .....O......LO.,
	0x0030:  4c4f a52a 1703 0300 2289 00d8 5dcc 7b88  LO.*...."...].{.
	0x0040:  6f7a 290f 536b 0fd0 f7d9 1fb4 f83f 4aab  oz).Sk.......?J.
	0x0050:  a6e7 0af8 0835 e597 a93d 4d               .....5...=M
16:10:01.691479 IP localhost.42040 > localhost.2379: Flags [.], ack 39, win 14819, options [nop,nop,TS val 1280288044 ecr 1280288044], length 0
	0x0000:  4500 0034 7174 4000 4006 cb4d 7f00 0001  E..4qt@.@..M....
	0x0010:  7f00 0001 a438 094b c35c 5aaa 0e0f 1369  .....8.K.\Z....i
	0x0020:  8010 39e3 fe28 0000 0101 080a 4c4f a52c  ..9..(......LO.,
	0x0030:  4c4f a52c                                LO.,
16:10:01.691611 IP localhost.2379 > localhost.42040: Flags [P.], seq 39:1222, ack 1, win 640, options [nop,nop,TS val 1280288044 ecr 1280288044], length 1183
	0x0000:  4500 04d3 35e5 4000 4006 023e 7f00 0001  E...5.@.@..>....
	0x0010:  7f00 0001 094b a438 0e0f 1369 c35c 5aaa  .....K.8...i.\Z.
	0x0020:  8018 0280 02c8 0000 0101 080a 4c4f a52c  ............LO.,
	0x0030:  4c4f a52c 1703 0304 9ac0 c579 d4ed 808c  LO.,.......y....
..... redacted
```

- The traffic captured in the output above is encrypted.
Encrypting data at rest in etcd
- By default, the API server stores plain-text representations of resources in etcd, with no at-rest encryption.
- etcd stores data in the /var/lib/etcd/member directory. When the database is not encrypted, one can easily grep the contents of this directory, looking for secrets:

```sh
root@controlplane00:/var/lib/etcd/member# ls -lisa
total 16
639000 4 drwx------ 4 root root 4096 Mar 21 10:53 .
385187 4 drwx------ 3 root root 4096 Mar 21 10:52 ..
639002 4 drwx------ 2 root root 4096 Mar 21 14:43 snap
638820 4 drwx------ 2 root root 4096 Mar 21 11:59 wal
root@controlplane00:/var/lib/etcd/member# grep -R test-secret .
grep: ./wal/00000000000000ac-0000000000a9340b.wal: binary file matches
grep: ./wal/00000000000000a8-0000000000a721c1.wal: binary file matches
grep: ./wal/00000000000000aa-0000000000a83f1e.wal: binary file matches
grep: ./wal/00000000000000a9-0000000000a7b97e.wal: binary file matches
grep: ./wal/00000000000000ab-0000000000a8d8a7.wal: binary file matches
grep: ./snap/db: binary file matches
```
- The kube-apiserver process accepts an argument `--encryption-provider-config` that specifies a path to a configuration file. The contents of that file, if you specify one, control how Kubernetes API data is encrypted in etcd.
- If you are running the kube-apiserver without the `--encryption-provider-config` command line argument, you do not have encryption at rest enabled. If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies the `identity` provider as the first encryption provider in the list, then you do not have at-rest encryption enabled (the default identity provider does not provide any confidentiality protection).
- If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies a provider other than `identity` as the first encryption provider in the list, then you already have at-rest encryption enabled. However, that check does not tell you whether a previous migration to encrypted storage has succeeded.
- Example EncryptionConfiguration:

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
      - pandas.awesome.bears.example # a custom resource API
    providers:
      # This configuration does not provide data confidentiality. The first
      # configured provider is specifying the "identity" mechanism, which
      # stores resources as plain text.
      # - identity: {} # plain text, in other words NO encryption
      - aesgcm:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - secretbox:
          keys:
            - name: key1
              secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
  - resources:
      - events
    providers:
      - identity: {} # do not encrypt Events even though *.* is specified below
  - resources:
      - '*.apps' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key2
              secret: c2VjcmV0IGlzIHNlY3VyZSwgb3IgaXMgaXQ/Cg==
  - resources:
      - '*.*' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key3
              secret: c2VjcmV0IGlzIHNlY3VyZSwgSSB0aGluaw==
```
- Each resources array item is a separate config and contains a complete configuration. The resources.resources field is an array of Kubernetes resource names (resource or resource.group) that should be encrypted, like Secrets, ConfigMaps, or other resources.
- https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
- After enabling encryption in etcd, any resources that you created prior to enabling encryption will not be encrypted. For example, you can re-encrypt existing secrets by running:

```sh
kubectl get secrets -A -o yaml | kubectl replace -f -
```
- Example of getting a secret in etcd:

```sh
root@controlplane00:/etc/kubernetes/pki# ETCDCTL_API=3 etcdctl --cacert=./etcd/ca.crt --cert=./apiserver-etcd-client.crt --key=./apiserver-etcd-client.key get /registry/secrets/default/mysecret
/registry/secrets/default/mysecret
k8s:enc:aescbc:v1:key1:ܨt>;8ܑ%TUIodEs*lsHGwjeF8S!Aqaj\Pq;9Ⱥ7dJe{B2=|p4#'BuCxUY,*IuFM
wxx@
2Q0e5UzH^^)rX_H%GUɈ-XqC.˽pC `kBW>K12 n
```

- The path to a resource in the etcd database is '/registry/<resource type>/<namespace>/<object name>' (as seen in the example above)
Securing kube-apiserver
- Kube-apiserver acts as the gateway for all resources in kubernetes. Kube-apiserver is the only component in kubernetes that communicates with etcd
- kube-apiserver authenticates to etcd using TLS client certificates.
- Kube-apiserver should encrypt data before it is stored in etcd
- kube-apiserver should only listen on an HTTPS endpoint. There was an option to host kube-apiserver on an HTTP endpoint, but this option has been deprecated as of 1.10 and removed in 1.22
- kube-apiserver should have auditing enabled
Authentication
- One can authenticate to the KubeAPI server using certificates or a kubeconfig file
Access Controls
- After a request is authenticated, it is authorized. Authorization is the process of determining what actions a user can perform.
- Multiple authorization modules are supported:
- AlwaysAllow - Allows all requests
- AlwaysDeny - Blocks all requests
- RBAC - Role-based access control for requests. This is the default authorization module in kubernetes
- Node - Authorizes kubelets to access the kube-api server
2 Cluster Hardening
Securing Access to the KubeAPI Server
- A request to the KubeAPI server goes through 4 stages before it is processed by KubeAPI:
- Authentication
- Validates the identity of the caller by inspecting client certificates or tokens
- Authorization
- The authorization stage verifies that the identity found in the first stage can access the verb and resource in the request
- Admission Controllers
- Admission Control verifies that the request is well-formed and potentially modifies it before proceeding
- Validation
- This stage ensures that the request is valid.
- You can determine the endpoint for the kube-api server by running `kubectl cluster-info`
- The kube-api server is also exposed via a service named 'kubernetes' in the default namespace:

```yaml
kubeadmin@kube-controlplane:~$ k get svc kubernetes -n default -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2024-11-11T10:57:42Z"
  labels:
    component: apiserver
    provider: kubernetes
  name: kubernetes
  namespace: default
  resourceVersion: "234"
  uid: 768d1a22-91ff-4ab3-8cd7-b86340fc319a
spec:
  clusterIP: 10.96.0.1
  clusterIPs:
  - 10.96.0.1
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: 6443
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
```

- The endpoint of the kube-api server is also exposed to pods via environment variables:

```sh
kubeadmin@kube-controlplane:~$ k exec -it other -- /bin/sh -c 'env | grep -i kube'
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.96.0.1:443
KUBERNETES_PORT_443_TCP_ADDR=10.96.0.1
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.96.0.1:443
KUBERNETES_SERVICE_HOST=10.96.0.1
```
Authentication
- There are two types of accounts that would need access to a cluster: Humans and Machines. There is no such thing as a ‘user account’ primitive in Kubernetes.
User accounts
- Developers, cluster admins, etc.
Service Accounts
- Service Accounts are created and managed by the Kubernetes API and can be used for machine authentication
- To create a service account: `kubectl create serviceaccount <account name>`
- Service accounts are namespaced
- Prior to Kubernetes 1.24, when a service account was created, a token was created automatically and stored as a secret object.
- You can use the base64-encoded token to communicate with the kube-api server:

```sh
curl https://172.16.0.1:6443/api --insecure --header "Authorization: Bearer <token value>"
```

- You can grant service accounts permission to the cluster itself by binding them to a role with a rolebinding. If a pod needs access to the cluster where it is hosted, you configure the automountServiceAccountToken boolean parameter on the pod and assign it a service account that has the appropriate permissions to the cluster. The token will be mounted to the pod's file system, where the value can then be accessed by the pod. The secret is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token.
- A service account named 'default' is automatically created in every namespace
- As of Kubernetes 1.22, tokens are automatically mounted into pods by an admission controller as a projected volume.
  - https://github.com/kubernetes/enhancements/blob/master/keps/sig-auth/1205-bound-service-account-tokens/README.md
- As of Kubernetes 1.24, when you create a service account, a secret is no longer created automatically for the token. Now you must run `kubectl create token <service account name>` to create the token.
  - https://github.com/kubernetes/enhancements/issues/2799
- One can also specify a token duration: `kubectl create token <service-account-name> --duration=100h`
TLS Certificates
- Server certificates are used to communicate with clients
- Client certificates are used to communicate with servers
- Server components used in Kubernetes and their certificates:
- kube-api server: apiserver.crt, apiserver.key
- etcd-server: etcdserver.crt, etcdserver.key
- kubelet: kubelet.crt, kubelet.key
- Client components used in kubernetes and their certificates:
- user certificates
- kube-scheduler: scheduler.crt, scheduler.key
- kube-controller-manager: controller-manager.crt, controller-manager.key
- kube-proxy: kubeproxy.crt, kubeproxy.key
- To generate a self-signed certificate:

```sh
openssl req -nodes -new -x509 -keyout my.key -out my.crt -subj "/CN=mysite.com"
```

- To generate certificates, you can use openssl:
  - Create a new private key: `openssl genrsa -out my.key 2048`
  - Create a new certificate signing request (CSR): `openssl req -new -key my.key -out my.csr -subj "/CN=ryan"`
  - Sign the CSR and generate the certificate yourself, or create a signing request with kube-api:
    - Sign and generate: `openssl x509 -req -in my.csr -signkey my.key -out my.crt`
    - Create a CertificateSigningRequest with kube-api (a sketch of the object follows this list):

```sh
# extract the base64 encoded value of the CSR:
cat my.csr | base64 | tr -d '\n'
# create a CertificateSigningRequest object with kube-api and provide the
# base64 encoded value .... see the docs
```

- kubeadm will automatically generate certificates for clusters that it creates
  - kubeadm generates certificates in the /etc/kubernetes/pki/ directory
- To view the details of a certificate, use openssl: `openssl x509 -in <path to crt> -text -noout`
- Once you have a private key, you can sign it using the CertificateSigningRequest object. The controller manager is responsible for signing these requests. You can then use the signed certificate values to authenticate to the kube-api server by placing the signed key, certificate, and CA in a kubeconfig file (~/.kube/config)
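A sketch of the CertificateSigningRequest object described above, following the structure in the Kubernetes docs (the name and expiration are placeholders):

```yaml
apiVersion: certificates.k8s.io/v1
kind: CertificateSigningRequest
metadata:
  name: ryan
spec:
  # base64-encoded CSR produced in the previous step
  request: <base64-encoded my.csr>
  signerName: kubernetes.io/kube-apiserver-client
  expirationSeconds: 86400  # one day
  usages:
  - client auth
```

After creating it, approve and fetch the signed certificate with `kubectl certificate approve ryan` and `kubectl get csr ryan -o jsonpath='{.status.certificate}' | base64 -d`.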
kubelet Security
- By default, requests to the kubelet API are not authenticated. These requests are bound to an 'unauthenticated users' group. This behavior can be changed by setting the `--anonymous-auth` flag to `false` in the kubelet config
- kubelet ports
  - port 10250 on the machine running a kubelet process serves an API that allows full access
  - port 10255 on the machine running a kubelet process serves an unauthenticated, read-only API
- kubelet supports 2 authentication mechanisms: bearer tokens and certificate-based authentication
- You can find the location of the kubelet config file by looking at the process: `ps aux | grep -i kubelet`
Authorization
Roles and ClusterRoles
- Roles and ClusterRoles define what a user or service account can do within a cluster
- The Kubernetes primitive `role` is namespaced; `clusterrole` is not
Role Bindings and Cluster Role Bindings
- `rolebinding` and `clusterrolebinding` link a user or service account to a role (see the sketch below)
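A minimal sketch tying these primitives together (the namespace, role, and service account names are placeholders):

```sh
# Grant a service account read-only access to pods in the dev namespace
kubectl create role pod-reader --verb=get,list,watch --resource=pods -n dev
kubectl create rolebinding read-pods --role=pod-reader \
  --serviceaccount=dev:my-sa -n dev
# Verify the binding took effect
kubectl auth can-i list pods -n dev --as=system:serviceaccount:dev:my-sa
```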
3 System Hardening
Principle of Least Privilege
- Ensure that people or bots only have access to what is needed, and nothing else.
Limit access to nodes
Managing Local Users and Groups
- Commands to be aware of: `id`, `who`, `last`, `groups`, `useradd`, `userdel`, `usermod`, `groupdel`
- Files to be aware of: `/etc/passwd`, `/etc/shadow`, `/etc/group`
- Disable logins for users by setting their login shell to `/bin/nologin` (examples below)
- Remove users from groups they do not need to belong to
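Hedged examples of the commands above (user and group names are placeholders):

```sh
sudo usermod -s /usr/sbin/nologin appuser  # disable interactive login for a user
sudo gpasswd -d appuser docker             # remove appuser from the docker group
sudo userdel -r olduser                    # delete an account and its home directory
```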
Securing SSH
- Set the following in `/etc/ssh/sshd_config`, then restart sshd (`systemctl restart sshd`):

```
PermitRootLogin no
PasswordAuthentication no
```
Using sudo
- The `/etc/sudoers` file controls and configures the behavior of the `sudo` command. Each entry follows a structured syntax. Below is a breakdown of the fields and their meanings:
# Example Lines
# ----------------------------------
# User/Group Host=Command(s)
admin ALL=(ALL) NOPASSWD: ALL
%developers ALL=(ALL) ALL
john ALL=(ALL:ALL) /usr/bin/apt-get
# Field Breakdown
admin ALL=(ALL) NOPASSWD: ALL
| | | | |
| | | | +---> Command(s): Commands the user/group can execute.
| | | +------------> Options: Modifiers like `NOPASSWD` (no password required).
| | +--------------------> Runas: User/Group the command can be run as.
| +------------------------> Host: On which machine this rule applies (`ALL` for any).
+-----------------------------------------> User/Group: The user or group this rule applies to.
# Examples Explained
1. Allow `admin` to execute any command without a password:
admin ALL=(ALL) NOPASSWD: ALL
Remove Unnecessary Packages
- This one is self-explanatory. Don’t have unnecessary software installed on your nodes.
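Hedged Ubuntu examples (the package name is a placeholder):

```sh
apt list --installed | grep <package>                # find what is installed
sudo apt remove --purge <package>                    # remove a package and its config
systemctl list-units --type=service --state=running  # spot unneeded services
```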
Restrict Kernel Modules
- Kernel modules are ways of extending the kernel to enable it to understand new hardware. They are like device drivers.
- `modprobe` allows you to load a kernel module
- `lsmod` allows you to view all loaded modules
- You can blacklist modules by adding a new entry to `/etc/modprobe.d/blacklist.conf`
  - The entry should be in the format `blacklist <module name>`
  - Example: `echo "blacklist sctp" >> /etc/modprobe.d/blacklist.conf`
- You may need to reboot the system after disabling or blacklisting kernel modules
Disable Open Ports
- Use `netstat -tunlp` or `ss -tunlp` to list listening ports on a system
- Stop the service associated with the open port, or disable access with a firewall
- Common firewalls you can use are `iptables` or `ufw`
  - Run `ufw status` to list the current status of the UFW firewall
  - Allow all traffic outbound: `ufw default allow outgoing`
  - Deny all incoming: `ufw default deny incoming`
  - Allow SSH from 172.16.154.24: `ufw allow from 172.16.154.24 to any port 22 proto tcp`
Tracing Syscalls
- There are several ways to trace syscalls in Linux.
strace
- `strace` is included with most Linux distributions.
- To use `strace`, simply add it before the binary that you are running: `strace touch /tmp/test`
- You can also attach `strace` to a running process: `strace -p <PID>`
AquaSec Tracee
- `tracee` is an open source tool created by AquaSec
- Uses eBPF (extended Berkeley Packet Filter) to trace syscalls on a system. eBPF runs programs directly within the kernel space without loading any kernel modules. As a result, tools that use eBPF are more efficient and typically use fewer resources.
- `tracee` can be run using the binaries or as a container
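A hedged example of running tracee as a container; the flags and image tag follow the Aqua docs at the time of writing and should be verified before use:

```sh
docker run --rm -it --pid=host --privileged \
  -v /etc/os-release:/etc/os-release-host:ro \
  aquasec/tracee:latest
```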
Restricting Access to syscalls with seccomp
- `seccomp` can be used to restrict a process' access to syscalls. It allows access to the most commonly used syscalls, while restricting access to syscalls that can be considered dangerous.
- To see if `seccomp` is enabled: `grep -i seccomp /boot/config-$(uname -r)`
- `seccomp` can operate in 1 of 3 modes:
  - mode 0: disabled
  - mode 1: strict (blocks nearly all syscalls, except for 4)
  - mode 2: selectively filters syscalls
- To see which mode a process is currently running in: `grep -i seccomp /proc/1/status`, where '1' is the PID of the process
- `seccomp` profiles
  - Kubernetes provides a default `seccomp` profile that can be either restrictive or permissive, depending on your configuration
  - You can create custom profiles to fine-tune `seccomp` and which syscalls it blocks or allows within containers
  - Example `seccomp` profile resembling mode 1:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "archMap": [
    {
      "architecture": "SCMP_ARCH_X86_64",
      "subArchitectures": []
    }
  ],
  "syscalls": [
    {
      "names": ["read", "write", "exit", "sigreturn"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

- To apply a `seccomp` profile to a pod:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: audit-pod
  labels:
    app: audit-pod
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      # this path is relative to the default seccomp profile location
      # (/var/lib/kubelet/seccomp)
      localhostProfile: profiles/audit.json
  containers:
  - name: test-container
    image: hashicorp/http-echo:1.0
    args:
    - "-text=just made some syscalls!"
    securityContext:
      allowPrivilegeEscalation: false
```
Restrict access to file systems
AppArmor
- AppArmor can be used to limit a container's access to resources on the host. Why do we need AppArmor if we have traditional discretionary access controls (file system permissions, etc.)? With discretionary access control, a running process inherits the permissions of the user who started it, which is likely more than the process needs. AppArmor is a mandatory access control implementation that allows one to implement fine-grained controls over what a process can access or do on a system.
- AppArmor runs as a daemon on Linux systems. You can check its status using systemctl: `systemctl status apparmor`
  - If `apparmor-utils` is installed, you can also use `aa-status`
- To use AppArmor, the kernel module must also be loaded. To check the status: `cat /sys/module/apparmor/parameters/enabled` (Y = loaded)
- AppArmor profiles define what a process can and cannot do and are stored in /etc/apparmor.d/. Profiles need to be copied to every worker node and loaded.
- Every profile needs to be loaded into AppArmor before it can take effect
  - To view loaded profiles, run `aa-status`
- To load a profile: `apparmor_parser -r -W /path/to/profile`
  - If `apparmor-utils` is installed, you can also use `aa-enforce` to load a profile
- Profiles are loaded in 'enforce' mode by default. To change the mode to 'complain': `apparmor_parser -C /path/to/profile`
  - If `apparmor-utils` is installed, you can also use `aa-complain` to change the mode
- To view loaded AppArmor profiles:

```sh
kubeadmin@kube-controlplane:~$ sudo cat /sys/kernel/security/apparmor/profiles
cri-containerd.apparmor.d (enforce)
wpcom (unconfined)
wike (unconfined)
vpnns (unconfined)
vivaldi-bin (unconfined)
virtiofsd (unconfined)
rsyslogd (enforce)
vdens (unconfined)
uwsgi-core (unconfined)
/usr/sbin/chronyd (enforce)
/usr/lib/snapd/snap-confine (enforce)
/usr/lib/snapd/snap-confine//mount-namespace-capture-helper (enforce)
tcpdump (enforce)
man_groff (enforce)
man_filter (enforce)
....
```

or:

```sh
root@controlplane00:/etc/apparmor.d# aa-status
apparmor module is loaded.
33 profiles are loaded.
12 profiles are in enforce mode.
   /home/rtn/tools/test.sh
   /usr/bin/man
   /usr/lib/NetworkManager/nm-dhcp-client.action
   /usr/lib/NetworkManager/nm-dhcp-helper
   /usr/lib/connman/scripts/dhclient-script
   /usr/sbin/chronyd
   /{,usr/}sbin/dhclient
   lsb_release
   man_filter
   man_groff
   nvidia_modprobe
   nvidia_modprobe//kmod
21 profiles are in complain mode.
   avahi-daemon
   dnsmasq
   dnsmasq//libvirt_leaseshelper
   identd
   klogd
   mdnsd
   nmbd
   nscd
   php-fpm
   ping
   samba-bgqd
   samba-dcerpcd
   samba-rpcd
   samba-rpcd-classic
   samba-rpcd-spoolss
   smbd
   smbldap-useradd
   smbldap-useradd///etc/init.d/nscd
   syslog-ng
   syslogd
   traceroute
0 profiles are in kill mode.
0 profiles are in unconfined mode.
4 processes have profiles defined.
2 processes are in enforce mode.
   /usr/sbin/chronyd (704)
   /usr/sbin/chronyd (708)
2 processes are in complain mode.
   /usr/sbin/avahi-daemon (587) avahi-daemon
   /usr/sbin/avahi-daemon (613) avahi-daemon
0 processes are unconfined but have a profile defined.
0 processes are in mixed mode.
0 processes are in kill mode.
```
- AppArmor defines profile modes that determine how the profile behaves:
  - Enforce: action is taken and the application is allowed/blocked from performing defined actions. Events are logged in syslog.
  - Complain: events are logged but no action is taken
  - Unconfined: the application can perform any task and no event is logged
- AppArmor Tools
  - Can be used to generate AppArmor profiles
  - To install: `apt install -y apparmor-utils`
  - Run `aa-genprof` to generate a profile: `aa-genprof ./my-application`
- Before applying an AppArmor profile to a pod, you must ensure the container runtime supports AppArmor. You must also ensure AppArmor is installed on the worker node and that all necessary profiles are loaded.
- To apply an AppArmor profile to a pod, add the following securityContext (K8s 1.30+):

```yaml
securityContext:
  appArmorProfile:
    type: <profile_type>
    localhostProfile: <profile_name>
```

  - <profile_type> can be one of 3 values: `Unconfined`, `RuntimeDefault`, or `Localhost`
    - `Unconfined` means the container is not restricted by AppArmor
    - `RuntimeDefault` means the container will use the default AppArmor profile
    - `Localhost` means the container will use a custom profile
Deep Dive into AppArmor Profiles
AppArmor profiles define security rules for specific applications, specifying what they can and cannot do. These profiles reside in /etc/apparmor.d/ and are loaded into the kernel to enforce security policies.
- Each profile follows this structure:
profile <profile_name> <executable_path> {
<rules>
}
- Example: a profile for nano:
profile nano /usr/bin/nano {
# Allow access to any file
file,
# Deny writing to system directories
deny /etc/* rw,
}
- Types of AppArmor rules:
- File Access Rules:

```
/home/user/data.txt r,  # Read-only access
/etc/passwd rw,         # Read & write access
/tmp/ rw,               # Full access to /tmp
```

- Network Access Rules:

```
network inet tcp,    # Allow TCP connections
network inet udp,    # Allow UDP connections
network inet dgram,  # Allow datagram connections
```

- Capability Rules:

```
deny capability sys_admin,   # Deny sys_admin capability
deny capability sys_ptrace,  # Deny sys_ptrace capability
```
Linux Capabilities in Pods
- For the purpose of performing permission checks, traditional UNIX implementations distinguish two categories of processes: privileged processes (whose effective user ID is 0, referred to as superuser or root), and unprivileged processes (whose effective UID is nonzero). Privileged processes bypass all kernel permission checks, while unprivileged processes are subject to full permission checking based on the process’s credentials (usually: effective UID, effective GID, and supplementary group list).
- Starting with Linux 2.2, Linux divides the privileges traditionally associated with superuser into distinct units, known as capabilities, which can be independently enabled and disabled. Capabilities are a per-thread attribute.
- Capabilities control what a process can do
- Some common capabilities:
  - CAP_SYS_ADMIN
  - CAP_NET_ADMIN
  - CAP_NET_RAW
- To view capabilities:
  - `getcap` - check the capabilities of a binary: `getcap <path to bin>`
  - `getpcaps` - check the capabilities of a process: `getpcaps <pid>`
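A minimal sketch of managing capabilities from a pod spec (the pod and container names are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cap-demo
spec:
  containers:
  - name: web
    image: nginx
    securityContext:
      capabilities:
        drop: ["ALL"]              # start from zero capabilities
        add: ["NET_BIND_SERVICE"]  # add back only what is needed
```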
4 Minimize Microservice Vulnerabilities
Pod Security Admission
- Replaced Pod Security Policies
- Pod Security Admission controller enforces pod security standards on pods
- All you need to do to opt into the PSA feature is to add a label with a specific format to a namespace. All pods in that namespace will have to follow the standards declared.
- The label consists of three parts: a prefix, a mode, and a level
- Example: `pod-security.kubernetes.io/enforce=restricted`
- Prefix: `pod-security.kubernetes.io`
- Mode: `enforce`, `audit`, or `warn`
  - Enforce: blocks pods that do not meet the PSS
  - Audit: logs violations to the audit log but does not block pod creation
  - Warn: logs violations on the console but does not block pod creation
- Level: `privileged`, `baseline`, or `restricted`
  - Privileged: fully unrestricted
    - Allowed: everything
  - Baseline: some restrictions
    - Allowed: most things, except sharing host namespaces, hostPath volumes and hostPorts, and privileged pods
  - Restricted: most restrictions
    - Allowed: very little; running as root, using host networking, hostPath volumes, hostPorts, and privileged pods are all disallowed. The pod must be configured with a seccomp profile.
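A hedged example of opting a namespace into Pod Security Admission (the namespace name is a placeholder):

```sh
kubectl label namespace dev pod-security.kubernetes.io/enforce=restricted
# Optionally also warn about baseline violations
kubectl label namespace dev pod-security.kubernetes.io/warn=baseline
```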
Security Contexts
- Security contexts are used to control the security settings of a pod or container
- Security contexts can be defined at the pod level or the container level. Settings defined at the container level will override identical settings defined at the pod level
- Security contexts can be used to:
- Run a pod as a specific user
- Run a pod as a specific group
- Run a pod with specific Linux capabilities
- Run a pod with a read-only root filesystem
- Run a pod with a specific SELinux context
- Run a pod with a specific AppArmor profile
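A minimal sketch combining pod-level and container-level settings (names and IDs are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: secure-pod
spec:
  securityContext:          # pod level: applies to all containers
    runAsUser: 1000
    runAsGroup: 3000
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    securityContext:        # container level: overrides the pod level
      runAsUser: 2000
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
```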
- You can view the capabilities of a process by viewing the status file of the process and grepping for capabilities:

```sh
rtn@worker02:~$ cat /proc/self/status | grep -i cap
CapInh:	0000000000000000
CapPrm:	0000000000000000
CapEff:	0000000000000000
CapBnd:	000001ffffffffff
CapAmb:	0000000000000000
```

These values are encoded in hexadecimal. To decode them, use the capsh command:

```sh
rtn@worker02:~$ sudo capsh --decode=000001ffffffffff
0x000001ffffffffff=cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read,cap_perfmon,cap_bpf,cap_checkpoint_restore
```
Admission Controllers
- Admission Controllers are used for automation within a cluster
- Once a request to the KubeAPI server has been authenticated and then authorized, it is intercepted and handled by any applicable Admission Controllers
- Example Admission Controllers:
  - ImagePolicyWebhook
    - You may see this one on the exam.
    - When enabled, the ImagePolicyWebhook admission controller contacts an external service (that you or someone else wrote in whatever language you want; it just needs to accept and respond to HTTP requests).
    - To enable, add 'ImagePolicyWebhook' to the `--enable-admission-plugins` flag of the kube-api server
    - You must also supply an admission control configuration file, which references a kubeconfig-formatted file that tells the API server how to reach the webhook. Then pass the path to this config to the kube-api server with `--admission-control-config-file=<path to config file>`. Note that this path is the path inside the kube-api container, so you must mount this path on the host to the pod as a hostPath mount.
  - AlwaysPullImages
  - DefaultStorageClass
  - EventRateLimit
  - NamespaceExists
  - ... and many more
- Admission Controllers help make Kubernetes modular
- To see which Admission Controllers are enabled:
  - you can grep the kube-api process: `ps aux | grep -i kube-api | grep -i admission`
  - or you can look at the manifest for the kube-api server (if the cluster was provisioned with kubeadm): `grep admission -A10 /etc/kubernetes/manifests/kube-apiserver.yaml`
  - or, if the cluster was provisioned manually, you can look at the systemd unit file for the kube-api server daemon
- There are two types of admission controllers:
- Mutating - can make changes to ‘autocorrect’
- Validating - only validates configuration
- Mutating are invoked first. Validating second.
- The admission controller runs as a webhook server. It can run inside the cluster as a pod or outside the cluster on another server.
- Some admission controllers require a configuration file to be passed to the kube-api server. This file is passed using the `--admission-control-config-file` flag (see the sketch below).
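A hedged sketch of the admission control configuration file for ImagePolicyWebhook, following the structure in the Kubernetes docs (paths and TTL values are placeholders):

```yaml
apiVersion: apiserver.config.k8s.io/v1
kind: AdmissionConfiguration
plugins:
- name: ImagePolicyWebhook
  configuration:
    imagePolicy:
      # kubeconfig-formatted file pointing at the external webhook service
      kubeConfigFile: /etc/kubernetes/image-policy/kubeconfig.yaml
      allowTTL: 50
      denyTTL: 50
      retryBackoff: 500
      defaultAllow: false  # reject images if the webhook is unreachable
```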
Open Policy Agent
- OPA can be used for authorization. However, it is more likely to be used in the admission control phase.
- OPA can be deployed as a daemonset on a node or as a pod
- OPA policies use a language called rego
OPA in Kubernetes
GateKeeper
- Gatekeeper Constraint Framework
  - Gatekeeper is a validating and mutating webhook that enforces CRD-based policies executed by Open Policy Agent, a policy engine for Cloud Native environments hosted by CNCF as a graduated project.
  - The framework helps us implement what, where, and how we want to do something in Kubernetes
    - Example:
      - What: Add labels, etc.
      - Where: kube-system namespace
      - How: When a pod is created
- To run Gatekeeper in Kubernetes, simply apply the manifests provided by OPA
- The pods and other resources are created in the gatekeeper-system namespace
- Constraint Templates
  - Before you can define a constraint, you must first define a ConstraintTemplate, which describes both the Rego that enforces the constraint and the schema of the constraint. The schema of the constraint allows an admin to fine-tune the behavior of a constraint, much like arguments to a function.
  - Here is an example constraint template that requires all labels described by the constraint to be present:

```yaml
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          msg := sprintf("you must provide labels: %v", [missing])
        }
```
Constraints
- Constraints are then used to inform Gatekeeper that the admin wants a ConstraintTemplate to be enforced, and how. This constraint uses the K8sRequiredLabels constraint template above to make sure the gatekeeper label is defined on all namespaces:

```
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: ns-must-have-gk
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Namespace"]
  parameters:
    labels: ["gatekeeper"]
```

- The `match` field supports multiple options: https://open-policy-agent.github.io/gatekeeper/website/docs/howto#the-match-field
- After creating the constraint from the ConstraintTemplate, you can view all violations by describing the constraint:

```
kubectl describe k8srequiredlabels ns-must-have-gk
```
Kubernetes Secrets
- Secrets are used to store sensitive information in Kubernetes
- Secrets are only base64 encoded, not encrypted, when stored in etcd (unless encryption at rest is configured)
- Can be injected into a pod as an env or mounted as a volume
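A minimal sketch showing both consumption styles (the secret name `db-creds` and key names are illustrative):

```
apiVersion: v1
kind: Secret
metadata:
  name: db-creds
stringData:           # stringData accepts plain text; Kubernetes base64-encodes it
  password: S3cret!
---
apiVersion: v1
kind: Pod
metadata:
  name: secret-demo
spec:
  containers:
    - name: app
      image: nginx
      env:
        - name: DB_PASSWORD            # injected as an environment variable
          valueFrom:
            secretKeyRef:
              name: db-creds
              key: password
      volumeMounts:
        - name: creds                  # mounted as files under /etc/creds
          mountPath: /etc/creds
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: db-creds
```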
Encrypting etcd
- By default, the API server stores plain-text representations of resources in etcd, with no at-rest encryption.
- The kube-apiserver process accepts an argument `--encryption-provider-config` that specifies a path to a configuration file. The contents of that file, if you specify one, control how Kubernetes API data is encrypted in etcd.
- If you are running the kube-apiserver without the `--encryption-provider-config` command line argument, you do not have encryption at rest enabled. If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies the `identity` provider as the first encryption provider in the list, then you do not have at-rest encryption enabled (the default `identity` provider does not provide any confidentiality protection).
- If you are running the kube-apiserver with the `--encryption-provider-config` command line argument, and the file that it references specifies a provider other than `identity` as the first encryption provider in the list, then you already have at-rest encryption enabled. However, that check does not tell you whether a previous migration to encrypted storage has succeeded.
- Example EncryptionConfiguration:
```
apiVersion: apiserver.config.k8s.io/v1
kind: EncryptionConfiguration
resources:
  - resources:
      - secrets
      - configmaps
      - pandas.awesome.bears.example # a custom resource API
    providers:
      # This configuration does not provide data confidentiality. The first
      # configured provider is specifying the "identity" mechanism, which
      # stores resources as plain text.
      #
      # - identity: {} # plain text, in other words NO encryption
      - aesgcm:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - aescbc:
          keys:
            - name: key1
              secret: c2VjcmV0IGlzIHNlY3VyZQ==
            - name: key2
              secret: dGhpcyBpcyBwYXNzd29yZA==
      - secretbox:
          keys:
            - name: key1
              secret: YWJjZGVmZ2hpamtsbW5vcHFyc3R1dnd4eXoxMjM0NTY=
  - resources:
      - events
    providers:
      - identity: {} # do not encrypt Events even though *.* is specified below
  - resources:
      - '*.apps' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key2
              secret: c2VjcmV0IGlzIHNlY3VyZSwgb3IgaXMgaXQ/Cg==
  - resources:
      - '*.*' # wildcard match requires Kubernetes 1.27 or later
    providers:
      - aescbc:
          keys:
            - name: key3
              secret: c2VjcmV0IGlzIHNlY3VyZSwgSSB0aGluaw==
```
- Each `resources` array item is a separate config and contains a complete configuration. The `resources.resources` field is an array of Kubernetes resource names (`resource` or `resource.group`) that should be encrypted, such as Secrets, ConfigMaps, or other resources.
- https://kubernetes.io/docs/tasks/administer-cluster/encrypt-data/
- After enabling encryption in etcd, any secrets that you created prior to enabling encryption will not be encrypted. You can re-encrypt them by running:

```
kubectl get secrets -A -o yaml | kubectl replace -f -
```
- Example of reading a secret directly from etcd:

```
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
  --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
  get /registry/secrets/three/con1
```

- The path to a resource in the etcd database is `/registry/<resource-type>/<namespace>/<name>` (here: the secret `con1` in the `three` namespace)
Container Sandboxing
- Containers are not contained!
- A container sandbox is a mechanism that provides an additional layer of isolation between the container and the host
- Container sandboxing is implemented via Runtime Class objects in Kubernetes.
- The default container runtime handler is `runc`. However, we can change this to use `runsc` (gVisor) or Kata Containers
- Sandboxing can prevent kernel exploits such as Dirty COW, which allows a user to gain root access to the host
- Dirty COW (CVE-2016-5195) works by exploiting a race condition in the Linux kernel's copy-on-write mechanism
gVisor
- gVisor is a kernel written in Golang that intercepts system calls made by a container
- gVisor is like a ‘syscall proxy’ that sits between the container and the kernel
- Components:
  - Sentry - a user-space application kernel that intercepts and services the container's syscalls
  - Gofer - a file proxy process that mediates the Sentry's access to the host filesystem
- Not all apps will work with gVisor
- gVisor will cause performance degradation in your app due to the overhead of intercepting and handling every syscall
- gVisor uses runsc as the runtime handler
Kata Containers
- Kata inserts each container into its own lightweight virtual machine, giving each its own kernel
- Kata Containers require nested virtualization support, so it may not work with all cloud providers
RuntimeClass
- RuntimeClass is a Kubernetes feature that allows you to specify which container runtime handler to use for a pod
To use a runtime class:

- Create a new RuntimeClass object:

```
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: secure-runtime
handler: runsc
```

- Specify the `runtimeClassName` in the pod definition:

```
apiVersion: v1
kind: Pod
metadata:
  name: simple-webapp-1
  labels:
    name: simple-webapp
spec:
  runtimeClassName: secure-runtime
  containers:
    - name: simple-webapp
      image: kodekloud/webapp-delayed-start
      ports:
        - containerPort: 8080
```
Resource Quotas
- Control requests and limits for CPU and memory within a namespace
```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-resource-quota
  namespace: team-a
spec:
  hard:
    pods: "5"
    requests.cpu: "0.5"
    requests.memory: 500Mi
    limits.cpu: "1"
    limits.memory: 1Gi
```

- Quotas can also be scoped, e.g. to pods of a given PriorityClass:

```
apiVersion: v1
kind: ResourceQuota
metadata:
  name: pods-medium
spec:
  hard:
    cpu: "10"
    memory: 20Gi
    pods: "10"
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["medium"]
```
API Priority and Fairness
- https://kubernetes.io/docs/concepts/cluster-administration/flow-control/
- With API Priority and Fairness, you can define which requests to the kube-apiserver are prioritized over others
- To configure API Priority and Fairness, you create `FlowSchema` and `PriorityLevelConfiguration` objects (API group `flowcontrol.apiserver.k8s.io/v1`; the feature is GA as of Kubernetes 1.29)
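A minimal sketch of a PriorityLevelConfiguration, assuming the v1 flow-control API (the name and share value are illustrative):

```
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: custom-priority
spec:
  type: Limited                    # "Exempt" levels bypass flow control entirely
  limited:
    nominalConcurrencyShares: 10   # relative share of the apiserver's concurrency budget
    limitResponse:
      type: Reject                 # reject excess requests instead of queuing them
```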
Pod Priority and Preemption
- With Pod Priority and Preemption, you can ensure that critical pods are running while the cluster is under resource contention by killing lower priority pods
- To implement Pod Priority and Preemption:
  - Create a PriorityClass object (or several):

```
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for XYZ service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 100
globalDefault: false
description: "This priority class should be used for low-priority workloads."
```

  - Assign the priorityClass to a pod via `priorityClassName`:

```
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
    - name: nginx
      image: nginx
      imagePullPolicy: IfNotPresent
  priorityClassName: high-priority
```
Pod to Pod Encryption
- mTLS can be used to encrypt traffic between pods
- Methods of p2p encryption:
  - Service Mesh
    - A service mesh can offload the encryption and decryption of traffic between pods by using a sidecar proxy
    - Examples:
      - Istio
        - Istio uses Envoy as a sidecar proxy to encrypt traffic between pods
      - Linkerd
  - Wireguard
    - Cilium
      - uses eBPF for network security
      - Encryption is transparent to the application
      - Provides flexible encryption options
  - IPSec
    - Calico
5 Supply Chain Security
- Supply chain security is the practice of ensuring that the software and hardware that you use in your environment is secure
- In the context of the CKS exam, supply chain security refers to the security of the software that you use in your Kubernetes environment
Reduce docker image size
- Smaller images are faster to download and deploy
- Smaller images are more secure: less software means a smaller attack surface
- Smaller images are easier to manage
- To reduce the size of a docker image:
  - Use a smaller base image
  - Use specific package/image versions
  - Make the filesystem read-only
  - Don't run the container as root
  - Use multi-stage builds
  - Remove unnecessary files
  - Use a `.dockerignore` file to exclude files and directories from the image
  - Use `COPY` instead of `ADD`
  - Use `alpine` images
  - Use `scratch` images
  - Use `distroless` images
- Example of a multi-stage build:

```
# build container (stage 1)
FROM ubuntu
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y golang-go
COPY app.go .
RUN CGO_ENABLED=0 go build app.go

# app container (stage 2)
# it is better to use a defined tag, rather than 'latest'
FROM alpine:3.12.1
RUN addgroup -S appgroup && adduser -S appuser -G appgroup -h /home/appuser
COPY --from=0 /app /home/appuser/app
# run as a non-root user
USER appuser
CMD ["/home/appuser/app"]
```
Dockerfile best practices: https://docs.docker.com/build/building/best-practices/
- Only certain Dockerfile instructions create new layers in an image: `RUN`, `COPY`, and `ADD` (`FROM` pulls in the base image's layers; instructions like `CMD` only add metadata)
- `dive` and `docker-slim` are two tools you can use to explore the individual layers that make up an image
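For example, dive takes an image reference and opens an interactive, layer-by-layer view of its filesystem (the tag here is illustrative):

```
dive nginx:1.25
```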
Static Analysis
SBOM
- An SBOM is a list of all the software that makes up a container image (or an application, etc.)
- Formats:
  - SPDX
    - The standard format for sharing SBOMs
    - Available in JSON, RDF, and tag/value formats
    - More complex than CycloneDX due to its extensive metadata coverage
    - Comprehensive metadata including license information, origin, and file details
  - CycloneDX
    - A lightweight format focused on security and compliance
    - Available in JSON and XML formats
    - Simpler and more focused on essential SBOM elements
    - Focuses on component details, vulnerabilities, and dependencies
Kubesec
- Used for static analysis of manifests
- https://github.com/controlplaneio/kubesec
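A typical invocation scans a manifest and returns a JSON risk score (the file name is illustrative):

```
kubesec scan pod.yaml
```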
Syft
- Syft is a powerful and easy-to-use open-source tool for generating Software Bill of Materials (SBOMs) for container images and filesystems. It provides detailed visibility into the packages and dependencies in your software, helping you manage vulnerabilities, license compliance, and software supply chain security.
- Syft can export results in SPDX, CycloneDX, JSON, etc.
- To scan an image with syft and export the results to a file in SPDX format:

```
syft scan docker.io/kodekloud/webapp-color:latest -o spdx --file /root/webapp-spdx.sbom
```
Grype
- Grype is a tool (also from Anchore) that can be used to scan an SBOM for vulnerabilities
- To scan an SBOM with Grype:

```
grype /root/webapp-sbom.json -o json --file /root/grype-report.json
```
Kube-linter
- Kube-linter can be used to lint Kubernetes manifests and ensure best practices are being followed
- kube-linter is configurable. You can disable/enable checks and even create your own custom checks
- kube-linter includes recommendations for how to fix failed checks
- https://github.com/stackrox/kube-linter
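Typical usage is pointing it at a manifest file or a directory of manifests (the path is illustrative):

```
kube-linter lint ./manifests/
```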
Scanning Images for Vulnerabilities
trivy
- trivy can be used to scan images, git repos, and filesystems for vulnerabilities
- https://github.com/aquasecurity/trivy
- Example (running trivy via its container image):

```
sudo docker run --rm aquasec/trivy:0.17.2 nginx:1.16-alpine
```
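With a locally installed binary, the equivalent is `trivy image`; filtering by severity is often useful on the exam (flags shown are standard trivy options):

```
trivy image --severity HIGH,CRITICAL nginx:1.16-alpine
```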
6 Monitoring, Logging, and Runtime Security
falco
- Falco is an IDS for Kubernetes workloads
- falco is a cloud native security tool. It provides near real-time threat detection for cloud, container, and Kubernetes workloads by leveraging runtime insights. Falco can monitor events defined via customizable rules from various sources, including the Linux kernel, and enrich them with metadata from the Kubernetes API server, container runtime, and more. Falco supports a wide range of kernel versions, x86_64 and ARM64 architectures, and many different output channels.
- falco uses Sysdig filter syntax to extract information about an event. Filters are configured in the falco rules.yaml or a ConfigMap. They can also be passed via helm values.
  - `/etc/falco/falco.yaml` - the main configuration file for falco
  - `/etc/falco/falco_rules.yaml` - the main rules file for falco
- falco rule files consist of 3 elements defined in YAML:
- rules - a rule is a condition under which an alert should be generated
- macros - a macro is a reusable rule condition. These help keep the rules file clean and easy to read
- lists - a collection of items that can be used in rules and macros
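A minimal sketch of all three elements in a falco rules file (the rule name, list contents, and output text are illustrative):

```
# a list: a reusable collection of items
- list: shell_binaries
  items: [bash, sh, zsh]

# a macro: a reusable condition fragment
- macro: spawned_process
  condition: evt.type = execve and evt.dir = <

# a rule: a condition that triggers an alert, plus the alert output and priority
- rule: shell_in_container
  desc: a shell was spawned inside a container
  condition: spawned_process and container.id != host and proc.name in (shell_binaries)
  output: "shell spawned in container (user=%user.name container=%container.id proc=%proc.cmdline)"
  priority: WARNING
```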
- Some examples of events that falco watches for:
- Reading or writing files at a specific location in the filesystem
- Spawning a shell in a container, such as /bin/bash
- Sending or receiving traffic to/from undesired URLs
- Falco deploys a set of sensors that listen for configured events and conditions
- Each sensor contains a set of rules that map an event to a data source.
- An alert is produced when a rule matches a specific event
- Alerts are then sent to an output channel to record the event
Ensuring Container Immutability
- Containers should be immutable. This means that once a container is created, it should not be changed. If changes are needed, a new container should be created.
- Containers are mutable (changeable) by default. This can lead to security vulnerabilities.
- To ensure container immutability:
- Use a ‘distroless’ container image. These images are minimal and contain only the necessary components to run an application. They do not include a shell.
- Use a ‘read-only’ file system. This prevents changes to the file system. To configure a read-only root filesystem, add the following to the pod spec:

```
spec:
  containers:
    - name: my-container
      image: my-image
      securityContext:
        readOnlyRootFilesystem: true
```
Audit Logs
- Auditing involves recording and tracking all events and actions within the cluster
- Who made a change, when was it changed, and what exactly was changed
- Audit logs provide a chronological record of activities within a cluster
- Entries in the audit log exist in ‘JSON Lines’ format. Note that this is not the same as JSON. Each line in the log is a separate JSON object.
- Audit levels (configured per rule in the audit policy):
  - None - no logging
  - Metadata - logs request metadata (requesting user, timestamp, resource, verb), but not request or response body
  - Request - logs request metadata and request body, but no response body
  - RequestResponse - logs the metadata, request body, and response body
Sample Audit Policy
```
apiVersion: audit.k8s.io/v1 # This is required.
kind: Policy
omitStages:
- "RequestReceived"
rules:
# Log pod changes at RequestResponse level
- level: RequestResponse
resources:
- group: ""
# Resource "pods" doesn't match requests to any subresource of pods,
# which is consistent with the RBAC policy.
resources: ["pods"]
# Log "pods/log", "pods/status" at Metadata level
- level: Metadata
resources:
- group: ""
resources: ["pods/log", "pods/status"]
# Don't log requests to a configmap called "controller-leader"
- level: None
resources:
- group: ""
resources: ["configmaps"]
resourceNames: ["controller-leader"]
# Don't log watch requests by the "system:kube-proxy" on endpoints or services
- level: None
users: ["system:kube-proxy"]
verbs: ["watch"]
resources:
- group: "" # core API group
resources: ["endpoints", "services"]
# Don't log authenticated requests to certain non-resource URL paths.
- level: None
userGroups: ["system:authenticated"]
nonResourceURLs:
- "/api*" # Wildcard matching.
- "/version"
# Log the request body of configmap changes in kube-system.
- level: Request
resources:
- group: "" # core API group
resources: ["configmaps"]
# This rule only applies to resources in the "kube-system" namespace.
# The empty string "" can be used to select non-namespaced resources.
namespaces: ["kube-system"]
# Log configmap and secret changes in all other namespaces at the Metadata level.
- level: Metadata
resources:
- group: "" # core API group
resources: ["secrets", "configmaps"]
# Log all other resources in core and extensions at the Request level.
- level: Request
resources:
- group: "" # core API group
- group: "extensions" # Version of group should NOT be included.
# A catch-all rule to log all other requests at the Metadata level.
- level: Metadata
# Long-running requests like watches that fall under this rule will not
# generate an audit event in RequestReceived.
omitStages:
- "RequestReceived"
```
- Once the audit policy has been defined, you can apply it to the cluster by passing the `--audit-policy-file` flag to the kube-apiserver
- To use a file-based log backend, you need to pass the following configurations to the kube-apiserver:
  - `--audit-policy-file` - the path to the audit policy file
  - `--audit-log-path` - the path to the audit log file
  - both of these paths need to be mounted into the kube-apiserver pod; the kube-apiserver cannot read these files on the node without a proper volumeMount
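A sketch of what this looks like in a kubeadm cluster's `/etc/kubernetes/manifests/kube-apiserver.yaml`, adapted from the Kubernetes auditing docs (the host paths are illustrative):

```
spec:
  containers:
    - command:
        - kube-apiserver
        # ... other flags ...
        - --audit-policy-file=/etc/kubernetes/audit-policy.yaml
        - --audit-log-path=/var/log/kubernetes/audit/audit.log
      volumeMounts:
        - mountPath: /etc/kubernetes/audit-policy.yaml
          name: audit
          readOnly: true
        - mountPath: /var/log/kubernetes/audit/
          name: audit-log
          readOnly: false
  volumes:
    - name: audit
      hostPath:
        path: /etc/kubernetes/audit-policy.yaml
        type: File
    - name: audit-log
      hostPath:
        path: /var/log/kubernetes/audit/
        type: DirectoryOrCreate
```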