
Add scaling testing instructions (#7295)

* add instructions
* add etcd sizes for pods
* add kwok script
* update kwok script
* update node creation script
* add script to calculate size
* linter fixes

---------

Signed-off-by: ShutingZhao <shuting@nirmata.com>
shuting 2023-05-30 22:57:32 +08:00 committed by GitHub
parent 8ecf829647
commit 47bf1e8612
5 changed files with 563 additions and 0 deletions

docs/perf-testing/README.md (new file)
@@ -0,0 +1,208 @@
This document outlines the instructions for performance testing using [Kwok](https://kwok.sigs.k8s.io/) for the Kyverno 1.10 release.
# Prerequisites
## Install etcdctl
```sh
ETCD_VER=v3.4.13
# choose either URL
GOOGLE_URL=https://storage.googleapis.com/etcd
GITHUB_URL=https://github.com/etcd-io/etcd/releases/download
DOWNLOAD_URL=${GOOGLE_URL}
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
rm -rf /tmp/etcd-download-test && mkdir -p /tmp/etcd-download-test
curl -L ${DOWNLOAD_URL}/${ETCD_VER}/etcd-${ETCD_VER}-linux-amd64.tar.gz -o /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
tar xzvf /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz -C /usr/local/bin --strip-components=1
rm -f /tmp/etcd-${ETCD_VER}-linux-amd64.tar.gz
etcd --version
etcdctl version
```
More details for etcdctl installation can be found [here](https://github.com/etcd-io/etcd/releases/tag/v3.4.13).
## Install k3d
```sh
wget -q -O - https://raw.githubusercontent.com/k3d-io/k3d/main/install.sh | bash
```
More details for k3d installation can be found [here](https://k3d.io/v5.4.9/#install-script).
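To confirm the binary is on your PATH, check the client version:
```sh
k3d version
```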
# Create a base cluster using K3d
To quickly try out the scaling test, you can use the following command to create the k3d cluster with 3 workers:
```sh
k3d cluster create --agents=3 --k3s-arg "--disable=metrics-server@server:*" --k3s-node-label "ingress-ready=true@agent:*"
```
To create a larger cluster backed by embedded etcd (K3s runs embedded etcd when started with multiple servers), use:
```sh
k3d cluster create scaling --servers 3 --agents=15 --k3s-arg "--disable=metrics-server@server:*" --k3s-node-label "ingress-ready=true@agent:*"
```
To configure the etcd storage limit, pass an additional `--etcd-arg`; the following command sets the limit to 8 GiB (8589934592 bytes):
```sh
k3d cluster create scaling --servers 3 --agents=15 --k3s-arg "--disable=metrics-server@server:*" --k3s-node-label "ingress-ready=true@agent:*" --k3s-arg "--etcd-arg=quota-backend-bytes=8589934592@server:*"
```
Note: you can exec into the server node to verify the storage setting:
```sh
docker exec -ti k3d-scaling-server-0 sh
cat /var/lib/rancher/k3s/server/db/etcd/config | tail -2
quota-backend-bytes: 8589934592
```
## Prepare etcd access
```sh
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
docker cp k3d-scaling-server-0:/var/lib/rancher/k3s/server/tls/etcd/server-ca.crt ./server-ca.crt
docker cp k3d-scaling-server-0:/var/lib/rancher/k3s/server/tls/etcd/server-client.crt ./server-client.crt
docker cp k3d-scaling-server-0:/var/lib/rancher/k3s/server/tls/etcd/server-client.key ./server-client.key
etcd=https://$(kubectl get node -o wide | grep k3d-scaling-server-0 | awk '{print $6}'):2379
etcd_ep=$etcd/version
curl -L --cacert ./server-ca.crt --cert ./server-client.crt --key ./server-client.key $etcd_ep
export ETCDCTL_ENDPOINTS=$etcd
export ETCDCTL_CACERT='./server-ca.crt'
export ETCDCTL_CERT='./server-client.crt'
export ETCDCTL_KEY='./server-client.key'
export ETCDCTL_API=3
etcdctl endpoint status -w table
```
Credits to [k3s etcd commands](https://gist.github.com/superseb/0c06164eef5a097c66e810fe91a9d408).
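With the etcdctl environment configured, a quick way to see which resource types dominate the keyspace is to group keys by their `/registry/<resource>` prefix. This is a rough sketch (aggregated APIs nest their group one level deeper):
```sh
# --keys-only prints a blank line after each key, hence the grep
etcdctl get /registry/ --prefix --keys-only \
  | grep -v '^$' \
  | cut -d/ -f3 \
  | sort | uniq -c | sort -rn | head -20
```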
# Deploy Kwok in a cluster
```sh
./docs/perf-testing/kwok.sh
```
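The manifests install the Kwok controller into the `kube-system` namespace; assuming the upstream deployment name `kwok-controller`, you can wait for it to become ready before creating nodes:
```sh
kubectl -n kube-system rollout status deployment/kwok-controller --timeout=120s
```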
## Create `Kwok` nodes
Run the script to create the desired number of nodes for your Kwok cluster:
```sh
./docs/perf-testing/node.sh
```
More about Kwok on this [page](https://kwok.sigs.k8s.io/docs/user/kwok-in-cluster/).
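The script labels every fake node with `type: kwok`, so you can verify they registered and count them:
```sh
kubectl get nodes -l type=kwok --no-headers | wc -l
```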
## Set up Monitoring Components
```sh
make dev-lab-metrics-server dev-lab-prometheus
```
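Before generating load, confirm the monitoring stack is running; the Prometheus queries below assume a kube-prometheus-stack release in the `monitoring` namespace, matching the port-forward command used later:
```sh
kubectl -n monitoring get pods
```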
# Install Kyverno
```sh
# add the Kyverno chart repo if not already present: helm repo add kyverno https://kyverno.github.io/kyverno/
helm repo update
helm upgrade --install kyverno kyverno/kyverno -n kyverno \
--create-namespace \
--set admissionController.serviceMonitor.enabled=true \
--set admissionController.replicas=3 \
--set reportsController.serviceMonitor.enabled=true \
--set reportsController.resources.limits.memory=10Gi
# --devel \
# --set features.admissionReports.enabled=false \
```
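Wait for all Kyverno controllers to come up before deploying policies:
```sh
kubectl -n kyverno get pods
kubectl -n kyverno wait pod --all --for=condition=Ready --timeout=300s
```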
## Deploy Kyverno PSS policies
```sh
helm upgrade --install kyverno kyverno/kyverno-policies --set=podSecurityStandard=restricted --set=background=true --set=validationFailureAction=Enforce --devel
```
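Confirm the policies were created (the output should show the Enforce validation action):
```sh
kubectl get clusterpolicies
```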
# Create workloads
This script creates 1000 pods, with QPS and burst set to 50:
```sh
kubectl create ns test
go run docs/perf-testing/main.go --count=1000 --kinds=pods --clientRateLimitQPS=50 --clientRateLimitBurst=50 --namespace=test
```
Note that these pods will be scheduled to the Kwok nodes, not k3s nodes.
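To confirm the pods landed on the fake nodes rather than the real k3s agents, count the ones placed on a `kwok-node-*`:
```sh
kubectl -n test get pods -o wide | grep -c kwok-node
```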
# Prometheus Queries
To view the Prometheus dashboard, you can expose it on localhost port 9090:
```sh
kubectl port-forward --address 127.0.0.1 svc/kube-prometheus-stack-prometheus 9090:9090 -n monitoring &
```
## Memory utilization
To get a view of memory utilization over time, you can select by the container image for a specific Kyverno controller:
```
container_memory_working_set_bytes{image="ghcr.io/kyverno/kyverno:v1.10.0-rc.1"}
```
`container_memory_working_set_bytes` gives you the current working set in bytes, and this is what the OOM killer is watching for.
## CPU utilization
```
rate(container_cpu_usage_seconds_total{image="ghcr.io/kyverno/kyverno:v1.10.0-rc.1"}[1m])
```
`container_cpu_usage_seconds_total` is the sum of the total amount of “user” time (i.e. time spent not in the kernel) and the total amount of “system” time (i.e. time spent in the kernel). This query gives the average CPU usage in the last 1 minute.
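As a quick cross-check of the Prometheus numbers, metrics-server (installed earlier via `make dev-lab-metrics-server`) can show live usage per pod:
```sh
kubectl top pods -n kyverno
```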
## Admission Request Rate
It's a bit tricky to get the precise admission request rate (ARPS). The Prometheus [rate()](https://prometheus.io/docs/prometheus/latest/querying/functions/#rate) function always requires a time window to calculate the rate over, and the computed rate may differ as the window differs.
During our test, we calculate the increment in the count of admission requests recorded at the start and end time of a particular duration. Next, we divide this increment by the duration of the time window to derive the average admission request rate during that period.
```
sum(kyverno_admission_requests_total)
```
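For example, here is a rough sketch of that calculation against the Prometheus HTTP API over a 5-minute window, assuming the port-forward on `127.0.0.1:9090` from above and `jq` installed:
```sh
q='sum(kyverno_admission_requests_total)'
# sample the counter at the start and end of the window, then divide by the duration
start=$(curl -sG http://127.0.0.1:9090/api/v1/query --data-urlencode "query=$q" | jq -r '.data.result[0].value[1]')
sleep 300
end=$(curl -sG http://127.0.0.1:9090/api/v1/query --data-urlencode "query=$q" | jq -r '.data.result[0].value[1]')
awk -v s="$start" -v e="$end" 'BEGIN { printf "average admission requests/s: %.2f\n", (e - s) / 300 }'
```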
## Object sizes in etcd
Run the following script to calculate total sizes for the given resource (pods in the following example):
```sh
$ ./docs/perf-testing/size.sh
Enter the resource to calculate the size:
pods
The total size for pods is 8861737 bytes.
```
You can also check the total etcd size:
```sh
$ etcdctl endpoint status -w table
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.19.0.2:2379 | d7380397c3ec4b90 | 3.5.3 | 84 MB | true | false | 2 | 154449 | 154449 | |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
```
This command returns the resources stored in etcd that have more than 100 objects:
```sh
kubectl get --raw=/metrics | grep apiserver_storage_objects | awk '$2>100' | sort -g -k 2
```
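To focus on the objects Kyverno itself creates (for example admission and policy reports), you can filter the same metric; the group names below are assumptions based on Kyverno's report CRDs:
```sh
kubectl get --raw=/metrics | grep apiserver_storage_objects | grep -Ei 'kyverno|wgpolicyk8s'
```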
## Admission review latency (average)
Kyverno exposes two metrics that can be used to calculate the average admission review latency:
```
sum(kyverno_admission_review_duration_seconds_sum{resource_request_operation=~"create|update"})/sum(kyverno_admission_review_duration_seconds_count{resource_request_operation=~"create|update"})
```
The following query, built on the admission webhook metrics exposed by the API server, should give you the same result if you follow the setup on this page:
```
sum(apiserver_admission_webhook_admission_duration_seconds_sum{name="validate.kyverno.svc-fail",operation="CREATE"}) / sum(apiserver_admission_webhook_admission_duration_seconds_count{name="validate.kyverno.svc-fail",operation="CREATE"})
```

docs/perf-testing/kwok.sh (new executable file)
@@ -0,0 +1,24 @@
#!/bin/bash
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# Variables preparation
KWOK_WORK_DIR=$(mktemp -d)
KWOK_REPO=kubernetes-sigs/kwok
KWOK_LATEST_RELEASE=$(curl "https://api.github.com/repos/${KWOK_REPO}/releases/latest" | jq -r '.tag_name')
# Render kustomization yaml
cat <<EOF > "${KWOK_WORK_DIR}/kustomization.yaml"
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
images:
- name: registry.k8s.io/kwok/kwok
  newTag: "${KWOK_LATEST_RELEASE}"
resources:
- "https://github.com/${KWOK_REPO}/kustomize/kwok?ref=${KWOK_LATEST_RELEASE}"
EOF
kubectl kustomize "${KWOK_WORK_DIR}" > "${KWOK_WORK_DIR}/kwok.yaml"
# create `kwok` deployment
kubectl apply -f "${KWOK_WORK_DIR}/kwok.yaml"

docs/perf-testing/main.go (new file)
@@ -0,0 +1,246 @@
package main

import (
    "context"
    "flag"
    "fmt"
    "os"
    "strconv"
    "strings"
    "sync"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    kubernetes "k8s.io/client-go/kubernetes"
    clientcmd "k8s.io/client-go/tools/clientcmd"
)

var (
    kubeconfig           string
    namespace            string
    kinds                string
    clientRateLimitBurst int
    clientRateLimitQPS   float64
    replicas             int
    count                int
    delete               bool
)

func main() {
    var burst int = 100
    var qps float64 = 100
    flagset := flag.NewFlagSet("perf-testing", flag.ExitOnError)
    flagset.StringVar(&kubeconfig, "kubeconfig", "/root/.kube/config", "Path to a kubeconfig. Only required if out-of-cluster.")
    flagset.StringVar(&namespace, "namespace", "test", "Namespace to create the resource")
    flagset.StringVar(&kinds, "kinds", "", "comma separated string which takes resource kinds to be created")
    flagset.Float64Var(&clientRateLimitQPS, "clientRateLimitQPS", qps, "Configure the maximum QPS to the Kubernetes API server from Kyverno. Uses the client default if zero.")
    flagset.IntVar(&clientRateLimitBurst, "clientRateLimitBurst", burst, "Configure the maximum burst for throttle. Uses the client default if zero.")
    flagset.IntVar(&replicas, "replicas", 50, "Configure the replica number of the resource to be created")
    flagset.IntVar(&count, "count", 50, "Configure the total number of the resource to be created")
    flagset.BoolVar(&delete, "delete", false, "clean up resources")
    flagset.VisitAll(func(f *flag.Flag) {
        flag.CommandLine.Var(f.Value, f.Name, f.Usage)
    })
    flag.Parse()

    clientConfig, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
    if err != nil {
        fmt.Println("error creating client config: ", err)
        os.Exit(1)
    }
    clientConfig.Burst = clientRateLimitBurst
    clientConfig.QPS = float32(clientRateLimitQPS)
    client, err := kubernetes.NewForConfig(clientConfig)
    if err != nil {
        fmt.Println("error creating client set: ", err)
        os.Exit(1)
    }

    resourceKinds := strings.Split(kinds, ",")
    for _, kind := range resourceKinds {
        switch kind {
        case "pods":
            if delete {
                if err := client.CoreV1().Pods(namespace).DeleteCollection(context.TODO(), metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
                    fmt.Println("failed to delete the collection of pods: ", err)
                    os.Exit(1)
                }
                os.Exit(0)
            }
            // pods are created concurrently, so the client-side rate limits
            // (QPS/burst) are the effective throttle
            var wg sync.WaitGroup
            for i := 0; i < count; i++ {
                num := strconv.Itoa(i)
                wg.Add(1)
                go func(num string, wg *sync.WaitGroup) {
                    defer wg.Done()
                    pod := newPod(num)
                    // use a goroutine-local err: writing to a shared err from
                    // concurrent goroutines would be a data race
                    if _, err := client.CoreV1().Pods(namespace).Create(context.TODO(), pod, metav1.CreateOptions{}); err != nil {
                        fmt.Println("failed to create the pod: ", err)
                        return
                    }
                    fmt.Printf("created pod perf-testing-pod-%v\n", num)
                }(num, &wg)
            }
            wg.Wait()
        case "replicasets":
            if delete {
                if err := client.AppsV1().ReplicaSets(namespace).DeleteCollection(context.TODO(), metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
                    fmt.Println("failed to delete the collection of replicasets: ", err)
                    os.Exit(1)
                }
                os.Exit(0)
            }
            for i := 0; i < count; i++ {
                num := strconv.Itoa(i)
                rs := newReplicaset(num)
                _, err = client.AppsV1().ReplicaSets(namespace).Create(context.TODO(), rs, metav1.CreateOptions{})
                if err != nil {
                    fmt.Println("failed to create the replicaset: ", err)
                    os.Exit(1)
                }
                fmt.Printf("created replicaset perf-testing-rs-%v\n", num)
            }
        case "deployments":
            if delete {
                if err := client.AppsV1().Deployments(namespace).DeleteCollection(context.TODO(), metav1.DeleteOptions{}, metav1.ListOptions{}); err != nil {
                    fmt.Println("failed to delete the collection of deployments: ", err)
                    os.Exit(1)
                }
                os.Exit(0)
            }
            for i := 0; i < count; i++ {
                num := strconv.Itoa(i)
                deploy := newDeployment(num)
                _, err = client.AppsV1().Deployments(namespace).Create(context.TODO(), deploy, metav1.CreateOptions{})
                if err != nil {
                    fmt.Println("failed to create the deployment: ", err)
                    os.Exit(1)
                }
                fmt.Printf("created deployment perf-testing-deploy-%v\n", num)
            }
        }
    }
}

func newPod(i string) *corev1.Pod {
    return &corev1.Pod{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "perf-testing-pod-" + i,
            Namespace: namespace,
            Labels: map[string]string{
                "app.kubernetes.io/name": "perf-testing",
            },
        },
        Spec: newPodSpec(),
    }
}

func newReplicaset(i string) *appsv1.ReplicaSet {
    r := int32(replicas)
    return &appsv1.ReplicaSet{
        ObjectMeta: metav1.ObjectMeta{
            // keep the name consistent with the "created replicaset" log line
            Name:      "perf-testing-rs-" + i,
            Namespace: namespace,
            Labels: map[string]string{
                "app.kubernetes.io/name": "perf-testing",
            },
        },
        Spec: appsv1.ReplicaSetSpec{
            Replicas: &r,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{
                    "app.kubernetes.io/name": "perf-testing",
                },
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{
                        "app.kubernetes.io/name": "perf-testing",
                    },
                },
                Spec: newPodSpec(),
            },
        },
    }
}

func newDeployment(i string) *appsv1.Deployment {
    r := int32(replicas)
    return &appsv1.Deployment{
        ObjectMeta: metav1.ObjectMeta{
            Name:      "perf-testing-deploy-" + i,
            Namespace: namespace,
            Labels: map[string]string{
                "app.kubernetes.io/name": "perf-testing",
            },
        },
        Spec: appsv1.DeploymentSpec{
            Replicas: &r,
            Selector: &metav1.LabelSelector{
                MatchLabels: map[string]string{
                    "app.kubernetes.io/name": "perf-testing",
                },
            },
            Template: corev1.PodTemplateSpec{
                ObjectMeta: metav1.ObjectMeta{
                    Labels: map[string]string{
                        "app.kubernetes.io/name": "perf-testing",
                    },
                },
                Spec: newPodSpec(),
            },
        },
    }
}

func newPodSpec() corev1.PodSpec {
    boolTrue := true
    boolFalse := false
    return corev1.PodSpec{
        Containers: []corev1.Container{
            {
                Name:  "nginx",
                Image: "nginx",
                // a restricted-PSS-compliant security context so the Kyverno
                // PSS policies admit the pod
                SecurityContext: &corev1.SecurityContext{
                    AllowPrivilegeEscalation: &boolFalse,
                    RunAsNonRoot:             &boolTrue,
                    SeccompProfile: &corev1.SeccompProfile{
                        Type: corev1.SeccompProfileTypeRuntimeDefault,
                    },
                    Capabilities: &corev1.Capabilities{
                        Drop: []corev1.Capability{"ALL"},
                    },
                },
            },
        },
        // tolerate the kwok taint and pin scheduling to the fake nodes
        Tolerations: []corev1.Toleration{
            {
                Key:      "kwok.x-k8s.io/node",
                Operator: corev1.TolerationOpExists,
                Effect:   corev1.TaintEffectNoSchedule,
            },
        },
        Affinity: &corev1.Affinity{
            NodeAffinity: &corev1.NodeAffinity{
                RequiredDuringSchedulingIgnoredDuringExecution: &corev1.NodeSelector{
                    NodeSelectorTerms: []corev1.NodeSelectorTerm{
                        {
                            MatchExpressions: []corev1.NodeSelectorRequirement{
                                {
                                    Key:      "type",
                                    Operator: corev1.NodeSelectorOpIn,
                                    Values:   []string{"kwok"},
                                },
                            },
                        },
                    },
                },
            },
        },
    }
}

docs/perf-testing/node.sh (new executable file)
@@ -0,0 +1,61 @@
#!/bin/bash
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
# read user input for count
echo "Enter the count:"
read count
# iterate $count number of times
for (( i=1; i<=$count; i++ ))
do
# generate YAML configuration using heredoc with COUNT variable substitution
yaml=$(cat <<-END
apiVersion: v1
kind: Node
metadata:
  annotations:
    node.alpha.kubernetes.io/ttl: "0"
    kwok.x-k8s.io/node: fake
  labels:
    beta.kubernetes.io/arch: amd64
    beta.kubernetes.io/os: linux
    kubernetes.io/arch: amd64
    kubernetes.io/hostname: kwok-node-$i
    kubernetes.io/os: linux
    kubernetes.io/role: agent
    node-role.kubernetes.io/agent: ""
    type: kwok
  name: kwok-node-$i
spec:
  taints:
  - effect: NoSchedule
    key: kwok.x-k8s.io/node
    value: fake
status:
  allocatable:
    cpu: 32
    memory: 256Gi
    pods: 110
  capacity:
    cpu: 32
    memory: 256Gi
    pods: 110
  nodeInfo:
    architecture: amd64
    bootID: ""
    containerRuntimeVersion: ""
    kernelVersion: ""
    kubeProxyVersion: fake
    kubeletVersion: fake
    machineID: ""
    operatingSystem: linux
    osImage: ""
    systemUUID: ""
  phase: Running
END
)
# apply the generated configuration to Kubernetes cluster
echo "$yaml" | kubectl apply -f -
done

docs/perf-testing/size.sh (new executable file)
@@ -0,0 +1,24 @@
#!/bin/bash
## calculate total size for the given object
# read user input for the resource
echo "Enter the resource to caclutate the size:"
read resource
sum=0
for key in `etcdctl get --prefix --keys-only /registry/$resource`
do
size=`etcdctl get $key --print-value-only | wc -c`
count=`etcdctl get $key --write-out=fields | grep \"Count\" | cut -f2 -d':'`
if [ $count -ne 0 ]; then
versions=`etcdctl get $key --write-out=fields | grep \"Version\" | cut -f2 -d':'`
else
versions=0
fi
total=$(( $size * $versions))
sum=$(( $sum + $total ))
echo $sum $total $size $versions $count $key >> /tmp/etcdkeys.txt
done
echo "The total size for $resource is $sum bytes."