killercoda CKA: Architecture, Installation & Maintenance

1. Architecture, Installation & Maintenance - Create Pod

Create a pod called sleep-pod using the nginx image that runs sleep (using command) with any value for the number of seconds.

# @author D瓜哥 · https://www.diguage.com

$ cat nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: sleep-pod
spec:
  containers:
  - name: nginx
    image: nginx
    command:
      - sleep
      - "3600"

$ kubectl apply -f nginx.yaml
pod/sleep-pod created

$ kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
sleep-pod   1/1     Running   0          5s
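
# Alternative (untested here): the same pod can be created imperatively;
# --command turns everything after -- into the container's command:
$ kubectl run sleep-pod --image=nginx --command -- sleep 3600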

2. Architecture, Installation & Maintenance - Log Reader

The log-reader-pod pod is running. Save all of the pod's logs in podalllogs.txt

# @author D瓜哥 · https://www.diguage.com

$ kubectl get pod
NAME             READY   STATUS    RESTARTS   AGE
log-reader-pod   1/1     Running   0          38s

$ kubectl logs log-reader-pod | tee podalllogs.txt
ERROR: Mon Oct 1 10:30:23 UTC 2023 Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
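
# If only the file content is checked, plain redirection would work as well
# (tee just additionally echoes the logs to the terminal):
$ kubectl logs log-reader-pod > podalllogs.txt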

3. Architecture, Installation & Maintenance - Log Reader - 1

The application-pod pod is running. Save only the pod's ERROR logs in poderrorlogs.txt

# @author D瓜哥 · https://www.diguage.com

$ kubectl logs application-pod | grep ERROR | tee poderrorlogs.txt
ERROR: Mon Oct 1 10:30:23 UTC 2023 Log in Error!
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!

4. Architecture, Installation & Maintenance - Log Reader - 2

The alpine-reader-pod pod is running. Save all of the pod's INFO and ERROR logs in podlogs.txt

# @author D瓜哥 · https://www.diguage.com

$ kubectl logs alpine-reader-pod | grep "INFO\|ERROR" | tee podlogs.txt
ERROR: Mon Oct 1 10:30:23 UTC 2023 Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
INFO: Mon Oct 1 10:30:23 UTC 2023 Logged In
ERROR: Mon Oct 1 10:30:23 UTC 2023  Log in Error!
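
# An equivalent filter using extended regular expressions, which avoids escaping the alternation:
$ kubectl logs alpine-reader-pod | grep -E "INFO|ERROR" | tee podlogs.txt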

5. Architecture, Installation & Maintenance - Pod Log

Create a Kubernetes Pod configuration to facilitate real-time monitoring of a log file. Specifically, you need to set up a Pod named alpine-pod-pod that runs an Alpine Linux container.

Requirements:

  • Name the Pod alpine-pod-pod

  • Use alpine:latest image

  • Container name alpine-container

  • Configure the container to execute the tail -f /config/log.txt command (using args) with /bin/sh (using command) to continuously monitor and display the contents of a log file.

  • Set up a volume named config-volume that maps to a ConfigMap named log-configmap; this log-configmap is already available.

  • Ensure the Pod has a restart policy of Never.

# @author D瓜哥 · https://www.diguage.com

$ kubectl get configmaps log-configmap -o yaml
apiVersion: v1
data:
  log.txt: |
    <LOG DATA>
kind: ConfigMap
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","data":{"log.txt":"\u003cLOG DATA\u003e\n"},"kind":"ConfigMap","metadata":{"annotations":{},"name":"log-configmap","namespace":"default"}}
  creationTimestamp: "2025-01-08T12:42:48Z"
  name: log-configmap
  namespace: default
  resourceVersion: "1940"
  uid: e2e58b09-e5ec-4c6a-ad0b-0bf3ae48eafd

$ cat pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: alpine-pod-pod
spec:
  containers:
  - name: alpine-container
    image: alpine:latest
    command: ["/bin/sh"]
    args: ["-c", "tail -f /config/log.txt"]
    volumeMounts:
    - name: config-volume
      mountPath: "/config"
      readOnly: true
  volumes:
  - name: config-volume
    configMap:
      name: log-configmap
  restartPolicy: Never

$ kubectl apply -f pod.yaml
pod/alpine-pod-pod created

$ kubectl get pod
NAME             READY   STATUS    RESTARTS   AGE
alpine-pod-pod   1/1     Running   0          6s

$ kubectl exec -it alpine-pod-pod  -- sh

$ cd /config

$ ls -lh
total 0
lrwxrwxrwx    1 root     root          14 Jan  8 12:52 log.txt -> ..data/log.txt

$ cat log.txt
<LOG DATA>
All of the requirements are met, yet the check did not pass. Strange!
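
One unverified guess: the checker may expect the shell option inside command and the whole tail command as a single args string. A variant worth trying, changing only the container's command and args:

    command: ["/bin/sh", "-c"]
    args: ["tail -f /config/log.txt"]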

6. Architecture, Installation & Maintenance - Pod Logs - 1

The product pod is running. When you access the logs of this pod, it displays the output Mi Tv Is Good.

Please update the pod definition file to utilize an environment variable with the value Sony Tv Is Good. Then, recreate this pod with the modified configuration.

# @author D瓜哥 · https://www.diguage.com

$ kubectl get pod product -o yaml | tee pod.yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 419f3fba07847d2b2b4f9ab6e2e30d11df1f539cec9719e5e57fd526b0e33088
    cni.projectcalico.org/podIP: 192.168.1.4/32
    cni.projectcalico.org/podIPs: 192.168.1.4/32
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"product","namespace":"default"},"spec":{"containers":[{"command":["sh","-c","echo 'Mi Tv Is Good' \u0026\u0026 sleep 3600"],"image":"busybox","name":"product-container"}]}}
  creationTimestamp: "2025-01-09T09:09:36Z"
  name: product
  namespace: default
  resourceVersion: "2092"
  uid: db157824-54a5-4c59-bf74-8e5b54b81ad9
spec:
  containers:
  - command:
    - sh
    - -c
    - echo 'Mi Tv Is Good' && sleep 3600
    image: busybox
    imagePullPolicy: Always
    name: product-container
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2nw7d
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: node01
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-2nw7d
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T09:09:39Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T09:09:36Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T09:09:39Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T09:09:39Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T09:09:36Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://77a8ca54c4a7a075d76d77e334fa632d840382a03150bf63dccef8abbbea0e4c
    image: docker.io/library/busybox:latest
    imageID: docker.io/library/busybox@sha256:2919d0172f7524b2d8df9e50066a682669e6d170ac0f6a49676d54358fe970b5
    lastState: {}
    name: product-container
    ready: true
    restartCount: 0
    started: true
    state:
      running:
        startedAt: "2025-01-09T09:09:38Z"
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-2nw7d
      readOnly: true
      recursiveReadOnly: Disabled
  hostIP: 172.30.2.2
  hostIPs:
  - ip: 172.30.2.2
  phase: Running
  podIP: 192.168.1.4
  podIPs:
  - ip: 192.168.1.4
  qosClass: BestEffort
  startTime: "2025-01-09T09:09:36Z"

$ vim pod.yaml
# Edit pod.yaml here, replacing Mi with Sony

$ kubectl replace -f pod.yaml
The Pod "product" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
  core.PodSpec{
        Volumes:        {{Name: "kube-api-access-2nw7d", VolumeSource: {Projected: &{Sources: {{ServiceAccountToken: &{ExpirationSeconds: 3607, Path: "token"}}, {ConfigMap: &{LocalObjectReference: {Name: "kube-root-ca.crt"}, Items: {{Key: "ca.crt", Path: "ca.crt"}}}}, {DownwardAPI: &{Items: {{Path: "namespace", FieldRef: &{APIVersion: "v1", FieldPath: "metadata.namespace"}}}}}}, DefaultMode: &420}}}},
        InitContainers: nil,
        Containers: []core.Container{
                {
                        Name:  "product-container",
                        Image: "busybox",
                        Command: []string{
                                "sh",
                                "-c",
                                strings.Join({
                                        "echo '",
-                                       "Mi",
+                                       "Sony",
                                        " Tv Is Good' && sleep 3600",
                                }, ""),
                        },
                        Args:       nil,
                        WorkingDir: "",
                        ... // 19 identical fields
                },
        },
        EphemeralContainers: nil,
        RestartPolicy:       "Always",
        ... // 28 identical fields
  }

$ kubectl delete -f pod.yaml --force --grace-period 0
Warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "product" force deleted

$ kubectl apply -f pod.yaml
pod/product created

$ kubectl logs product
Sony Tv Is Good
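
The task wording asks for an environment variable; a minimal sketch of a spec that actually reads the message from one (TV_MESSAGE is an illustrative name, not given by the task):

apiVersion: v1
kind: Pod
metadata:
  name: product
spec:
  containers:
  - name: product-container
    image: busybox
    env:
    - name: TV_MESSAGE               # illustrative variable name
      value: "Sony Tv Is Good"
    command: ["sh", "-c", "echo $TV_MESSAGE && sleep 3600"]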

7. Architecture, Installation & Maintenance - Pod Resource

Find the pod that consumes the most CPU across all namespaces (including kube-system) in all clusters (currently we have a single cluster). Then store the result in the file high_cpu_pod.txt in the following format: pod_name,namespace

# @author D瓜哥 · https://www.diguage.com

$ kubectl top pod -A --sort-by cpu
NAMESPACE            NAME                                      CPU(cores)   MEMORY(bytes)
kube-system          kube-apiserver-controlplane               24m          239Mi
kube-system          canal-zstf2                               17m          115Mi
kube-system          etcd-controlplane                         14m          47Mi
kube-system          canal-mfc56                               13m          106Mi
kube-system          kube-controller-manager-controlplane      9m           58Mi
default              redis                                     4m           3Mi
kube-system          metrics-server-75774965fd-rdhd4           3m           14Mi
kube-system          calico-kube-controllers-94fb6bc47-4wx95   2m           27Mi
kube-system          kube-scheduler-controlplane               2m           26Mi
default              httpd                                     1m           6Mi
kube-system          coredns-57888bfdc7-6sqfr                  1m           26Mi
kube-system          coredns-57888bfdc7-jnrx9                  1m           18Mi
kube-system          kube-proxy-sqc72                          1m           20Mi
kube-system          kube-proxy-xknck                          1m           33Mi
local-path-storage   local-path-provisioner-6c5cff8948-tmf26   1m           14Mi
default              nginx                                     0m           2Mi

$ kubectl top pod -A --sort-by cpu --no-headers | head -n 1 | awk '{print $2","$1}'
kube-apiserver-controlplane,kube-system

$ kubectl top pod -A --sort-by cpu --no-headers | head -n 1 | awk '{print $2","$1}' | tee high_cpu_pod.txt
kube-apiserver-controlplane,kube-system
# To add a header row to the output file, use:
# awk  'BEGIN{ printf "pod_name,namespace\n" } {print $2","$1}'

8. Architecture, Installation & Maintenance - Pod filter

You have a script named pod-filter.sh. Update this script to include a command that filters and displays the value of the application label of a pod named nginx-pod, using jsonpath only.

It should be in the format kubectl get pod <pod-name> <remainingcmd>

# @author D瓜哥 · https://www.diguage.com

$ cat pod-filter.sh
#!/bin/bash

$ kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
nginx-pod   1/1     Running   0          45s

$ kubectl get pod nginx-pod -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    cni.projectcalico.org/containerID: 0529c074320ef685ed7df2326781676829fbccd2f3c1bbacb5ae7ce94e5bd42d
    cni.projectcalico.org/podIP: 192.168.1.4/32
    cni.projectcalico.org/podIPs: 192.168.1.4/32
  creationTimestamp: "2025-01-09T09:23:43Z"
  labels:
    application: frontend
  name: nginx-pod
  namespace: default
  resourceVersion: "2000"
  uid: 32c260ba-081a-4b4c-85bd-10670fde7f15
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
# remaining output omitted

# Add the required command to pod-filter.sh

$ cat pod-filter.sh
#!/bin/bash

kubectl get pod nginx-pod -o jsonpath='{.metadata.labels.application}'

$ bash pod-filter.sh
frontend

9. Architecture, Installation & Maintenance - Secret

Create a Kubernetes Secret named database-app-secret in the default namespace using the contents of the file database-data.txt

# @author D瓜哥 · https://www.diguage.com

$ cat database-data.txt
DB_User=REJfVXNlcj1teXVzZXI=
DB_Password=REJfUGFzc3dvcmQ9bXlwYXNzd29yZA==

$ kubectl create secret generic database-app-secret --from-file database-data.txt
secret/database-app-secret created
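
# Note: --from-file stores the whole file under a single key named database-data.txt;
# --from-env-file would instead create one key per KEY=VALUE line. Quick check of what was stored:
$ kubectl get secret database-app-secret -o jsonpath='{.data}'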

10. Architecture, Installation & Maintenance - Secret 1

Decode the contents of the existing secret named database-data in the database-ns namespace and save the decoded content into a file located at decoded.txt

# @author D瓜哥 · https://www.diguage.com

$ kubectl -n database-ns get secrets database-data -o yaml
apiVersion: v1
data:
  DB_PASSWORD: c2VjcmV0
kind: Secret
metadata:
  creationTimestamp: "2025-01-09T09:40:21Z"
  name: database-data
  namespace: database-ns
  resourceVersion: "2280"
  uid: 958a00c4-6776-4621-8d8b-94d6c31f93f9
type: Opaque

$ kubectl -n database-ns get secrets database-data -o jsonpath='{.data.DB_PASSWORD}' | base64 -d
secret

$ kubectl -n database-ns get secrets database-data -o jsonpath='{.data.DB_PASSWORD}' | base64 -d | tee decoded.txt
secret

$ cat decoded.txt
secret
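
# If the secret held several keys, a go-template could decode them all at once
# (kubectl templates provide a base64decode function):
$ kubectl -n database-ns get secrets database-data \
  -o go-template='{{range $k, $v := .data}}{{$k}}={{$v | base64decode}}{{"\n"}}{{end}}'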

11. Architecture, Installation & Maintenance - Node Resource

Find the node that consumes the most MEMORY in all clusters (currently we have a single cluster). Then store the result in the file high_memory_node.txt in the following format: current_context,node_name

# @author D瓜哥 · https://www.diguage.com

$ kubectl top node --sort-by memory
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
controlplane   138m         13%    1266Mi          67%
node01         48m          4%     761Mi           40%

$ echo "$(kubectl config current-context),$(kubectl top node --sort-by memory --no-headers \
  | head -n 1 | awk '{print $1}')" | tee high_memory_node.txt
kubernetes-admin@kubernetes,controlplane

$ cat high_memory_node.txt
kubernetes-admin@kubernetes,controlplane

# Alternatively:

$ context=`kubectl config current-context`

$ node=$(kubectl top nodes --sort-by=memory --no-headers | head -n 1 | awk '{print $1}')

$ echo "$context,$node" | tee high_memory_node.txt

12. Architecture, Installation & Maintenance - Service filter

You have a script named svc-filter.sh. Update this script to include a command that filters and displays the value of the target port of a service named redis-service, using jsonpath only.

It should be in the format kubectl get svc OR kubectl get service

# @author D瓜哥 · https://www.diguage.com

$ kubectl get svc redis-service -o yaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2025-01-09T11:36:13Z"
  labels:
    app: redis-service
  name: redis-service
  namespace: default
  resourceVersion: "1950"
  uid: 1ac92e1d-81af-4c6b-b419-178ca1362d85
spec:
  clusterIP: 10.110.149.89
  clusterIPs:
  - 10.110.149.89
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: 6379-6379
    port: 6379
    protocol: TCP
    targetPort: 6379
  selector:
    app: redis-service
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

$ kubectl get svc redis-service -o jsonpath='{.spec.ports[0].targetPort}'
6379

$ cat svc-filter.sh
#!/bin/bash

$ vim svc-filter.sh
# Copy the jsonpath command above into the file

$ bash svc-filter.sh
6379

13. Architecture, Installation & Maintenance - Service account, cluster role, cluster role binding

You have a service account named group1-sa, a ClusterRole named group1-role-cka, and a ClusterRoleBinding named group1-role-binding-cka. Your task is to update the permissions for the group1-sa service account so that it can only create, get and list the deployments and no other resources in the cluster.

# @author D瓜哥 · https://www.diguage.com

$ kubectl get sa
NAME        SECRETS   AGE
group1-sa   0         41s

$ kubectl get clusterrole
NAME                                                                   CREATED AT
group1-role-cka                                                        2025-01-09T11:44:23Z

$ kubectl get clusterRoleBinding
NAME                                                            ROLE                                                                               AGE
group1-role-binding-cka                                         ClusterRole/group1-role-cka                                                        81s

$ kubectl get clusterrole
NAME                                                                   CREATED AT
group1-role-cka                                                        2025-01-09T11:44:23Z

$ kubectl get clusterrole group1-role-cka
NAME              CREATED AT
group1-role-cka   2025-01-09T11:44:23Z


$ kubectl get clusterrole group1-role-cka  -o yaml | tee role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2025-01-09T11:44:23Z"
  name: group1-role-cka
  resourceVersion: "1979"
  uid: f406875b-e377-4c29-b131-420e16079e57
rules:
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get

$ vim role.yaml
# Add the create and list verbs

$ kubectl replace -f role.yaml
clusterrole.rbac.authorization.k8s.io/group1-role-cka replaced

$ cat role.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  creationTimestamp: "2025-01-09T11:44:23Z"
  name: group1-role-cka
  resourceVersion: "1979"
  uid: f406875b-e377-4c29-b131-420e16079e57
rules:
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
  - create
  - list
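
# The updated permissions can be sanity-checked by impersonating the service account:
$ kubectl auth can-i create deployments --as=system:serviceaccount:default:group1-sa
# expected: yes
$ kubectl auth can-i delete deployments --as=system:serviceaccount:default:group1-sa
# expected: no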

14. Architecture, Installation & Maintenance - Service account, cluster role, cluster role binding

Create a service account named app-account, a role named app-role-cka, and a role binding named app-role-binding-cka. Update the permissions of this service account so that it can get the pods only in the default namespace.

# @author D瓜哥 · https://www.diguage.com

$ kubectl create sa app-account
serviceaccount/app-account created

$ kubectl get ns
NAME                 STATUS   AGE
default              Active   7d2h
kube-node-lease      Active   7d2h
kube-public          Active   7d2h
kube-system          Active   7d2h
local-path-storage   Active   7d2h

$ kubectl create role app-role-cka --resource=pods --verb=get --namespace=default
role.rbac.authorization.k8s.io/app-role-cka created

$ kubectl create rolebinding app-role-binding-cka --serviceaccount=app-account --role=app-role-cka
error: serviceaccount must be <namespace>:<name>

$ kubectl create rolebinding app-role-binding-cka --serviceaccount=default:app-account --role=app-role-cka
rolebinding.rbac.authorization.k8s.io/app-role-binding-cka created
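
# Likewise, verify the new binding by impersonating app-account:
$ kubectl auth can-i get pods -n default --as=system:serviceaccount:default:app-account
# expected: yes
$ kubectl auth can-i get pods -n kube-system --as=system:serviceaccount:default:app-account
# expected: no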

15. Architecture, Installation & Maintenance - Cluster Upgrade

Upgrade kubeadm, the cluster, and the kubelet on the controlplane node to the next version.

EXAMPLE: If the current version is v1.27.1, then upgrade to v1.27.2

kubeadm

BEFORE UPGRADE: ( v1.31.0 )

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"31", GitVersion:"v1.31.0", GitCommit:"9edcffcde5595e8a5b1a35f88c421764e575afce", GitTreeState:"clean", BuildDate:"2024-08-13T07:35:57Z", GoVersion:"go1.22.5", Compiler:"gc", Platform:"linux/amd64"}

AFTER UPGRADE: ( v1.31.1 )

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"31", GitVersion:"v1.31.1", GitCommit:"948afe5ca072329a73c8e79ed5938717a5cb3d21", GitTreeState:"clean", BuildDate:"2024-09-11T21:26:49Z", GoVersion:"go1.22.6", Compiler:"gc", Platform:"linux/amd64"}

Cluster Upgrade

BEFORE UPGRADE: ( v1.31.0 )

$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: 1.31.0
[upgrade/versions] kubeadm version: v1.31.1

AFTER UPGRADE: ( v1.31.1 )

$ kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: 1.31.1
[upgrade/versions] kubeadm version: v1.31.1

kubelet Upgrade

BEFORE UPGRADE: ( v1.31.0 )

$ kubectl get  nodes
NAME           STATUS   ROLES           AGE    VERSION
controlplane   Ready    control-plane   7d2h   v1.31.0
node01         Ready    <none>          7d2h   v1.31.0

AFTER UPGRADE: ( v1.31.1 )

$ kubectl get nodes
NAME           STATUS   ROLES           AGE    VERSION
controlplane   Ready    control-plane   7d2h   v1.31.1
node01         Ready    <none>          7d2h   v1.31.0

Similarly, verify the upgrade for the current version (e.g., v1.31.0 to v1.31.1).

# @author D瓜哥 · https://www.diguage.com

$ uname -a
Linux controlplane 5.4.0-131-generic #147-Ubuntu SMP Fri Oct 14 17:07:22 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release
No LSB modules are available.

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 20.04.5 LTS
Release:        20.04
Codename:       focal

$ kubectl version
Client Version: v1.31.0
Kustomize Version: v5.4.2
Server Version: v1.31.0

$ sudo apt search kubeadm
Sorting... Done
Full Text Search... Done
kubeadm/unknown 1.31.4-1.1 arm64
  Command-line utility for administering a Kubernetes cluster

$ sudo apt-get install -y  --allow-downgrades kubeadm=1.31.1-1.1 kubelet=1.31.1-1.1 kubectl=1.31.1-1.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  ebtables socat
    Use 'sudo apt autoremove' to remove them.
The following packages will be upgraded:
  kubeadm kubectl kubelet
3 upgraded, 0 newly installed, 0 to remove and 182 not upgraded.
Need to get 37.8 MB of archives.
After this operation, 4096 B of additional disk space will be used.
Get:1 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.31/deb  kubeadm 1.31.1-1.1 [11.4 MB]
Get:2 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.31/deb  kubectl 1.31.1-1.1 [11.2 MB]
Get:3 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.31/deb  kubelet 1.31.1-1.1 [15.2 MB]
Fetched 37.8 MB in 4s (10.5 MB/s)
(Reading database ... 132638 files and directories currently installed.)
Preparing to unpack .../kubeadm_1.31.1-1.1_amd64.deb ...
Unpacking kubeadm (1.31.1-1.1) over (1.31.0-1.1) ...
Preparing to unpack .../kubectl_1.31.1-1.1_amd64.deb ...
Unpacking kubectl (1.31.1-1.1) over (1.31.0-1.1) ...
Preparing to unpack .../kubelet_1.31.1-1.1_amd64.deb ...
Unpacking kubelet (1.31.1-1.1) over (1.31.0-1.1) ...
Setting up kubeadm (1.31.1-1.1) ...
Setting up kubectl (1.31.1-1.1) ...
Setting up kubelet (1.31.1-1.1) ...

$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"31", GitVersion:"v1.31.1", GitCommit:"948afe5ca072329a73c8e79ed5938717a5cb3d21", GitTreeState:"clean", BuildDate:"2024-09-11T21:26:49Z", GoVersion:"go1.22.6", Compiler:"gc", Platform:"linux/amd64"}

$ sudo kubeadm upgrade plan
[preflight] Running pre-flight checks.
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: 1.31.0
[upgrade/versions] kubeadm version: v1.31.1
I0109 12:10:30.924351    7147 version.go:261] remote version is much newer: v1.32.0; falling back to: stable-1.31
[upgrade/versions] Target version: v1.31.4
[upgrade/versions] Latest version in the v1.31 series: v1.31.4

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   NODE           CURRENT   TARGET
kubelet     controlplane   v1.31.0   v1.31.4
kubelet     node01         v1.31.0   v1.31.4

Upgrade to the latest version in the v1.31 series:

COMPONENT                 NODE           CURRENT    TARGET
kube-apiserver            controlplane   v1.31.0    v1.31.4
kube-controller-manager   controlplane   v1.31.0    v1.31.4
kube-scheduler            controlplane   v1.31.0    v1.31.4
kube-proxy                               1.31.0     v1.31.4
CoreDNS                                  v1.11.1    v1.11.3
etcd                      controlplane   3.5.15-0   3.5.15-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.31.4

Note: Before you can perform this upgrade, you have to update kubeadm to v1.31.4.

_____________________________________________________________________


The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

$ kubeadm upgrade apply v1.31.1
[preflight] Running pre-flight checks.
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.31.1"
[upgrade/versions] Cluster version: v1.31.0
[upgrade/versions] kubeadm version: v1.31.1
[upgrade] Are you sure you want to proceed? [y/N]: y
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action beforehand using 'kubeadm config images pull'
W0109 12:11:41.194446    7787 checks.go:846] detected that the sandbox image "registry.k8s.io/pause:3.5" of the container runtime is inconsistent with that used by kubeadm.It is recommended to use "registry.k8s.io/pause:3.10" as the CRI sandbox image.
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.31.1" (timeout: 5m0s)...
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests1337267299"
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Renewing etcd-server certificate
[upgrade/staticpods] Renewing etcd-peer certificate
[upgrade/staticpods] Renewing etcd-healthcheck-client certificate
[upgrade/staticpods] Moving new manifest to "/etc/kubernetes/manifests/etcd.yaml" and backing up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-09-12-12-11/etcd.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This can take up to 5m0s
[apiclient] Found 1 Pods for label selector component=etcd
[upgrade/staticpods] Component "etcd" upgraded successfully!
[upgrade/etcd] Waiting for etcd to become available
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moving new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backing up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-09-12-12-11/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This can take up to 5m0s
[apiclient] Found 1 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moving new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backing up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-09-12-12-11/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This can take up to 5m0s
[apiclient] Found 1 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moving new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backing up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2025-01-09-12-12-11/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This can take up to 5m0s
[apiclient] Found 1 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upgrade] Backing up kubelet config file to /etc/kubernetes/tmp/kubeadm-kubelet-config4178983551/config.yaml
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.31.1". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
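
# The transcript above skips the rest of the documented kubeadm upgrade flow. On a real
# cluster you would typically also drain the node before upgrading, restart the kubelet
# (the 1.31.1-1.1 package was already installed above), and uncordon afterwards:
$ kubectl drain controlplane --ignore-daemonsets
$ sudo systemctl daemon-reload
$ sudo systemctl restart kubelet
$ kubectl uncordon controlplane

$ kubectl get nodes   # controlplane should now report v1.31.1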

16. Architecture, Installation & Maintenance - ETCD Backup

The etcd-controlplane pod is running in the kube-system namespace. Take a backup, store it in the file /opt/cluster_backup.db, and also store the backup console output in backup.txt

ssh controlplane

# @author D瓜哥 · https://www.diguage.com

$ ssh controlplane
Last login: Sun Nov 13 17:27:09 2022 from 10.48.0.33

$ kubectl -n kube-system get pods
NAME                                      READY   STATUS    RESTARTS        AGE
calico-kube-controllers-94fb6bc47-4wx95   1/1     Running   2 (3m55s ago)   7d2h
canal-mfc56                               2/2     Running   2 (3m59s ago)   7d2h
canal-zstf2                               2/2     Running   2 (3m55s ago)   7d2h
coredns-57888bfdc7-6sqfr                  1/1     Running   1 (3m59s ago)   7d2h
coredns-57888bfdc7-jnrx9                  1/1     Running   1 (3m59s ago)   7d2h
etcd-controlplane                         1/1     Running   2 (3m55s ago)   7d2h
kube-apiserver-controlplane               1/1     Running   2 (3m55s ago)   7d2h
kube-controller-manager-controlplane      1/1     Running   2 (3m55s ago)   7d2h
kube-proxy-sqc72                          1/1     Running   2 (3m55s ago)   7d2h
kube-proxy-xknck                          1/1     Running   1 (3m59s ago)   7d2h
kube-scheduler-controlplane               1/1     Running   2 (3m55s ago)   7d2h

$ kubectl -n kube-system get pod etcd-controlplane -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.30.1.2:2379
    kubernetes.io/config.hash: 4fb3015641784f175e793600c1e22e8c
    kubernetes.io/config.mirror: 4fb3015641784f175e793600c1e22e8c
    kubernetes.io/config.seen: "2025-01-02T09:49:15.967125433Z"
    kubernetes.io/config.source: file
  creationTimestamp: "2025-01-02T09:49:47Z"
  labels:
    component: etcd
    tier: control-plane
  name: etcd-controlplane
  namespace: kube-system
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: controlplane
    uid: 3128acc2-f3b1-4321-829a-338be43290e3
  resourceVersion: "1802"
  uid: 5b796aeb-a01d-43e4-abd5-3a3ac06021a7
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.30.1.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.30.1.2:2380
    - --initial-cluster=controlplane=https://172.30.1.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.30.1.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.30.1.2:2380
    - --name=controlplane
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.15-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /livez
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: etcd
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 127.0.0.1
        path: /readyz
        port: 2381
        scheme: HTTP
      periodSeconds: 1
      successThreshold: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 25m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /readyz
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  nodeName: controlplane
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:23:27Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:23:25Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:23:40Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:23:40Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:23:25Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://5a2fff95bccca8d8705695367c597c23c91a76cea625f51d72c9137154a72169
    image: registry.k8s.io/etcd:3.5.15-0
    imageID: registry.k8s.io/etcd@sha256:a6dc63e6e8cfa0307d7851762fa6b629afb18f28d8aa3fab5a6e91b4af60026a
    lastState:
      terminated:
        containerID: containerd://addfea1cb3148ee798d85cf6149167959835c535d576f0e79bf03e2c5a225032
        exitCode: 255
        finishedAt: "2025-01-09T12:23:13Z"
        reason: Unknown
        startedAt: "2025-01-02T10:00:27Z"
    name: etcd
    ready: true
    restartCount: 2
    started: true
    state:
      running:
        startedAt: "2025-01-09T12:23:27Z"
  hostIP: 172.30.1.2
  hostIPs:
  - ip: 172.30.1.2
  phase: Running
  podIP: 172.30.1.2
  podIPs:
  - ip: 172.30.1.2
  qosClass: Burstable
  startTime: "2025-01-09T12:23:25Z"

# 1. Exec into the Pod and run the backup
$ kubectl -n kube-system exec -it etcd-controlplane -- sh

# This attempt failed: it just hung and never produced a backup.
pod$ ETCDCTL_API=3 etcdctl snapshot save /cluster_backup.db \
> --endpoints=127.0.0.1:2379 \
> --cert=/etc/kubernetes/pki/etcd/server.crt \
> --key=/etc/kubernetes/pki/etcd/server.key \
> > /backup.txt
{"level":"info","ts":"2025-01-09T13:00:26.611059Z","caller":"snapshot/v3_snapshot.go:65","msg":"created temporary db file","path":"/cluster_backup.db.part"}

# 2. Copy the required file back out
$ kubectl cp kube-system/etcd-controlplane:/backup.txt .
error: Internal error occurred: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "e70a4bbddc7672ab8ca25f05116c370f87528c2d39ef839478d9029a010d552f": OCI runtime exec failed: exec failed: unable to start container process: exec: "tar": executable file not found in $PATH: unknown


== Second approach =============================================================

$ pwd
/etc/kubernetes/pki/etcd

$ ll
total 40
drwxr-xr-x 2 root root 4096 Jan  2 09:48 ./
drwxr-xr-x 3 root root 4096 Jan  2 09:48 ../
-rw-r--r-- 1 root root 1094 Jan  2 09:48 ca.crt
-rw------- 1 root root 1675 Jan  2 09:48 ca.key
-rw-r--r-- 1 root root 1123 Jan  2 09:48 healthcheck-client.crt
-rw------- 1 root root 1679 Jan  2 09:48 healthcheck-client.key
-rw-r--r-- 1 root root 1208 Jan  2 09:48 peer.crt
-rw------- 1 root root 1679 Jan  2 09:48 peer.key
-rw-r--r-- 1 root root 1208 Jan  2 09:48 server.crt
-rw------- 1 root root 1675 Jan  2 09:48 server.key

$ kubectl -n kube-system get pod -o wide
NAME                                      READY   STATUS    RESTARTS      AGE    IP            NODE           NOMINATED NODE   READINESS GATES
calico-kube-controllers-94fb6bc47-4wx95   1/1     Running   2 (41m ago)   7d3h   192.168.0.2   controlplane   <none>           <none>
canal-mfc56                               2/2     Running   2 (41m ago)   7d3h   172.30.2.2    node01         <none>           <none>
canal-zstf2                               2/2     Running   2 (41m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
coredns-57888bfdc7-6sqfr                  1/1     Running   1 (41m ago)   7d3h   192.168.1.2   node01         <none>           <none>
coredns-57888bfdc7-jnrx9                  1/1     Running   1 (41m ago)   7d3h   192.168.1.3   node01         <none>           <none>
etcd-controlplane                         1/1     Running   2 (41m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-apiserver-controlplane               1/1     Running   2 (41m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-controller-manager-controlplane      1/1     Running   2 (41m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-proxy-sqc72                          1/1     Running   2 (41m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-proxy-xknck                          1/1     Running   1 (41m ago)   7d3h   172.30.2.2    node01         <none>           <none>
kube-scheduler-controlplane               1/1     Running   2 (41m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:26:da:b2 brd ff:ff:ff:ff:ff:ff
    inet 172.30.1.2/24 brd 172.30.1.255 scope global dynamic enp1s0
       valid_lft 86310838sec preferred_lft 86310838sec


# This attempt also failed: it just hung and never produced a backup.
$ ETCDCTL_API=3 etcdctl snapshot save /opt/cluster_backup.db \
> --endpoints=172.30.1.2:2379 \
> --cert=/etc/kubernetes/pki/etcd/server.crt \
> --key=/etc/kubernetes/pki/etcd/server.key \
> > ./backup.txt
{"level":"info","ts":1736427985.0269656,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/opt/cluster_backup.db.part"}


# Alternatively:

$ ETCDCTL_API=3 etcdctl snapshot save /opt/cluster_backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key > /opt/backup.txt

$ cat backup.txt
{"level":"info","ts":1734514211.239635,"caller":"snapshot/v3_snapshot.go:68","msg":"created temporary db file","path":"/opt/cluster_backup.db.part"}
{"level":"info","ts":1734514211.2523417,"logger":"client","caller":"v3/maintenance.go:211","msg":"opened snapshot stream; downloading"}
{"level":"info","ts":1734514211.252389,"caller":"snapshot/v3_snapshot.go:76","msg":"fetching snapshot","endpoint":"https://127.0.0.1:2379"}
{"level":"info","ts":1734514211.5606298,"logger":"client","caller":"v3/maintenance.go:219","msg":"completed snapshot read; closing"}
{"level":"info","ts":1734514211.5849128,"caller":"snapshot/v3_snapshot.go:91","msg":"fetched snapshot","endpoint":"https://127.0.0.1:2379","size":"6.0 MB","took":"now"}
{"level":"info","ts":1734514211.5850122,"caller":"snapshot/v3_snapshot.go:100","msg":"saved","path":"/opt/cluster_backup.db"}
Snapshot saved at /opt/cluster_backup.db
The check still did not pass; the exact reason for the failure is unclear.
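
A likely cause of the hanging attempts is the missing --cacert (and, in the first two tries, an endpoint without the https:// scheme), so the TLS connection to etcd never completes. A combined form that writes the snapshot and captures all console output, including stderr, might look like:

$ ETCDCTL_API=3 etcdctl snapshot save /opt/cluster_backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key 2>&1 | tee backup.txt

Whether the checker expects backup.txt in the home directory or elsewhere is not specified in the task.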

17. Architecture, Installation & Maintenance - ETCD Restore

The etcd-controlplane pod is running in the kube-system namespace. Take a backup and store it in the file /opt/cluster_backup.db.

The ETCD backup is stored at the path /opt/cluster_backup.db on the controlplane node. Using /root/default.etcd for --data-dir, restore it on the controlplane node itself, and also store the restore console output in restore.txt

ssh controlplane

# @author D瓜哥 · https://www.diguage.com

$ kubectl -n kube-system get pod -o wide
NAME                                      READY   STATUS    RESTARTS      AGE    IP            NODE           NOMINATED NODE   READINESS GATES
calico-kube-controllers-94fb6bc47-4wx95   1/1     Running   2 (14m ago)   7d3h   192.168.0.2   controlplane   <none>           <none>
canal-mfc56                               2/2     Running   2 (14m ago)   7d3h   172.30.2.2    node01         <none>           <none>
canal-zstf2                               2/2     Running   2 (14m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
coredns-57888bfdc7-6sqfr                  1/1     Running   1 (14m ago)   7d3h   192.168.1.2   node01         <none>           <none>
coredns-57888bfdc7-jnrx9                  1/1     Running   1 (14m ago)   7d3h   192.168.1.3   node01         <none>           <none>
etcd-controlplane                         1/1     Running   2 (14m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-apiserver-controlplane               1/1     Running   2 (14m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-controller-manager-controlplane      1/1     Running   2 (14m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-proxy-sqc72                          1/1     Running   2 (14m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>
kube-proxy-xknck                          1/1     Running   1 (14m ago)   7d3h   172.30.2.2    node01         <none>           <none>
kube-scheduler-controlplane               1/1     Running   2 (14m ago)   7d3h   172.30.1.2    controlplane   <none>           <none>

$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:b9:cd:f1 brd ff:ff:ff:ff:ff:ff
    inet 172.30.1.2/24 brd 172.30.1.255 scope global dynamic enp1s0
       valid_lft 86312755sec preferred_lft 86312755sec

$ kubectl -n kube-system get pod etcd-controlplane -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/etcd.advertise-client-urls: https://172.30.1.2:2379
    kubernetes.io/config.hash: 4fb3015641784f175e793600c1e22e8c
    kubernetes.io/config.mirror: 4fb3015641784f175e793600c1e22e8c
    kubernetes.io/config.seen: "2025-01-02T09:49:15.967125433Z"
    kubernetes.io/config.source: file
  creationTimestamp: "2025-01-02T09:49:47Z"
  labels:
    component: etcd
    tier: control-plane
  name: etcd-controlplane
  namespace: kube-system
  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: Node
    name: controlplane
    uid: 3128acc2-f3b1-4321-829a-338be43290e3
  resourceVersion: "1792"
  uid: 5b796aeb-a01d-43e4-abd5-3a3ac06021a7
spec:
  containers:
  - command:
    - etcd
    - --advertise-client-urls=https://172.30.1.2:2379
    - --cert-file=/etc/kubernetes/pki/etcd/server.crt
    - --client-cert-auth=true
    - --data-dir=/var/lib/etcd
    - --experimental-initial-corrupt-check=true
    - --experimental-watch-progress-notify-interval=5s
    - --initial-advertise-peer-urls=https://172.30.1.2:2380
    - --initial-cluster=controlplane=https://172.30.1.2:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://172.30.1.2:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://172.30.1.2:2380
    - --name=controlplane
    - --peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
    - --peer-client-cert-auth=true
    - --peer-key-file=/etc/kubernetes/pki/etcd/peer.key
    - --peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    - --snapshot-count=10000
    - --trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
    image: registry.k8s.io/etcd:3.5.15-0
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /livez
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    name: etcd
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 127.0.0.1
        path: /readyz
        port: 2381
        scheme: HTTP
      periodSeconds: 1
      successThreshold: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 25m
        memory: 100Mi
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /readyz
        port: 2381
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 15
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/lib/etcd
      name: etcd-data
    - mountPath: /etc/kubernetes/pki/etcd
      name: etcd-certs
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostNetwork: true
  nodeName: controlplane
  preemptionPolicy: PreemptLowerPriority
  priority: 2000001000
  priorityClassName: system-node-critical
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seccompProfile:
      type: RuntimeDefault
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    operator: Exists
  volumes:
  - hostPath:
      path: /etc/kubernetes/pki/etcd
      type: DirectoryOrCreate
    name: etcd-certs
  - hostPath:
      path: /var/lib/etcd
      type: DirectoryOrCreate
    name: etcd-data
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:59:51Z"
    status: "True"
    type: PodReadyToStartContainers
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:59:48Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T13:00:03Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T13:00:03Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2025-01-09T12:59:48Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: containerd://79da7de0192cbf31a47a57ad1f15ec52eccacb8328b4d94f7c8ab7d5845e7c39
    image: registry.k8s.io/etcd:3.5.15-0
    imageID: registry.k8s.io/etcd@sha256:a6dc63e6e8cfa0307d7851762fa6b629afb18f28d8aa3fab5a6e91b4af60026a
    lastState:
      terminated:
        containerID: containerd://addfea1cb3148ee798d85cf6149167959835c535d576f0e79bf03e2c5a225032
        exitCode: 255
        finishedAt: "2025-01-09T12:59:35Z"
        reason: Unknown
        startedAt: "2025-01-02T10:00:27Z"
    name: etcd
    ready: true
    restartCount: 2
    started: true
    state:
      running:
        startedAt: "2025-01-09T12:59:50Z"
  hostIP: 172.30.1.2
  hostIPs:
  - ip: 172.30.1.2
  phase: Running
  podIP: 172.30.1.2
  podIPs:
  - ip: 172.30.1.2
  qosClass: Burstable
  startTime: "2025-01-09T12:59:48Z"

$ kubectl exec -n kube-system etcd-controlplane -- sh -c "ETCDCTL_API=3 etcdctl snapshot save /opt/cluster_backup.db --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key"
This one is still a bit baffling!
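
The restore half of the task is not shown above. A sketch of what it would typically look like on the controlplane node (etcd 3.5 still accepts etcdctl snapshot restore, though etcdutl is the recommended tool in newer releases):

$ ETCDCTL_API=3 etcdctl snapshot restore /opt/cluster_backup.db \
  --data-dir /root/default.etcd 2>&1 | tee restore.txt

# To actually serve the restored data, the static pod manifest
# /etc/kubernetes/manifests/etcd.yaml would then need its etcd-data hostPath
# pointed at /root/default.etcd.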