Hao Liang's Blog

Embrace the World with Cloud Native and Open-source

【Troubleshooting】Reusable CPUs from initContainer were not being honored

1. Description
This issue occurs in early versions of Kubernetes (v1.18).
Related commit: Fix a bug whereby reusable CPUs and devices were not being honored #93289
Related PRs:
Fix a bug whereby reusable CPUs and devices were not being honored #93189
Refactor the algorithm used to decide CPU assignments in the CPUManager #102014
Previously, it was possible for reusable CPUs and reusable devices (i.e. those previously consumed by init containers) to not be reused by subsequent init containers or app containers if the TopologyManager was enabled.
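To make "reusing" concrete, here is a toy model of the intended behaviour. It is not the kubelet's CPUManager code: the function name, its arguments, and the lowest-ID-first ordering are all assumptions made for illustration. A container draws first from the CPUs released by earlier init containers and only then from the free pool.

```python
def allocate_cpus(free, reusable, count):
    """Pick `count` CPU IDs, preferring CPUs that init containers already used.

    Toy model only (hypothetical helper): the real CPUManager also has to
    honor topology hints, which is where the bug described above appeared.
    """
    if count > len(free | reusable):
        raise ValueError("not enough CPUs")
    # Take reusable CPUs first (lowest ID first, purely for determinism).
    chosen = sorted(reusable)[:count]
    # Top up from the free pool, skipping anything already chosen.
    remaining = count - len(chosen)
    chosen += sorted(free - set(chosen))[:remaining]
    return set(chosen)


if __name__ == "__main__":
    # An init container used CPUs 2 and 3; the next container asks for 2 CPUs.
    print(allocate_cpus(free={0, 1, 2, 3}, reusable={2, 3}, count=2))
```

When reusable CPUs are honored, the second container lands on the same CPUs instead of consuming two more from the free pool.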

【Troubleshooting】Summary of kube-apiserver troubleshooting techniques (analysis of logs and caching principles)

1. Related background
When troubleshooting apiserver issues, monitoring first identifies the nodes that may have performance bottlenecks. The next step is to analyze the apiserver logs on those nodes.
2. APIServer log analysis techniques
Trace log
Log printing conditions
When the total request time exceeds the threshold (default 500ms), the apiserver prints a trace log, and for each step of the trace it calculates a step duration threshold
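The two-level condition can be sketched as follows. This is a simplified illustration, not the apiserver's trace code: `slow_steps` and its parameters are hypothetical names, and because the excerpt does not show how the per-step threshold is derived, it is passed in by the caller rather than computed.

```python
DEFAULT_TRACE_THRESHOLD_MS = 500  # default total threshold named in the text


def slow_steps(steps, step_threshold_ms,
               total_threshold_ms=DEFAULT_TRACE_THRESHOLD_MS):
    """Return the (name, duration_ms) steps worth logging for one request.

    steps: list of (name, duration_ms) tuples in execution order.
    Nothing is returned unless the whole request exceeded the total
    threshold; then only steps at or above the step threshold are kept.
    """
    total = sum(duration for _, duration in steps)
    if total <= total_threshold_ms:
        return []
    return [(name, duration) for name, duration in steps
            if duration >= step_threshold_ms]


if __name__ == "__main__":
    request = [("decode", 100), ("storage write", 450)]
    print(slow_steps(request, step_threshold_ms=200))
```

A request whose total stays under the threshold produces no trace output at all, even if one of its steps is slow.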

【Troubleshooting】Summary of kube-apiserver troubleshooting techniques (monitoring analysis)

1. Related background
As the scale of a single K8s cluster continues to grow (the number of nodes reaches 4,000+), we found in operation that the apiserver has gradually become the cluster's performance bottleneck. It is prone to unresponsive requests, slow responses, and request rejections, and can even trigger cluster avalanches that lead to network failures. The following details how to quickly locate and troubleshoot apiserver performance issues.

Differences in container runtime between different versions of docker

Reference articles:
K8s will eventually abandon docker, and TKE already supports containerd
Using docker as image building service in containerd cluster
1. Background
When comparing Kubernetes clusters that use different docker versions (1.18, 1.19) as the container runtime, we found some differences in the underlying implementation, which I record here.
2. Issue analysis
docker 1.18
Container process tree: containerd is not a system service, but a process started by dockerd

【ETCD】Analysis of the underlying mechanism of ETCD Defrag

1. Related source code
server/storage/backend/backend.go#defrag()
server/storage/backend/backend.go#defragdb()
2. Why do we need defrag
In day-to-day use of K8s clusters, if cluster data is frequently added and deleted, a strange phenomenon appears: even though the amount of object data in the cluster has not increased significantly, the disk space occupied by the etcd data files keeps growing. For this situation, etcd officially recommends using the defrag command of the provided etcdctl tool to defragment the data of each etcd node:

【Troubleshooting】A large number of pending high-priority Pods in the cluster affect the scheduling of low-priority Pods

1. Background
Related issues:
low priority pods stuck in pending without any scheduling events #106546
Totally avoid Pod starvation (HOL blocking) or clarify the user expectation on the wiki #86373
Related optimization proposal: Efficient requeueing of Unschedulable Pods
2. Issue analysis
There are a large number of high-priority Pods in the Pending state in the cluster because the current cluster resources do not meet the resource requests of these high-priority

【Scheduling】Co-Scheduling Grouped Batch Pod Scheduler Plugin

1. Background
Related proposals: KEP: Coscheduling based on PodGroup CRD
Source code address: Coscheduling
In some scenarios (batch workloads such as Spark jobs and TensorFlow jobs), a group of Pods must all be scheduled successfully at the same time before they can run normally. If only some of the Pods are scheduled successfully, they still cannot run normally. The current Kubernetes native scheduler cannot ensure that a group of Pods is created before scheduling is started.
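The all-or-nothing requirement can be illustrated with a small sketch. This is not the Coscheduling plugin's implementation: `schedule_group`, the CPU-only resource model, and the first-fit node choice are assumptions made to keep the example short.

```python
def schedule_group(pod_cpu_requests, node_free_cpu):
    """All-or-nothing placement for a group of Pods.

    pod_cpu_requests: {pod_name: cpu_needed}
    node_free_cpu:    {node_name: cpu_available}
    Returns {pod_name: node_name} if every Pod fits, otherwise None.
    Hypothetical helper; only CPU is modelled.
    """
    free = dict(node_free_cpu)  # work on a copy so a failed attempt changes nothing
    placement = {}
    for pod, cpu in pod_cpu_requests.items():
        # First-fit: take the first node with enough CPU left.
        node = next((n for n, avail in free.items() if avail >= cpu), None)
        if node is None:
            return None  # one Pod cannot be placed, so none are
        free[node] -= cpu
        placement[pod] = node
    return placement


if __name__ == "__main__":
    nodes = {"node-a": 2, "node-b": 2}
    print(schedule_group({"worker-0": 2, "worker-1": 2}, nodes))
    print(schedule_group({"worker-0": 2, "worker-1": 2, "worker-2": 2}, nodes))
```

First-fit is greedy, so it can reject a group that a different placement order would accept. The point here is only that a partial placement is never returned.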

【Kubelet】Practical analysis of Kubernetes node extension resources and Device Plugin

1. Background
In Kubernetes, a node is abstracted as a resource. Currently, there are five officially defined attributes for the allocatable resource size of a node: cpu, memory, ephemeral-storage, hugepages-1Gi, hugepages-2Mi
When we create a Pod, the scheduler checks whether the Pod's requests (required resources) fit within the allocatable resources of the current node to determine whether the Pod can run on this node. In many business scenarios, these attributes cannot fully describe the resources of a node (such as GPU, network card bandwidth, number of allocatable IPs, etc.
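The fit check described above can be sketched as a comparison of requests against what is left on the node. This is a simplified illustration, not the scheduler's code: `fits` and the `example.com/gpu` resource name are hypothetical, and quantities are plain integers rather than Kubernetes resource quantities.

```python
def fits(requests, allocatable, already_requested):
    """Return True if every requested resource fits in what is left on the node.

    All three arguments map resource names to integer amounts. A resource
    the node does not advertise counts as zero, so a Pod requesting it
    never fits.
    """
    for name, amount in requests.items():
        left = allocatable.get(name, 0) - already_requested.get(name, 0)
        if amount > left:
            return False
    return True


if __name__ == "__main__":
    node = {"cpu": 4, "memory": 8, "example.com/gpu": 1}
    in_use = {"cpu": 1, "memory": 2}
    print(fits({"cpu": 2, "example.com/gpu": 1}, node, in_use))
    print(fits({"cpu": 4}, node, in_use))
```

Because the check is keyed by resource name, a resource beyond the five built-in ones is handled the same way once the node advertises it.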

【Scheduling】Pod State Scheduling Scheduler plugin for scoring based on node pod status

This is the first scheduler plugin I contributed to the scheduler-plugins open source project of the Kubernetes sig-scheduling group in 2020.
1. Background
Related PR: PR103: Pod State Scheduling Plugin
Source code address: Pod State Scheduling
The current Kubernetes native scheduler scoring algorithm (Score) does not consider the Pods in the Terminating state that already exist on the node.
The current Kubernetes native scheduler scoring algorithm (Score) does not consider the Pods in the Nominated state that already exist on the node.

【Troubleshooting】Troubleshooting a pod that remains in the nominated state after scheduling failure

1. Problem description
The pod test-pod-hgfmk in the test-ns namespace is in the Pending state, and its nominated node is 132.10.134.193
$ kubectl get po -n test-ns test-pod-hgfmk -owide
NAME             READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE
test-pod-hgfmk   0/1     Pending   0          5m    <none>   <none>   132.10.134.193
The events from describe pod show that scheduling failed due to insufficient cpu and memory resources, but monitoring found that there were many