Hao Liang's Blog

Embrace the World with Cloud Native and Open-source

【Kubelet】Practical analysis of Kubernetes node extension resources and Device Plugin

1. Background In Kubernetes, a node is abstracted as a set of resources. Currently, there are five officially defined attributes for a node's allocatable resources: cpu, memory, ephemeral-storage, hugepages-1Gi, and hugepages-2Mi. When we create a Pod, the scheduler checks whether the Pod's requests (required resources) fit within the node's allocatable resources to decide whether the Pod can run on that node. In many business scenarios, these attributes cannot fully describe the resources of a node (such as GPU, network card bandwidth, number of allocatable IPs, etc.
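
The fit check described in this excerpt can be illustrated with a minimal sketch. This is not the kube-scheduler implementation; the function name `fits`, the plain dictionaries, and the extended resource name `example.com/gpu` are assumptions made for illustration. CPU is expressed in millicores and memory in bytes.

```python
from typing import Dict


def fits(requests: Dict[str, int],
         allocatable: Dict[str, int],
         allocated: Dict[str, int]) -> bool:
    """Return True if every requested resource fits in what is still free on the node.

    Hypothetical helper: a resource the node does not advertise counts as 0.
    """
    for name, wanted in requests.items():
        free = allocatable.get(name, 0) - allocated.get(name, 0)
        if wanted > free:
            return False
    return True


# A node that advertises only the built-in resources (illustrative values).
node_allocatable = {"cpu": 32000, "memory": 64 * 1024**3}
node_allocated = {"cpu": 10000, "memory": 16 * 1024**3}

# A Pod asking for 18 CPUs and 10Gi fits in the remaining capacity.
print(fits({"cpu": 18000, "memory": 10 * 1024**3}, node_allocatable, node_allocated))

# A Pod asking for an extended resource the node never advertised does not fit.
print(fits({"example.com/gpu": 1}, node_allocatable, node_allocated))
```

The second call shows why the built-in attributes are not enough: unless the node advertises the extra resource, no Pod that requests it can be placed there.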

【Scheduling】Pod State Scheduling Scheduler plugin for scoring based on node pod status

This is the first scheduler plug-in I contributed, in 2020, to the scheduler-plugins open source project of the Kubernetes sig-scheduling group. 1. Background Related PR: PR103: Pod State Scheduling Plugin Source code address: Pod State Scheduling The current Kubernetes native scheduler scoring algorithm (Score) does not consider Pods in the Terminating state that already exist on the node. The current Kubernetes native scheduler scoring algorithm (Score) does not consider Pods in the Nominated state that already exist on the node.

【Troubleshooting】Troubleshooting a pod that remains in the nominated state after scheduling failure

1. Problem description The pod test-pod-hgfmk in the test-ns namespace is in the Pending state, and its nominated node is 132.10.134.193: $ kubectl get po -n test-ns test-pod-hgfmk -owide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE test-pod-hgfmk 0/1 Pending 0 5m <none> <none> 132.10.134.193 The events from describe pod showed that scheduling failed due to insufficient cpu and memory resources, but monitoring found that there were many

【Scheduling】Capacity-scheduling Flexible Capacity Quota Scheduler Plugin

Recently, the 2021 North American KubeCon was held online. @denkensk and @yuanchen8911, active contributors to the scheduler-plugins open source project of the Kubernetes sig-scheduling group, gave a talk on the Capacity scheduling elastic capacity quota scheduler plug-in. 1. Background Related proposals: KEP9: Capacity scheduling. Source code address: Capacity scheduling The current Kubernetes native ResourceQuota mechanism is limited to a single namespace (a ResourceQuota can only be configured per namespace). When a Pod triggers scheduling preemption, only the priority of its PriorityClass is used as the criterion for whether other Pods can be preempted.

【Scheduling】Load-aware Load-awareness scheduler plugin

1. Background Related proposals: KEP61: Real Load Aware Scheduling. Source code address: Trimaran: Real Load Aware Scheduling The current Kubernetes native scheduling logic, which is based on Pod requests and node Allocatable, cannot reflect the real load of cluster nodes, so this scheduler plug-in brings the real load of nodes into the Pod scheduling logic. The core component of this plug-in, Load Watcher, comes from an open source project by PayPal.

【Troubleshooting】Analysis of a kube-scheduler scheduling failure problem

1. Background Recently, business Pod scheduling failures have occurred frequently in production. Cluster monitoring shows that cluster resources are indeed relatively tight, but some nodes still have sufficient resources. For example, the request value of the business Pod is set to: resources: limits: cpu: "36" memory: 100Gi requests: cpu: "18" memory: 10Gi There are nodes with idle resources in the cluster: The Pod Event reports that there are not enough resources to schedule: 2.

Dockershim deprecated in Kubernetes 1.20

1. Background Recently, Kubernetes version 1.20 was released. Looking at the CHANGELOG for this version, we found that after version 1.20 Kubernetes will deprecate Dockershim as the standard implementation of the Container Runtime Interface (CRI). We know that Dockershim is an implementation of the Container Runtime Interface (CRI). The specification of the CRI interface was first introduced in Kubernetes version 1.5: [https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/). Its purpose is to adapt to multiple container runtimes and allow all systems that manage the container life cycle to implement this unified standard interface (container viewing, creation, deletion, update, etc.

【Scheduling】From kube-scheduler Extender to kube-scheduler Framework

I. Introduction Friends who follow the Kubernetes Scheduler SIG (Special Interest Group) should know that in the recently released Kubernetes version 1.19, the Scheduler Framework replaces the original Scheduler working mode and officially provides the scheduler to users in plug-in form. Compared with the “four-piece set” of the old version of the scheduler (Predicate, Priority, Bind, and Preemption), the new version of the scheduler framework is more flexible and introduces a total of 11 extension points.

Large-scale Kubernetes cluster optimization, Pod diagnosis and cluster automated inspection tool design ideas

Preface With the advancement of application containerization, the scale of clusters continues to expand, and more and more businesses are running in the clusters. Cluster operators and business developers face various challenges. The most common and tedious task is locating and troubleshooting the problems encountered during the deployment and operation of business Pods. 1. The necessity of Pod diagnosis Various abnormal states can occur during the life cycle of Pods in the cluster: creation, operation, deletion, and destruction.

【Scheduling】Priority and preemption mechanism, affinity scheduling, in-tree scheduling algorithm (new features in version 1.19)

1. Priority and preemption mechanism During the scheduling process, kube-scheduler takes one Pod out of the scheduling queue (SchedulingQueue) at a time and performs one round of scheduling. So in what order are the Pods arranged in the scheduling queue? The Pod resource object supports setting the Priority attribute. Pods with higher priority are placed at the front of the scheduling queue and scheduled first. If the scheduling of a high-priority Pod fails because no suitable node is found, it is placed in the UnschedulableQueue and enters the preemption phase.
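
The queue ordering described in this excerpt can be sketched with a small priority queue. This is a simplified illustration, not kube-scheduler's code: the class `SchedulingQueue` below and the helper `drain` are hypothetical, the tie-break by arrival order is an assumption of the sketch, and the UnschedulableQueue and preemption phase are not modeled.

```python
import heapq
import itertools
from typing import List, Tuple


class SchedulingQueue:
    """Toy queue: higher priority pops first, ties pop in arrival order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._arrival = itertools.count()  # monotonically increasing sequence number

    def add(self, pod_name: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._arrival), pod_name))

    def pop(self) -> str:
        _, _, pod_name = heapq.heappop(self._heap)
        return pod_name

    def __len__(self) -> int:
        return len(self._heap)


def drain(pods: List[Tuple[str, int]]) -> List[str]:
    """Enqueue (name, priority) pairs and return the order they would be scheduled in."""
    queue = SchedulingQueue()
    for name, priority in pods:
        queue.add(name, priority)
    return [queue.pop() for _ in range(len(queue))]


# The high-priority Pod jumps ahead of Pods that were enqueued earlier.
print(drain([("batch-a", 0), ("critical", 1000), ("batch-b", 0)]))
```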