Hao Liang's Blog

Embrace the World with Cloud Native and Open-source

【Scheduling】Co-Scheduling Grouped Batch Pod Scheduler Plugin

1. Background. Related proposal: KEP: Coscheduling based on PodGroup CRD. Source code: Coscheduling. In some scenarios (batch workloads such as Spark jobs and TensorFlow jobs), a group of Pods must all be scheduled successfully at the same time before the job can run normally. If only some of the Pods are scheduled successfully, the job still cannot run. The current Kubernetes native scheduler cannot ensure that all Pods in a group have been created before it starts scheduling them.
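The all-or-nothing requirement described above can be illustrated with a small sketch. This is not the Coscheduling plugin's code: the names `PodGroup`, `min_member`, and `can_admit` are hypothetical and only model the idea that no Pod in a group should start unless enough of the group can be placed together.

```python
from dataclasses import dataclass


@dataclass
class PodGroup:
    name: str
    min_member: int  # hypothetical: number of Pods that must be placeable together


def can_admit(group: PodGroup, placeable_pods: int) -> bool:
    """Admit the group only if at least min_member Pods can be placed at once."""
    return placeable_pods >= group.min_member


spark_job = PodGroup(name="spark-job", min_member=4)
print(can_admit(spark_job, placeable_pods=3))  # False: the whole group waits
print(can_admit(spark_job, placeable_pods=4))  # True: the group can start together
```

Rejecting the whole group when it falls one Pod short is the point of the design: a partially placed job would hold resources without being able to make progress.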

Posted by Hao Liang's Blog on Sunday, November 21, 2021

【Kubelet】Practical analysis of Kubernetes node extension resources and Device Plugin

1. Background. In Kubernetes, a node is abstracted as a set of resources. Currently, there are five officially defined allocatable resource types for a node: cpu, memory, ephemeral-storage, hugepages-1Gi, and hugepages-2Mi. When we create a Pod, the scheduler checks whether the Pod’s requests (required resources) fit within the node's allocatable resources to determine whether the Pod can run on that node. In many business scenarios, it is impossible to fully describe a node's resource attributes (such as GPU, network card bandwidth, number of allocatable IPs, etc.) with these five types alone.
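The fit check described above can be sketched roughly as follows. This is a simplified model, not kube-scheduler code: the function `fits`, the dictionaries, the units, and the resource name `example.com/gpu` are all assumptions made for illustration.

```python
def fits(requests: dict, allocatable: dict, in_use: dict) -> bool:
    """Return True if every requested resource fits in what the node has left."""
    for resource, amount in requests.items():
        # A resource the node does not advertise counts as zero capacity.
        free = allocatable.get(resource, 0) - in_use.get(resource, 0)
        if amount > free:
            return False
    return True


# CPU in millicores, memory in MiB (units chosen for this sketch).
node_allocatable = {"cpu": 32000, "memory": 131072}
node_in_use = {"cpu": 20000, "memory": 65536}

print(fits({"cpu": 8000, "memory": 16384}, node_allocatable, node_in_use))   # True
print(fits({"cpu": 16000, "memory": 16384}, node_allocatable, node_in_use))  # False
# The node advertises no GPU resource, so a GPU request cannot fit.
print(fits({"cpu": 1000, "example.com/gpu": 1}, node_allocatable, node_in_use))  # False
```

The last call shows why a resource outside the built-in set has to be advertised by the node before the scheduler can account for it.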

Posted by Hao Liang's Blog on Sunday, November 7, 2021

【Scheduling】Pod State Scheduling Scheduler plugin for scoring based on node pod status

This is the first scheduler plugin I contributed to the scheduler-plugins open source project of the Kubernetes sig-scheduling group in 2020. 1. Background. Related PR: PR103: Pod State Scheduling Plugin. Source code: Pod State Scheduling. The current Kubernetes native scheduler's scoring algorithm (Score) does not consider Pods on the node that are in the Terminating state, nor does it consider Pods that have been nominated to the node.
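One way to take these two Pod states into account when scoring is sketched below. The formula is an assumption for illustration, not necessarily what the plugin does: Terminating Pods are treated as resources about to be freed (raising the score) and nominated Pods as resources about to be claimed (lowering it). The names `pod_state_score` and `normalize` are hypothetical.

```python
def pod_state_score(terminating: int, nominated: int) -> int:
    """Raw score: more Terminating Pods is better, more nominated Pods is worse."""
    return terminating - nominated


def normalize(raw: dict, max_score: int = 100) -> dict:
    """Scale raw scores linearly into the range 0..max_score."""
    lo, hi = min(raw.values()), max(raw.values())
    if hi == lo:
        # All nodes look the same, so this score does not distinguish them.
        return {node: 0 for node in raw}
    return {node: (s - lo) * max_score // (hi - lo) for node, s in raw.items()}


# (terminating, nominated) Pod counts per node, made up for the example.
nodes = {"node-a": (2, 0), "node-b": (0, 1), "node-c": (0, 0)}
raw = {name: pod_state_score(t, n) for name, (t, n) in nodes.items()}
print(normalize(raw))  # {'node-a': 100, 'node-b': 0, 'node-c': 33}
```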

Posted by Hao Liang's Blog on Sunday, October 24, 2021

【Troubleshooting】Troubleshooting a pod that remains in the nominated state after scheduling failure

1. Problem description. The pod test-pod-hgfmk in the test-ns namespace is in the Pending state, and its nominated node is 132.10.134.193:

    $ kubectl get po -n test-ns test-pod-hgfmk -owide
    NAME             READY   STATUS    RESTARTS   AGE   IP       NODE     NOMINATED NODE
    test-pod-hgfmk   0/1     Pending   0          5m    <none>   <none>   132.10.134.193

The events shown by describing the pod indicate that scheduling failed due to insufficient CPU and memory resources, but monitoring found that there were many

Posted by Hao Liang's Blog on Sunday, October 24, 2021

【Scheduling】Capacity-scheduling Flexible Capacity Quota Scheduler Plugin

Recently, KubeCon North America 2021 was held online. @denkensk and @yuanchen8911, active contributors to the scheduler-plugins open source project of the Kubernetes sig-scheduling group, gave a talk on the Capacity scheduling elastic capacity quota scheduler plugin. 1. Background. Related proposal: KEP9: Capacity scheduling. Source code: Capacity scheduling. The current Kubernetes native ResourceQuota mechanism is limited to a single namespace (a ResourceQuota can only be configured per namespace). When preemption occurs during scheduling, only the priority of a Pod's PriorityClass is used as the criterion for whether to preempt it.

Posted by Hao Liang's Blog on Saturday, October 23, 2021

【Scheduling】Load-aware Load-awareness scheduler plugin

1. Background. Related proposal: KEP61: Real Load Aware Scheduling. Source code: Trimaran: Real Load Aware Scheduling. The current Kubernetes native scheduling logic, based on Pod requests and node Allocatable, does not reflect the real load of cluster nodes, so this scheduler plugin brings the real load of nodes into the Pod scheduling logic. The core component of this plugin, Load Watcher, comes from an open source project by PayPal.
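The gap between request-based accounting and real load can be shown with a small sketch. It is not Trimaran's scoring function: `request_based_free` and `load_aware_score` are hypothetical helpers, and the utilization figures are made up.

```python
def request_based_free(allocatable_m: int, requested_m: int) -> int:
    """Free CPU (millicores) as seen by request-based accounting."""
    return allocatable_m - requested_m


def load_aware_score(cpu_utilization_pct: float) -> int:
    """Hypothetical score: the lower the measured utilization, the higher the score."""
    clamped = min(max(cpu_utilization_pct, 0.0), 100.0)
    return round(100 - clamped)


# Two nodes with identical requests but very different measured load.
nodes = {
    "node-a": {"allocatable_m": 32000, "requested_m": 16000, "cpu_util_pct": 85.0},
    "node-b": {"allocatable_m": 32000, "requested_m": 16000, "cpu_util_pct": 20.0},
}

for name, n in nodes.items():
    free = request_based_free(n["allocatable_m"], n["requested_m"])
    print(name, free, load_aware_score(n["cpu_util_pct"]))
# node-a 16000 15
# node-b 16000 80
```

Request-based accounting sees the two nodes as equally free, while the measured load separates them clearly.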

Posted by Hao Liang's Blog on Tuesday, September 21, 2021

【Troubleshooting】Analysis of a kube-scheduler scheduling failure problem

1. Background. Recently, business Pods have often failed to schedule in production. Cluster monitoring shows that resources are indeed relatively tight, but some nodes still have sufficient resources. For example, the resources of the business Pod are set to:

    resources:
      limits:
        cpu: "36"
        memory: 100Gi
      requests:
        cpu: "18"
        memory: 10Gi

There are nodes with idle resources in the cluster, yet the Pod events report that there are not enough resources to schedule it. 2.

Posted by Hao Liang's Blog on Friday, September 10, 2021

Dockershim deprecated in Kubernetes 1.20

1. Background. Recently, Kubernetes version 1.20 was released. Looking at the CHANGELOG for this version, we found that, as of version 1.20, Kubernetes deprecates Dockershim as the standard implementation of the Container Runtime Interface (CRI). We know that Dockershim is an implementation of the CRI. The CRI specification was first introduced in Kubernetes version 1.5: [https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/](https://kubernetes.io/blog/2016/12/container-runtime-interface-cri-in-kubernetes/). Its purpose is to adapt to multiple container runtimes and allow all systems that manage the container life cycle to implement this unified standard interface (container viewing, creation, deletion, update, etc.

Posted by Hao Liang's Blog on Sunday, December 6, 2020

【Scheduling】From kube-scheduler Extender to kube-scheduler Framework

I. Introduction. Those who follow the Kubernetes Scheduler SIG (Special Interest Group) will know that in the recently released Kubernetes version 1.19, the Scheduler Framework replaces the original scheduler working mode and officially provides the scheduler to users in plugin form. Compared with the “four-piece set” of the old scheduler (Predicate, Priority, Bind, and Preemption), the new scheduler framework is more flexible and introduces a total of 11 extension points.

Posted by Hao Liang's Blog on Friday, November 13, 2020

Large-scale Kubernetes cluster optimization, Pod diagnosis and cluster automated inspection tool design ideas

Preface. With the advancement of application containerization, clusters continue to grow, and more and more businesses run in them. Cluster operators and business developers face various challenges. The most common and tedious task is locating and troubleshooting the problems encountered while deploying and running business Pods. 1. The necessity of Pod diagnosis. Various abnormal states can occur during the life cycle of Pods in the cluster: creation, operation, deletion, and destruction.

Posted by Hao Liang's Blog on Tuesday, November 10, 2020