Hao Liang's Blog

Embrace the World with Cloud Native and Open-source

【Scheduling】From kube-scheduler Extender to kube-scheduler Framework

I. Introduction Readers who follow the Kubernetes SIG Scheduling (Special Interest Group) will know that in the recently released Kubernetes 1.19, the Scheduler Framework replaces the original scheduler working mode and officially exposes the scheduler to users in plugin form. Compared with the old scheduler's “four-piece set” of Predicate, Priority, Bind, and Preemption, the new scheduler framework is more flexible and introduces a total of 11 extension points.

Large-scale Kubernetes cluster optimization, Pod diagnosis and cluster automated inspection tool design ideas

Preface As application containerization advances, clusters keep growing and more and more business workloads run in them. Cluster operators and application developers face a variety of challenges. The most common and tedious one is locating and troubleshooting the problems that business Pods run into during deployment and operation. 1. The necessity of Pod diagnosis Pods in a cluster can enter various abnormal states over their life cycle of creation, running, deletion, and destruction.

【Scheduling】Priority and preemption mechanism, affinity scheduling, in-tree scheduling algorithm (new features in version 1.19)

1. Priority and preemption mechanism In each scheduling round, kube-scheduler takes one Pod from the scheduling queue (SchedulingQueue) and tries to schedule it. So in what order do Pods sit in the scheduling queue? The Pod resource object supports a Priority attribute. Pods with a higher priority are placed toward the front of the scheduling queue and are scheduled first. If a high-priority Pod fails to be scheduled because no suitable node is found, it is placed in the UnschedulableQueue and enters the preemption phase.
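The queue ordering described above can be sketched with Go's `container/heap`. This is a minimal illustration, not kube-scheduler's actual queue: the `pod` type, the `seq` tie-breaker (earlier arrival first among equal priorities), and the `popOrder` helper are assumptions made for the example.

```go
package main

import (
	"container/heap"
	"fmt"
)

// pod is a hypothetical, simplified stand-in for a Pod waiting in the queue.
type pod struct {
	name     string
	priority int32 // higher value is scheduled earlier
	seq      int   // arrival order, used only to break ties (assumption)
}

// podHeap implements heap.Interface so the highest-priority pod pops first.
type podHeap []pod

func (h podHeap) Len() int { return len(h) }
func (h podHeap) Less(i, j int) bool {
	if h[i].priority != h[j].priority {
		return h[i].priority > h[j].priority
	}
	return h[i].seq < h[j].seq
}
func (h podHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *podHeap) Push(x interface{}) { *h = append(*h, x.(pod)) }
func (h *podHeap) Pop() interface{} {
	old := *h
	n := len(old)
	p := old[n-1]
	*h = old[:n-1]
	return p
}

// popOrder enqueues pods in arrival order and returns their names
// in the order a scheduler loop would take them off the queue.
func popOrder(pods []pod) []string {
	h := &podHeap{}
	for i, p := range pods {
		p.seq = i
		heap.Push(h, p)
	}
	var order []string
	for h.Len() > 0 {
		order = append(order, heap.Pop(h).(pod).name)
	}
	return order
}

func main() {
	fmt.Println(popOrder([]pod{
		{name: "batch-job", priority: 0},
		{name: "web-frontend", priority: 1000},
		{name: "log-agent", priority: 0},
	}))
}
```

Here `web-frontend` is popped first even though it arrived second, and the two priority-0 Pods keep their arrival order.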

【Scheduling】kube-scheduler architecture design and startup process code breakdown

1. kube-scheduler architecture design The core function of the scheduler is to find the most suitable node for a Pod to run on. In small clusters, each scheduling cycle traverses all nodes in the cluster to find the most suitable one. In large clusters, each scheduling cycle traverses only a subset of the nodes and picks the most suitable node from that subset. The entire scheduling process is divided into three main phases: filtering (pre-selection), scoring (prioritization), and binding.
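The three phases named above (pre-selection, scoring, binding) can be sketched as follows. This is a toy model under stated assumptions: the `node` type, the single CPU dimension, and the "most free CPU wins" score are illustrative choices, not kube-scheduler's real predicates or priority functions.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical, simplified view of a cluster node.
type node struct {
	name    string
	freeCPU int // free CPU in millicores
}

// filter is the pre-selection phase: keep only nodes that fit the request.
func filter(nodes []node, requestCPU int) []node {
	var feasible []node
	for _, n := range nodes {
		if n.freeCPU >= requestCPU {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

// best is the scoring phase: the score here is simply the free CPU,
// so the node with the most headroom wins (the first one wins on ties).
// It expects a non-empty slice.
func best(feasible []node) node {
	winner := feasible[0]
	for _, n := range feasible[1:] {
		if n.freeCPU > winner.freeCPU {
			winner = n
		}
	}
	return winner
}

// schedule runs filter -> score -> bind and records the result in bindings.
func schedule(podName string, requestCPU int, nodes []node, bindings map[string]string) (string, error) {
	feasible := filter(nodes, requestCPU)
	if len(feasible) == 0 {
		return "", errors.New("no feasible node for " + podName)
	}
	chosen := best(feasible)
	bindings[podName] = chosen.name // binding phase
	return chosen.name, nil
}

func main() {
	nodes := []node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 1500}}
	bindings := map[string]string{}
	name, err := schedule("web", 1000, nodes, bindings)
	fmt.Println(name, err)
}
```

With a 1000m request, `node-a` is filtered out and `node-b` scores highest, so the Pod is bound there.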

Client-go code breakdown (4): Work Queue

1. Introduction to WorkQueue In Informer, the Delta FIFO queue triggers the Add, Update, and Delete callbacks. Inside each callback, the key of the resource object whose change event needs processing is put into the WorkQueue. The Control Loop then takes the key from the work queue and fetches the complete resource object from the Indexer local cache through the Lister for processing. Image source: Geek Time – “Kubernetes in a Simple and In-depth manner” The main functions of WorkQueue are marking and deduplication, and it supports the following features:
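The marking and deduplication behavior mentioned above can be illustrated with a small sketch. This is not the client-go implementation: the `queue` type and its `dirty`/`processing` sets are simplified assumptions, and the sketch is single-goroutine with no locking.

```go
package main

import "fmt"

// queue is a hypothetical, non-thread-safe work queue that deduplicates keys.
type queue struct {
	items      []string        // keys waiting to be processed, in order
	dirty      map[string]bool // keys that still need processing (the "mark")
	processing map[string]bool // keys currently being processed
}

func newQueue() *queue {
	return &queue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add marks key as needing work. A key that is already waiting is not added
// twice, and a key that is being processed is re-queued only after Done.
func (q *queue) Add(key string) {
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if q.processing[key] {
		return
	}
	q.items = append(q.items, key)
}

// Get removes and returns the next key, moving it to the processing set.
func (q *queue) Get() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	key := q.items[0]
	q.items = q.items[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key, true
}

// Done finishes processing key and re-queues it if it was marked meanwhile.
func (q *queue) Done(key string) {
	delete(q.processing, key)
	if q.dirty[key] {
		q.items = append(q.items, key)
	}
}

// Len reports how many keys are waiting.
func (q *queue) Len() int { return len(q.items) }

func main() {
	q := newQueue()
	q.Add("default/web")
	q.Add("default/web") // duplicate, ignored
	q.Add("default/db")
	fmt.Println(q.Len())
}
```

Adding the same key twice leaves a single entry in the queue, so a burst of events for one object results in one pass through the control loop.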

Client-go code breakdown (3): Informer mechanism

1. Introduction In Kubernetes, a controller needs to watch the state of resource objects in the cluster so that it can reconcile their actual state with the desired state defined in YAML. So how does the controller watch a resource object and react to changes in its actual state? This is implemented through the Informer mechanism in the client-go package. Image source: Geek Time – “Kubernetes in a Simple and In-depth manner” From the picture above, we can roughly understand the entire process of the Informer mechanism:

Client-go code breakdown (2): Resync mechanism analysis in Informer

1. Informer workflow diagram in Client-go The Reflector in Informer obtains the change events of all resource objects in the cluster from the apiserver through List/Watch and puts them into the Delta FIFO queue (stored as key/value pairs). The queue triggers the onAdd, onUpdate, and onDelete callbacks, which put the key into the WorkQueue. At the same time, the key is updated in the Indexer local cache. The Control Loop obtains the key from the WorkQueue, looks up the value for that key in the Indexer, and performs the corresponding processing.
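The flow described above can be approximated in a few lines of plain Go. This is a toy simulation, not client-go code: the `event` type, the map used as the Indexer, and the `dispatch` and `reconcile` helpers are all hypothetical names chosen for the sketch.

```go
package main

import "fmt"

// event is a hypothetical change event as it would leave the Delta FIFO queue.
type event struct {
	kind  string // "add", "update", or "delete"
	key   string // namespace/name of the object
	value string // stand-in for the full resource object
}

// dispatch plays the role of the callbacks: it updates the local cache
// (indexer) and returns the keys that would be put into the WorkQueue.
func dispatch(events []event, indexer map[string]string) []string {
	var keys []string
	for _, e := range events {
		switch e.kind {
		case "add", "update":
			indexer[e.key] = e.value
		case "delete":
			delete(indexer, e.key)
		}
		keys = append(keys, e.key)
	}
	return keys
}

// reconcile plays the role of the control loop: for each key it reads the
// current value from the indexer instead of relying on the event payload.
func reconcile(keys []string, indexer map[string]string) []string {
	var actions []string
	for _, k := range keys {
		if v, ok := indexer[k]; ok {
			actions = append(actions, "sync "+k+"="+v)
		} else {
			actions = append(actions, "cleanup "+k)
		}
	}
	return actions
}

func main() {
	indexer := map[string]string{"default/db": "v1"}
	keys := dispatch([]event{
		{"add", "default/web", "v1"},
		{"update", "default/web", "v2"},
		{"delete", "default/db", ""},
	}, indexer)
	for _, a := range reconcile(keys, indexer) {
		fmt.Println(a)
	}
}
```

Because the control loop reads from the cache by key, it always acts on the latest known state of `default/web` rather than on each intermediate event.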

Client-go code breakdown (1): Client Object

1. Source code structure 2. Client object RESTClient RESTClient encapsulates RESTful-style HTTP requests and is used to exchange data with the apiserver over HTTP. The process of obtaining Kubernetes resource objects through RESTClient is: read the kubeconfig configuration → encapsulate the HTTP request …

Implementation of zero-disruption rolling updates in Kubernetes

In a Kubernetes cluster, businesses usually expose their applications through a Deployment plus a LoadBalancer-type Service. The typical deployment architecture is shown in Figure 1. This architecture is very simple to deploy and operate, but service interruptions may occur when applications are updated or upgraded, causing production incidents. Today we will analyze in detail why this architecture causes service interruptions during application updates and how to avoid them.

【Scheduling】Working principle of kube-scheduler: preemption mechanism in Priority algorithm

1. Why is the preemption mechanism needed? When a Pod fails to be scheduled, it stays in the Pending state. The scheduler will not try to schedule the Pod again until the Pod is updated or the cluster state changes. In real business scenarios, however, there is a distinction between online and offline services. If a Pod of an online service fails to be scheduled due to insufficient resources, the offline service needs to give up part of its resources to make room for the online service.