Large-scale Kubernetes cluster optimization: design ideas for Pod diagnosis and automated cluster inspection tools

Posted by Hao Liang's Blog on Tuesday, November 10, 2020

Preface

As application containerization advances, cluster sizes keep expanding and more and more business workloads run inside the clusters. Cluster operators and business developers therefore face all kinds of challenges. The most common and tedious one is locating and troubleshooting the various problems that business Pods run into during deployment and operation.

1. The necessity of Pod diagnosis

A Pod can fall into various abnormal states throughout its life cycle of creation, running, deletion, and destruction. As cluster operators and developers, we regularly need various means to investigate the causes of these abnormal states. For example, when a Pod is stuck in the Pending state, we usually describe the Pod to see at which stage of creation it is blocked; when a container in a Pod restarts frequently, we check the container's last exit status code and its logs to locate the cause of the restarts.

These troubleshooting methods have two drawbacks: they require operations staff to repeatedly run a series of commands in a shell terminal, and they are unfriendly to developers who deploy applications through the cluster's PaaS platform. Can we instead infer what kind of abnormal state a Pod is in from its observable characteristics (Phase, ExitCode, State) and offer a corresponding suggested fix? The answer is yes. We only need to catalogue the abnormal causes behind each combination of symptoms, and attach a recommended solution to each one, to diagnose a Pod's abnormal status automatically.

2. Factors to consider in Pod diagnosis

So how do we determine the cause of a Pod's exception from its abnormal status characteristics? I have summarized the following common scenarios:

| Phase | No container exit code | Last exit code: 137 | Last exit code: 0 | Last exit code: 1 | Current container NotReady |
|---|---|---|---|---|---|
| Pending | Scheduling failed, or image pull failed | —— | —— | —— | —— |
| Running | Lost contact with the node and was evicted | Container was passively restarted; the process was likely killed by a liveness probe or the OOM killer | Container was actively restarted; the process exited normally | Container was actively restarted; the process exited abnormally (business code error) | —— |
| Succeeded | Container process exited normally | —— | —— | —— | —— |
| Failed | Evicted | —— | —— | —— | Container exited abnormally |

The cause of a Pod's abnormal status can therefore be analyzed from fields such as pod.Status.Phase and pod.Status.ContainerStatuses: pod.Status.Phase gives the Pod's current phase, while pod.Status.ContainerStatuses gives each container's current state and its last exit status code. With this information, the cause of the abnormal status of each of the Pod's containers can be located, as the sketch below illustrates.
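As a minimal sketch of this idea, the following Go snippet (using the k8s.io/api types) maps these fields to the causes in the table above. The rules and messages are simplified, and a real diagnosis tool would attach a suggested fix to each finding:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// diagnose maps a Pod's observable features (Phase, each container's last
// exit code, readiness) to human-readable causes, following the table above.
func diagnose(pod *corev1.Pod) []string {
	var findings []string

	// Phase-level symptoms.
	switch pod.Status.Phase {
	case corev1.PodPending:
		findings = append(findings, "Pending: scheduling or image pull may have failed; check the Pod's events")
	case corev1.PodSucceeded:
		findings = append(findings, "Succeeded: container process exited normally")
	case corev1.PodFailed:
		findings = append(findings, "Failed: Pod was evicted or a container exited abnormally")
	}

	// Container-level symptoms: last exit status code and readiness.
	for _, cs := range pod.Status.ContainerStatuses {
		if term := cs.LastTerminationState.Terminated; term != nil {
			switch term.ExitCode {
			case 137:
				findings = append(findings, cs.Name+": passively restarted, likely killed by a liveness probe or the OOM killer")
			case 0:
				findings = append(findings, cs.Name+": actively restarted, process exited normally")
			case 1:
				findings = append(findings, cs.Name+": actively restarted, process exited abnormally (business code error)")
			}
		}
		if !cs.Ready {
			findings = append(findings, cs.Name+": container is NotReady")
		}
	}
	return findings
}

func main() {
	// A fabricated Pod status for demonstration: a Running Pod whose
	// container was last killed with exit code 137 and is not yet Ready.
	pod := &corev1.Pod{Status: corev1.PodStatus{
		Phase: corev1.PodRunning,
		ContainerStatuses: []corev1.ContainerStatus{{
			Name:  "app",
			Ready: false,
			LastTerminationState: corev1.ContainerState{
				Terminated: &corev1.ContainerStateTerminated{ExitCode: 137},
			},
		}},
	}}
	for _, f := range diagnose(pod) {
		fmt.Println(f)
	}
}
```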

3. Cluster automated inspection tools

We can use the same idea to design a cluster health inspection tool. First, how do we measure whether a cluster is healthy? Kubernetes clusters run on virtual machines or physical machines, so we must ensure not only that the components inside the cluster work properly, but also that the relevant services on the cluster's nodes run normally and that each node's network and domain name resolution are configured correctly. The main inspection items for cluster health are summarized here:

  • Whether the Kubernetes components are running normally and configured for high availability: obtain the running status of each component (api-server, controller-manager, scheduler) from the api-server through the clientset client (see the first sketch after this list).
  • Whether node resource usage has reached a threshold: set thresholds for node resources (CPU, memory, disk), detect each node's actual resource usage through the Prometheus monitoring system, and alert the user when a threshold is reached (see the second sketch after this list).
  • Whether the basic services on each node are normal: detect the running status of the node's basic services (etcd, kubelet, dnsmasq, docker, sdn) through the Prometheus monitoring system.
  • Node configuration file checks: inspect each node's main configuration files (/etc/hosts, /etc/resolv.conf, network interface configuration) through an agent Pod that mounts the host directory (see the third sketch after this list).
  • Checks on the cluster's main-entrance LB nodes: mainly the number of open file handles, resource usage, the nf_conntrack table, and whether the LB is highly available.
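For illustration, here is a minimal Go sketch of the first item, assuming a kubeconfig at the default location. It uses the componentstatuses API, which reports the health of the scheduler, controller-manager, and etcd (upstream has deprecated it since v1.19, but it still serves to show the clientset approach); if the call succeeds at all, the api-server itself is evidently reachable:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a clientset from the local kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List componentstatuses and report any component whose Healthy
	// condition is not True.
	statuses, err := clientset.CoreV1().ComponentStatuses().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, cs := range statuses.Items {
		for _, cond := range cs.Conditions {
			if cond.Type == corev1.ComponentHealthy && cond.Status != corev1.ConditionTrue {
				fmt.Printf("component %s is unhealthy: %s\n", cs.Name, cond.Message)
			}
		}
	}
}
```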
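The resource-threshold item can be sketched as a query against Prometheus's HTTP API. The Prometheus address, the node_exporter-based CPU expression, and the 80% threshold below are illustrative assumptions; any resource with a node-level metric can be checked the same way:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// promResponse mirrors only the fields of the Prometheus /api/v1/query
// response that this check needs.
type promResponse struct {
	Data struct {
		Result []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [timestamp, "value"]
		} `json:"result"`
	} `json:"data"`
}

func main() {
	// Hypothetical Prometheus address; the query computes per-node CPU
	// usage over the last 5 minutes from node_exporter metrics.
	prom := "http://prometheus.example.com:9090"
	query := `100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])))`
	const threshold = 80.0 // percent; set per-resource thresholds as needed

	resp, err := http.Get(prom + "/api/v1/query?query=" + url.QueryEscape(query))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var pr promResponse
	if err := json.NewDecoder(resp.Body).Decode(&pr); err != nil {
		panic(err)
	}
	for _, sample := range pr.Data.Result {
		raw, ok := sample.Value[1].(string)
		if !ok {
			continue
		}
		usage, err := strconv.ParseFloat(raw, 64)
		if err != nil {
			continue
		}
		if usage >= threshold {
			fmt.Printf("node %s: CPU usage %.1f%% reached the threshold\n", sample.Metric["instance"], usage)
		}
	}
}
```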
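Finally, a rough sketch of what the agent Pod from the last two items could run on each node. It assumes the host's filesystem is mounted into the Pod at /host via a hostPath volume; the mount point, the resolv.conf expectation, and the 80% conntrack threshold are all assumptions to adapt:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// hostPath is where the agent Pod is assumed to mount the host's root
// filesystem (via a hostPath volume); adjust to the actual mount point.
const hostPath = "/host"

// readInt reads a single integer from a proc-style file.
func readInt(path string) (int, error) {
	b, err := os.ReadFile(path)
	if err != nil {
		return 0, err
	}
	return strconv.Atoi(strings.TrimSpace(string(b)))
}

func main() {
	// Config file check: /etc/resolv.conf must name at least one nameserver.
	b, err := os.ReadFile(hostPath + "/etc/resolv.conf")
	if err != nil || !strings.Contains(string(b), "nameserver") {
		fmt.Println("resolv.conf check failed: file unreadable or no nameserver configured")
	}

	// nf_conntrack table check: warn when the table is close to full,
	// which on LB nodes leads to dropped connections.
	cur, err1 := readInt(hostPath + "/proc/sys/net/netfilter/nf_conntrack_count")
	limit, err2 := readInt(hostPath + "/proc/sys/net/netfilter/nf_conntrack_max")
	if err1 == nil && err2 == nil && float64(cur) > 0.8*float64(limit) {
		fmt.Printf("nf_conntrack table %d/%d is above 80%% usage\n", cur, limit)
	}
}
```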

The above gives the design ideas behind the Pod diagnosis tool and the cluster inspection tool. Both tools have now been developed and put into use. The core design idea is to quantify problems: summarize the causes of recurring issues, deliver solutions in an automated way, and thereby reduce repetitive work for operations staff and make it easier for them to locate problems.