Resource Overselling Based on MutatingAdmissionWebhook in Kubernetes

Posted by Hao Liang's Blog on Sunday, March 22, 2020

1. Analysis of the Resource Oversold Problem

In production, the compute nodes of a Kubernetes cluster run many Pods, each hosting business containers. We usually manage the creation, deletion, and modification of Pods through resource objects such as Deployment, DaemonSet, and StatefulSet. Developers or operators therefore often need to configure the CPU and memory resources of the business containers in the containers field of these objects, using requests and limits:

  • requests: the resources the scheduler reserves on a node when it places the pod there. Each successfully scheduled pod reduces what is left for later pods: remaining = Allocatable - sum of the requests of the pods already on the node. The node's reported Allocatable value itself does not change; the scheduler tracks the remainder.
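  • limits: the maximum resources a container in the pod may use while running on the node. A container that exceeds its CPU limit is throttled, and one that exceeds its memory limit is OOM-killed.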

It is easy to see that when requests is set much larger than what the pod actually uses, the node's allocatable resources are consumed quickly while its real resource utilization stays very low.

Figure: Node resource occupancy

In the figure above, the largest blue box (allocatable) is the allocatable resources of the compute node, the orange box (requests) is the requests value configured by the user, and the red box (current) is the resources actually used by the business containers. So the node’s resource utilization is current / allocatable. However, because the requests are set too large and fill up the allocatable resources, new pods cannot be scheduled to this node. The result is a node whose actual resource usage is very low but which cannot accept new pods because its remaining allocatable resources are too low.

Therefore, can we dynamically adjust the allocatable value so that the node's allocatable resources appear “falsely high”, fooling the k8s scheduler into thinking the node has plenty of allocatable resources so that as many pods as possible are scheduled onto it?

Figure: Dynamically adjust allocatable attributes

In the figure above, by enlarging the allocatable value (fake allocatable), more pods are scheduled onto the node, and the node’s resource utilization current / allocatable becomes larger.

The allocatable field is part of the status of the Node object:

apiVersion: v1
kind: Node
metadata:
  annotations:
    ...
  creationTimestamp: ...
  labels:
    ...
  name: k8s-master.com
  resourceVersion: "7564956"
  selfLink: /api/v1/nodes/k8s-master.com
  uid: ...
spec:
  podCIDR: 10.244.0.0/24
status:
  addresses:
  - address: 172.16.0.2
    type: InternalIP
  - address: k8s-master.com
    type: Hostname
  allocatable: # This is the field that needs to be modified dynamically
    cpu: "1"
    ephemeral-storage: "47438335103"
    hugepages-2Mi: "0"
    memory: 3778260Ki
    pods: "110"
  capacity:
    cpu: "1"
    ephemeral-storage: 51473888Ki
    hugepages-2Mi: "0"
    memory: 3880660Ki
    pods: "110"
  conditions:
    ...
  daemonEndpoints:
    ...
  images: 
    ...
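
The scheduling arithmetic described in this section can be sketched in a few lines of Python. This is only an illustration, not the scheduler's real implementation: the Node class, the numbers, and the factor of 2 are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    allocatable_cpu: float  # cores the scheduler believes it can hand out
    requested_cpu: float    # sum of the requests of pods already on the node
    used_cpu: float         # cores the containers actually consume

    def fits(self, pod_request: float) -> bool:
        """Scheduler-style check: requests, not real usage, decide placement."""
        return self.requested_cpu + pod_request <= self.allocatable_cpu

    def utilization(self, real_allocatable: float) -> float:
        """Actual usage divided by the node's real allocatable CPU."""
        return self.used_cpu / real_allocatable


real_allocatable = 4.0  # hypothetical node with 4 allocatable cores

# Requests fill the node even though only half a core is really in use.
node = Node(allocatable_cpu=real_allocatable, requested_cpu=4.0, used_cpu=0.5)
print(node.fits(1.0))                      # a new 1-core pod is rejected
print(node.utilization(real_allocatable))  # real utilization stays low

# Advertise twice the real capacity (the factor 2 is an arbitrary example).
oversold = Node(allocatable_cpu=real_allocatable * 2, requested_cpu=4.0, used_cpu=0.5)
print(oversold.fits(1.0))                  # the same pod now fits
```

With the inflated allocatable value the same pod passes the check, which is the effect the "fake allocatable" idea aims for.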

2. Ideas for realizing oversold resources

The key to overselling resources is to dynamically modify the allocatable field of the Node object. The allocatable field belongs to Status, so it cannot be changed directly with the kubectl edit command. Status differs from Spec: Spec is the desired state set by the user, while Status is the actual state (the node keeps its status up to date by continuously sending heartbeats to the apiServer, and the data is ultimately stored in etcd). So how do we modify the Status field?

First, to modify the Status of any resource object in k8s, k8s officially provides a set of RESTful APIs: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.13

The Status field can be modified by calling this RESTful API with the PATCH or PUT method. (The apiServer then updates the Status value stored in etcd.)

However, the Node resource object is special. The compute node continuously sends heartbeats to the apiServer (once every 10 seconds by default), each carrying its real Status, which is then written to etcd. In other words, no matter how you modify the Node's Status through PATCH/PUT, the node will periodically overwrite your change with its real Status data. This means we cannot durably modify a Node's Status data by calling the RESTful API directly.

So can we intercept the heartbeat of the compute node itself and modify the allocatable value in its Status field to achieve resource overselling?
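
As a rough illustration of the PATCH approach discussed above, the following sketch builds (but does not send) a merge-patch request against the status sub-resource of a node. The apiServer address, the token placeholder, and the helper name build_status_patch are assumptions for the example; only the /api/v1/nodes/{name}/status path and the merge-patch content type come from the Kubernetes API.

```python
import json
import urllib.request


def build_status_patch(api_server: str, node_name: str, cpu: str,
                       token: str) -> urllib.request.Request:
    """Build a merge-patch request for the nodes/status sub-resource."""
    body = {"status": {"allocatable": {"cpu": cpu}}}
    return urllib.request.Request(
        url=f"{api_server}/api/v1/nodes/{node_name}/status",
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/merge-patch+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical address and token; the request is only constructed, never sent.
req = build_status_patch("https://127.0.0.1:6443", "k8s-master.com", "2", "<token>")
print(req.get_method(), req.full_url)
```

Even if such a request succeeds, the next heartbeat from the node replaces the patched value, which is why this approach alone is not enough.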

3. MutatingAdmissionWebhook characteristics

The answer is yes. k8s provides the Admission Controller mechanism in the apiServer, which includes MutatingAdmissionWebhook. Through this webhook, requests sent to the apiServer that match the configured rules are forwarded to a specified interface. As long as we provide such an interface, we can obtain the Status data carried in the heartbeat that a Node sends to the apiServer. We then apply our own modifications to this data before it is written to etcd, so that etcd stores our modified Status as the real Status of the node, which ultimately achieves resource overselling.

MutatingAdmissionWebhook, as part of the Admission Controller in Kubernetes’ apiServer, provides a very flexible extension mechanism. By configuring a MutatingWebhookConfiguration object, it can in principle intercept and modify any write request (such as create, update, or delete) processed by the apiServer.
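
To make the idea concrete, here is a minimal sketch of what the mutation step of such a webhook could look like. It is not the author's implementation: the multiplier OVERSALE_FACTOR, the helper names, and the choice to inflate only CPU are assumptions for illustration. The response shape (uid, allowed, patchType JSONPatch, base64-encoded patch) follows the AdmissionReview format.

```python
import base64
import json

OVERSALE_FACTOR = 2  # hypothetical multiplier, not taken from the article


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "1" or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def mutate(review: dict) -> dict:
    """Return an AdmissionReview response that inflates status.allocatable.cpu."""
    request = review["request"]
    cpu = request["object"]["status"]["allocatable"]["cpu"]
    inflated = f"{cpu_to_millicores(cpu) * OVERSALE_FACTOR}m"
    patch = [{"op": "replace", "path": "/status/allocatable/cpu", "value": inflated}]
    return {
        "apiVersion": review.get("apiVersion", "admission.k8s.io/v1beta1"),
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],
            "allowed": True,
            "patchType": "JSONPatch",
            # The patch must be base64-encoded JSON.
            "patch": base64.b64encode(json.dumps(patch).encode("utf-8")).decode("ascii"),
        },
    }


# A trimmed-down heartbeat as the apiServer might forward it (made-up uid).
heartbeat = {
    "apiVersion": "admission.k8s.io/v1beta1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "example-uid",
        "object": {"status": {"allocatable": {"cpu": "1"}}},
    },
}
resp = mutate(heartbeat)
print(json.loads(base64.b64decode(resp["response"]["patch"])))
```

A real service would expose this function over HTTPS and would probably derive the factor from the node's actual load rather than use a constant.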

4. Introduction to MutatingWebhookConfiguration object

apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  creationTimestamp: null
  name: mutating-webhook-oversale
webhooks:
- clientConfig:
    caBundle: ...
    service:
      name: webhook-oversale-service
      namespace: oversale
      path: /mutate
  failurePolicy: Ignore
  name: oversale
  rules:
  - apiGroups:
    - "*"
    apiVersions:
    - v1
    operations:
    - UPDATE
    resources:
    - nodes/status

MutatingWebhookConfiguration is a resource type built into Kubernetes. Here is a brief explanation of its fields:

  • clientConfig.caBundle: the PEM-encoded CA certificate bundle that the apiServer uses to verify the TLS certificate of our custom webhook service
  • clientConfig.service: the Service through which the apiServer reaches our custom webhook service (name, namespace, and request path)
  • failurePolicy: what the apiServer does when the call to our custom webhook service fails (Ignore: ignore the error and continue processing the request; Fail: reject the request without further processing)
  • rules.operations: the operation types to intercept. In the example, only requests of type UPDATE are handed over to our custom webhook service.
  • rules.resources: the resource and sub-resource types to intercept. In the example, only requests that target the status sub-resource of nodes are handed over to our custom webhook service.

Combining rules.operations and rules.resources, we can see that the MutatingWebhookConfiguration in the example intercepts the update operations that nodes submit to the apiServer for their status data (this is the heartbeat information mentioned earlier) and sends all of them to the /mutate interface of the Service named webhook-oversale-service for processing. This interface is provided by our custom webhook service. In the picture above, the container running in the Pod is our custom webhook service.

A sample custom webhook service is available for reference: admission-webhook-oversale-sample
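
How a rule like the one in the example selects heartbeats can be sketched as a small matching function. This is a simplified illustration, not the apiServer's actual code: the function name matches and its argument layout are assumptions, and the wildcard handling follows the documented meaning of "*" and "resource/subresource" patterns.

```python
def matches(rule: dict, group: str, version: str, operation: str,
            resource: str, subresource: str = "") -> bool:
    """Return True if a request is selected by a webhook rule."""

    def contains(values, wanted):
        # "*" in a list matches every value.
        return "*" in values or wanted in values

    def resource_matches(patterns):
        for pattern in patterns:
            res, _, sub = pattern.partition("/")
            # "nodes" has an empty sub part, so it only matches requests
            # without a sub-resource; "nodes/status" needs sub == "status".
            if res in ("*", resource) and sub in ("*", subresource):
                return True
        return False

    return (contains(rule["apiGroups"], group)
            and contains(rule["apiVersions"], version)
            and contains(rule["operations"], operation)
            and resource_matches(rule["resources"]))


# The rule from the MutatingWebhookConfiguration example.
rule = {
    "apiGroups": ["*"],
    "apiVersions": ["v1"],
    "operations": ["UPDATE"],
    "resources": ["nodes/status"],
}

print(matches(rule, "", "v1", "UPDATE", "nodes", "status"))  # node heartbeat
print(matches(rule, "", "v1", "UPDATE", "nodes"))            # update of the node itself
print(matches(rule, "", "v1", "CREATE", "pods"))             # unrelated request
```

Only the first request, an UPDATE of nodes/status, is selected, so the webhook service sees heartbeats and nothing else.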

5. Source code analysis

To be continued

6. Resource oversold algorithm practice

To be continued

7. Reference materials

  • kubernetes RESTful API docs
  • Tencent’s self-developed business is moved to the cloud: Discussion on technical solutions for optimizing Kubernetes cluster load
  • api-conventions: spec-and-status