【Kubelet】Practical analysis of Kubernetes node extension resources and Device Plugin

Posted by Hao Liang's Blog on Sunday, November 7, 2021

1. Background

In Kubernetes, a node is abstracted as a set of resources. Currently, there are five officially defined resource types for a node's allocatable capacity: cpu, memory, ephemeral-storage, hugepages-1Gi, and hugepages-2Mi. When we create a Pod, the scheduler compares the Pod's requests (the resources it requires) with the allocatable resources of each node to determine whether the Pod can run on that node. In many business scenarios, basic resources such as CPU, memory, and disk cannot fully describe a node's resource attributes (for example GPUs, network card bandwidth, or the number of allocatable IPs). Therefore, Kubernetes provides an extension mechanism for describing node resources: extended resources for a node can be defined as needed.

2. Node extension resources

An extended resource for a node is essentially a custom resource attribute that you add to the node by calling the Kubernetes API:

PATCH /api/v1/nodes/<your-node-name>/status HTTP/1.1
Accept: application/json
Content-Type: application/json-patch+json
Host: k8s-master:8080

[
  {
    "op": "add",
    "path": "/status/capacity/example.com~1dongle",
    "value": "4"
  }
]
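The body of this request can also be generated programmatically. The following is a minimal sketch, not an official client: the helper names escape_json_pointer_token and build_capacity_patch are hypothetical and chosen only for illustration. It shows why the path contains example.com~1dongle: in a JSON Patch path, the / inside a resource name has to be escaped as ~1 (and ~ as ~0).

```python
import json


def escape_json_pointer_token(token: str) -> str:
    """Escape one JSON Pointer reference token: '~' becomes '~0', '/' becomes '~1'."""
    # '~' is replaced first so that the '~' introduced by '~1' is not escaped again.
    return token.replace("~", "~0").replace("/", "~1")


def build_capacity_patch(resource_name: str, quantity: int) -> list:
    """Hypothetical helper: build a JSON Patch that adds a resource under status.capacity."""
    return [
        {
            "op": "add",
            "path": "/status/capacity/" + escape_json_pointer_token(resource_name),
            # The quantity is sent as a string, matching the request body above.
            "value": str(quantity),
        }
    ]


patch = build_capacity_patch("example.com/dongle", 4)
print(json.dumps(patch, indent=2))
```

The printed document has the same content as the request body shown above; sending it still requires a PATCH to the node's status endpoint with the application/json-patch+json content type.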

The example above adds a resource named example.com/dongle to the node through the Kubernetes API (in a JSON Patch path, the / in the resource name is escaped as ~1). When we then use kubectl get node <your-node-name> -oyaml to check the node's capacity, we find that the node now has an additional resource of type example.com/dongle:

Capacity: 
  cpu:  2 
  memory:  2049008Ki 
  example.com/dongle:  4

This means that when we create a Pod whose requests include example.com/dongle, the scheduler takes the node's example.com/dongle extended resource into account. The Pod can be scheduled successfully only when the node has enough example.com/dongle available.
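The check described here can be illustrated with a deliberately simplified sketch. It is not the scheduler's actual implementation, which considers many more factors; the function name fits and the sample quantities are assumptions made up for this example. The idea is only that every requested resource, including an extended one, must fit into what is still free on the node.

```python
def fits(node_allocatable: dict, already_requested: dict, pod_requests: dict) -> bool:
    """Return True if every resource the Pod requests is still free on the node.

    Simplified illustration: quantities are plain integers, and a resource the
    node does not advertise at all is treated as having zero available.
    """
    for name, wanted in pod_requests.items():
        free = node_allocatable.get(name, 0) - already_requested.get(name, 0)
        if wanted > free:
            return False
    return True


# Hypothetical node: 4 dongles in total, 3 already requested by running Pods.
node_allocatable = {"cpu": 2, "example.com/dongle": 4}
already_requested = {"cpu": 1, "example.com/dongle": 3}

print(fits(node_allocatable, already_requested, {"example.com/dongle": 1}))  # True
print(fits(node_allocatable, already_requested, {"example.com/dongle": 2}))  # False
```

A Pod that requests an extended resource the node does not advertise at all can never fit on that node in this model.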

3. Device Plugin

Device Plugin is a plug-in framework officially provided by Kubernetes for dynamically detecting a node's allocatable resources (custom extended resources such as GPUs, network card bandwidth, or the number of allocatable IPs). A device plugin is generally deployed on each node in the cluster through a DaemonSet. It starts a gRPC service that implements the standard Device Plugin interface, through which it can report and update the node's allocatable extended resources in real time. Kubelet obtains this information from the Device Plugin in real time through the plug-in's ListAndWatch streaming RPC.

[Figure: schematic diagram of the interaction between Kubelet and Device Plugin]

Code example: AMD GPU device plugin
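For readers who want the reporting flow without reading a full plug-in, the following is a small simulation, not real gRPC code. The generator list_and_watch is a hypothetical stand-in for the plug-in's stream, and the device IDs are invented. It assumes, as a simplification, that Kubelet counts only devices reported as healthy when it computes the allocatable amount.

```python
from typing import Dict, Iterator, List

# One reported device: an ID plus a health string ("Healthy" or "Unhealthy").
Device = Dict[str, str]


def list_and_watch(snapshots: List[List[Device]]) -> Iterator[List[Device]]:
    """Hypothetical stand-in for the plug-in's stream.

    A real plug-in sends the full device list whenever device state changes;
    here the changes are simply replayed from a prepared list of snapshots.
    """
    for snapshot in snapshots:
        yield snapshot


def allocatable_count(devices: List[Device]) -> int:
    """Count the devices that are currently reported as healthy."""
    return sum(1 for device in devices if device["health"] == "Healthy")


# Two updates: both GPUs healthy, then gpu-1 becomes unhealthy.
snapshots = [
    [{"id": "gpu-0", "health": "Healthy"}, {"id": "gpu-1", "health": "Healthy"}],
    [{"id": "gpu-0", "health": "Healthy"}, {"id": "gpu-1", "health": "Unhealthy"}],
]

# The consumer side recomputes the allocatable amount on every update.
observed = [allocatable_count(devices) for devices in list_and_watch(snapshots)]
print(observed)  # [2, 1]
```

The point of the sketch is the direction of the data flow: the plug-in pushes complete device lists, and the consumer derives the resource quantity from each update instead of reading a value that was written once.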

4. Application scenarios

Both methods described above can be used to define and extend node resources. Creating and updating extended resources directly through the Kubernetes API is relatively simple to implement and suits static resource attributes of a node (such as network card bandwidth or disk capacity). The capacity of such a resource only needs to be obtained once, when the reporting component first starts, and it does not change while the node is running. The Device Plugin mainly addresses extended resources that change dynamically: it monitors the node's allocatable extended resources and lets Kubelet perceive changes in real time (for example GPUs or NUMA CPU topology).

This article has introduced the mechanisms Kubernetes provides for extending node resources, paving the way for a subsequent analysis of the more complex scheduler extension plug-in (scheduler-plugin).