【Scheduling】Pod State Scheduling: a scheduler plug-in that scores nodes by the state of their Pods

Posted by Hao Liang's Blog on Sunday, October 24, 2021

This is the first scheduler plug-in I contributed, in 2020, to the scheduler-plugins open source project maintained by the Kubernetes sig-scheduling group.

1. Background

Related PR: PR103: Pod State Scheduling Plugin

Source code: Pod State Scheduling

  • The scoring phase (Score) of the native Kubernetes scheduler does not take into account Pods on a node that are in the Terminating state.
  • The scoring phase (Score) of the native Kubernetes scheduler does not take into account Pods that have been nominated to a node (Nominated Pods).

2. Function introduction

The Pod State Scheduling plug-in implements a scoring extension plug-in (Score Plugin):

  • The more Terminating Pods a node has, the higher the node scores in the scoring phase (those Pods are about to be unbound from the node and release their resources, so the node should be preferred).
  • The more Nominated Pods a node has, the lower the node scores in the scoring phase (those Pods are going to be bound to the node and occupy resources, so the node should be less preferred).

3. Implementation principle

How the scoring algorithm works:

  • Count the Terminating Pods and the Nominated Pods on each node separately.
  • Add 1 to the node's score for each Terminating Pod.
  • Subtract 1 from the node's score for each Nominated Pod.

func (ps *PodState) score(nodeInfo *framework.NodeInfo) (int64, *framework.Status) {
	var terminatingPodNum, nominatedPodNum int64
	// get nominated Pods for node from nominatedPodMap
	nominatedPodNum = int64(len(ps.handle.PreemptHandle().NominatedPodsForNode(nodeInfo.Node().Name)))
	for _, p := range nodeInfo.Pods {
		// Pod is terminating if DeletionTimestamp has been set
		if p.Pod.DeletionTimestamp != nil {
			terminatingPodNum++
		}
	}
	return terminatingPodNum - nominatedPodNum, nil
}

Normalizing the score range:

  • Find the highest and the lowest raw score across all nodes.
  • Compute the source score range (oldRange) as the highest raw score minus the lowest raw score.
  • Compute the target score range (newRange) as the maximum node score allowed by the scheduling framework minus the minimum node score allowed by the framework.
  • Calculation formula: ((raw score of the node - lowest raw score of all nodes) * newRange / oldRange) + minimum node score allowed by the framework. If oldRange is 0 (all nodes have the same raw score), every node is given the minimum node score.

func (ps *PodState) NormalizeScore(ctx context.Context, state *framework.CycleState, pod *v1.Pod, scores framework.NodeScoreList) *framework.Status {
	// Find highest and lowest scores.
	var highest int64 = -math.MaxInt64
	var lowest int64 = math.MaxInt64
	for _, nodeScore := range scores {
		if nodeScore.Score > highest {
			highest = nodeScore.Score
		}
		if nodeScore.Score < lowest {
			lowest = nodeScore.Score
		}
	}

	// Transform the highest to lowest score range to fit the framework's min to max node score range.
	oldRange := highest - lowest
	newRange := framework.MaxNodeScore - framework.MinNodeScore
	for i, nodeScore := range scores {
		if oldRange == 0 {
			scores[i].Score = framework.MinNodeScore
		} else {
			scores[i].Score = ((nodeScore.Score - lowest) * newRange / oldRange) + framework.MinNodeScore
		}
	}

	return nil
}

4. Usage

Scheduler plug-in configuration example:

apiVersion: kubescheduler.config.k8s.io/v1beta1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: false
clientConnection:
  kubeconfig: "REPLACE_ME_WITH_KUBE_CONFIG_PATH"
profiles:
- schedulerName: default-scheduler
  plugins:
    score:
      enabled:
      - name: PodState