Hao Liang's Blog

Embrace the World with Cloud Native and Open-source

What's inside Nvidia Container Toolkit?

Architecture Overview The NVIDIA container stack is architected so that it can be targeted to support any container runtime in the ecosystem. The components of the stack include: the NVIDIA Container Runtime (nvidia-container-runtime), the NVIDIA Container Runtime Hook (nvidia-container-toolkit / nvidia-container-runtime-hook), and the NVIDIA Container Library and CLI (libnvidia-container1, nvidia-container-cli). These components are packaged together as the NVIDIA Container Toolkit. How they are used depends on the container runtime in question.

【Envoy-02】Monitoring, Performance, and Troubleshooting

1. Envoy Observability Concept: mechanisms to observe Envoy’s state, used for debugging and monitoring Envoy. Overview: admin interface (stats, config dump, clusters, log level), debug logs, access logs, metrics collection, tracing. 2. Admin Interface /stats: histogram metrics and the current status of Envoy (e.g. how many requests, how many succeeded, how many failed); /config_dump: dump the current internal Envoy configuration; /clusters: actual membership of each cluster; /logging: Envoy logs # https://github.com/solo-io/hoot/blob/master/02-observe/stats.yaml admin: access_log_path: /dev/stdout address: socket_address: { address: 127.
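
As a quick illustration of the admin interface, here is a minimal Go sketch that polls a few of the endpoints listed above. It assumes the admin interface is exposed on 127.0.0.1:9901 (a common default, not taken from the post's config):

```go
// envoy_admin_probe.go: poll a few Envoy admin endpoints and print their output.
// Assumes the admin interface is listening on 127.0.0.1:9901 (adjust to your config).
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	// Endpoints described in the post: stats, config dump, cluster membership, logging.
	for _, path := range []string{"/stats", "/config_dump", "/clusters", "/logging"} {
		resp, err := client.Get("http://127.0.0.1:9901" + path)
		if err != nil {
			fmt.Printf("GET %s failed: %v\n", path, err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		// Print only the first 300 bytes of each response to keep the output readable.
		fmt.Printf("=== %s (%d bytes) ===\n%.300s\n", path, len(body), body)
	}
}
```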

【Envoy-01】Architecture Overview and Fundamentals

I’ve been re-learning Envoy recently since it’s a powerful L4/L7 proxy widely used in multiple open-source projects (e.g. Istio, Cilium, Envoy Gateway). Back in 2019, when I first got to know Service Mesh, the first open-source Service Mesh project I got involved in was Istio, which already used Envoy as its L4/L7 proxy for traffic management. At that time, I wasn’t interested in Envoy, presumably because it’s written in C++, a ‘the deeper you get, the harder it gets’ kind of language.

【Troubleshooting】Reusable CPUs from initContainer were not being honored

1. Description This issue affects early Kubernetes versions around v1.18. Related Commit: Fix a bug whereby reusable CPUs and devices were not being honored #93289 Related PRs: Fix a bug whereby reusable CPUs and devices were not being honored #93189 Refactor the algorithm used to decide CPU assignments in the CPUManager #102014 Previously, it was possible for reusable CPUs and reusable devices (i.e. those previously consumed by init containers) not to be reused by subsequent init containers or app containers if the TopologyManager was enabled.
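
To make the scenario concrete, here is a minimal Go sketch (not taken from the linked PRs) of the kind of pod the bug affects: an init container and an app container both requesting exclusive CPUs under the static CPU manager policy, where the init container's CPUs are expected to be handed back and reused. Names and quantities are illustrative assumptions.

```go
// reusable_cpus_pod.go: construct a Guaranteed-QoS pod whose init container CPUs
// should be released and reused by the app container (the behavior the bug broke).
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exclusiveCPUs builds equal integer CPU requests/limits so the static CPU manager
// policy assigns exclusive cores to the container.
func exclusiveCPUs(cpus, memory string) corev1.ResourceRequirements {
	r := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse(cpus),
		corev1.ResourceMemory: resource.MustParse(memory),
	}
	return corev1.ResourceRequirements{Requests: r, Limits: r}
}

func main() {
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "reusable-cpus-demo"},
		Spec: corev1.PodSpec{
			InitContainers: []corev1.Container{{
				Name:      "init",
				Image:     "busybox",
				Command:   []string{"sh", "-c", "echo init done"},
				Resources: exclusiveCPUs("2", "128Mi"), // 2 exclusive CPUs, freed when init exits
			}},
			Containers: []corev1.Container{{
				Name:      "app",
				Image:     "busybox",
				Command:   []string{"sh", "-c", "sleep 3600"},
				Resources: exclusiveCPUs("2", "128Mi"), // should be able to reuse the init container's CPUs
			}},
		},
	}
	fmt.Printf("pod %q: init requests %s CPU, app requests %s CPU\n",
		pod.Name,
		pod.Spec.InitContainers[0].Resources.Requests.Cpu(),
		pod.Spec.Containers[0].Resources.Requests.Cpu(),
	)
}
```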

【Troubleshooting】Summary of kube-apiserver troubleshooting techniques (analysis of logs and caching principles)

1. Related background When troubleshooting apiserver issues, we identified through monitoring the nodes that may have performance bottlenecks. The next step is to further analyze the apiserver logs on those nodes. 2. APIServer log analysis skills Trace log Log printing conditions When the total request time exceeds the threshold (default 500ms), the apiserver prints a trace log, and for each step of the trace it calculates a per-step time-consuming threshold
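
The apiserver builds these trace logs with the k8s.io/utils/trace package. The sketch below shows the same mechanism in isolation; the operation name, step names, and the 500ms threshold are illustrative, not copied from apiserver code:

```go
// trace_demo.go: demonstrate the trace utility the apiserver uses to emit
// "Trace[...]" log lines when a request exceeds a latency threshold.
package main

import (
	"time"

	"k8s.io/klog/v2"
	utiltrace "k8s.io/utils/trace"
)

func handleRequest() {
	t := utiltrace.New("Get /api/v1/pods", utiltrace.Field{Key: "verb", Value: "GET"})
	// Log the whole trace (with per-step durations) only if the total time
	// exceeds 500ms, mirroring the default threshold described in the post.
	defer t.LogIfLong(500 * time.Millisecond)

	time.Sleep(200 * time.Millisecond)
	t.Step("About to fetch object from storage")

	time.Sleep(400 * time.Millisecond)
	t.Step("Object fetched, writing response")
}

func main() {
	defer klog.Flush()
	handleRequest() // total ~600ms > 500ms, so the trace is printed
}
```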

【Troubleshooting】Summary of kube-apiserver troubleshooting techniques (monitoring analysis)

1. Related background As the scale of a single K8s cluster continues to expand (node count reaching 4,000+), we found during operation that the apiserver gradually becomes the performance bottleneck of the cluster, prone to problems such as unresponsive requests, slow responses, and request rejections, and it can even trigger cluster avalanches and network failures. The following details how to quickly locate and troubleshoot apiserver performance issues.
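
A common first step when chasing such bottlenecks is to pull the apiserver's Prometheus metrics and look at request latencies and inflight requests. Below is a minimal client-go sketch that dumps the raw /metrics endpoint; the kubeconfig path and which metric families to read afterwards are assumptions, not steps prescribed by the post:

```go
// apiserver_metrics_dump.go: fetch the kube-apiserver /metrics endpoint via client-go
// as a starting point for latency / inflight-request analysis.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a local kubeconfig; inside a pod you would use rest.InClusterConfig() instead.
	kubeconfig := filepath.Join(os.Getenv("HOME"), ".kube", "config")
	config, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Raw GET against the apiserver's /metrics path; the output contains families such as
	// apiserver_request_duration_seconds and apiserver_current_inflight_requests.
	data, err := clientset.CoreV1().RESTClient().Get().AbsPath("/metrics").DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d bytes of metrics\n", len(data))
}
```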

Differences in container runtime between different versions of docker

Reference articles: K8s will eventually abandon docker, and TKE already supports containerd Using docker as image building service in containerd cluster 1. Background When comparing Kubernetes clusters that use different docker versions (1.18, 1.19) as the container runtime, we found some differences in the underlying implementation, which I record here. 2. Issue analysis docker 1.18 Container process tree: containerd is not a system service, but a process started by dockerd

【ETCD】Analysis of the underlying mechanism of ETCD Defrag

1. Related source code server/storage/backend/backend.go#defrag() server/storage/backend/backend.go#defragdb() 2. Why do we need defrag When using K8s clusters day to day, if we frequently add or delete cluster data, we notice a strange phenomenon: even though the amount of object data in the cluster has not increased significantly, the disk space occupied by the etcd data files keeps growing. Looking into it, etcd officially recommends using the defrag command of the etcdctl tool to defragment the data on each etcd node:
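
Besides etcdctl, the same operation is exposed through the etcd Go client's maintenance API. Here is a minimal sketch, assuming etcd is reachable on 127.0.0.1:2379 (the endpoint and timeouts are assumptions):

```go
// etcd_defrag.go: trigger defragmentation on an etcd member via clientv3,
// the programmatic equivalent of `etcdctl defrag`.
package main

import (
	"context"
	"fmt"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	endpoint := "127.0.0.1:2379" // assumed local etcd member

	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{endpoint},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		panic(err)
	}
	defer cli.Close()

	// Defragment blocks while the member rewrites its backend database,
	// so run it member by member rather than on the whole cluster at once.
	ctx, cancel := context.WithTimeout(context.Background(), time.Minute)
	defer cancel()
	if _, err := cli.Defragment(ctx, endpoint); err != nil {
		panic(err)
	}
	fmt.Println("defragmentation finished for", endpoint)
}
```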

【Troubleshooting】A large number of pending high-priority Pods in the cluster affect the scheduling of low-priority Pods

1. Background Related issues: low priority pods stuck in pending without any scheduling events #106546 Totally avoid Pod starvation (HOL blocking) or clarify the user expectation on the wiki #86373 Related optimization proposal: Efficient requeueing of Unschedulable Pods 2. Issue analysis There are a large number of high-priority Pods in the Pending state in the cluster because the current cluster resources do not meet the resource requests of these high-priority
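
For context, pod priority is what puts these Pods ahead of the low-priority ones in the scheduling queue. A minimal Go sketch of the objects involved follows; the names and the priority value are illustrative assumptions, not taken from the linked issues:

```go
// priority_demo.go: a PriorityClass and a Pod that references it; pending pods with a
// higher priority value sit ahead of lower-priority pods in the scheduling queue.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	schedulingv1 "k8s.io/api/scheduling/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	highPriority := &schedulingv1.PriorityClass{
		ObjectMeta:    metav1.ObjectMeta{Name: "high-priority"},
		Value:         1000000, // larger value = considered (and preempting) first
		GlobalDefault: false,
		Description:   "Critical workloads; unschedulable pods of this class can delay lower-priority pods",
	}

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "critical-app"},
		Spec: corev1.PodSpec{
			PriorityClassName: highPriority.Name,
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "busybox",
			}},
		},
	}

	fmt.Printf("pod %q uses priority class %q (value %d)\n",
		pod.Name, pod.Spec.PriorityClassName, highPriority.Value)
}
```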

【Operating System】Go Runtime's MADV_FREE memory release issue

1. Background Related issues: runtime: memory not being returned to OS #22439 runtime: provide way to disable MADV_FREE For applications compiled with Go 1.12~1.15, it often happens that after the application starts, the resident memory (RSS) keeps growing the longer the process runs, and the memory is never released back to the OS. 2. Issue Analysis Use pprof to analyze the various kinds of memory usage in the Go runtime. The following explains the meaning of the memory metrics in pprof:
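
As a quick way to see the behavior, the sketch below allocates and drops a large buffer, then asks the runtime to return memory to the OS. On Go 1.12~1.15, running it with GODEBUG=madvdontneed=1 makes the kernel drop RSS immediately instead of lazily reclaiming MADV_FREE'd pages; the allocation size is an arbitrary assumption:

```go
// madvfree_demo.go: observe how released heap memory shows up (or not) in RSS.
// On Go 1.12~1.15, run with `GODEBUG=madvdontneed=1` to make the runtime use
// MADV_DONTNEED instead of MADV_FREE, so RSS drops as soon as memory is returned.
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

func printHeap(label string) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Printf("%-18s HeapAlloc=%d MiB, HeapReleased=%d MiB\n",
		label, m.HeapAlloc>>20, m.HeapReleased>>20)
}

func main() {
	printHeap("start")

	// Allocate ~512 MiB and touch every page so it is actually backed by RSS,
	// then drop the reference so it becomes garbage.
	buf := make([]byte, 512<<20)
	for i := range buf {
		buf[i] = 1
	}
	printHeap("allocated")
	buf = nil

	// Force a GC and return free memory to the OS. With MADV_FREE the kernel may
	// keep RSS high until there is memory pressure; with MADV_DONTNEED (or the
	// Go >= 1.16 default) RSS drops right away.
	debug.FreeOSMemory()
	printHeap("after FreeOSMemory")
}
```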