Hao Liang's Blog

Embrace the World with Cloud Native and Open-source

The Road to Kubestronaut: Guide

1. What is Kubestronaut? Some people in the CNCF community may have already noticed that the Kubestronaut Program was released recently. Brief introduction: The Kubestronaut program recognises community leaders who have consistently invested in their ongoing education and grown their skill level with Kubernetes. Individuals who have successfully passed all of CNCF’s Kubernetes certifications – CKA, CKAD, CKS, KCNA, KCSA – will receive the title of “Kubestronaut”. 2. Why do I want to become a Kubestronaut? To be honest, I’ve been part of the CNCF community for over 5 years.

【Envoy-04】Envoy xDS Dynamic Configuration and Control Plane Interactions

1. Interacting with Control Plane. What’s a control plane? To manage all these configuration files in a central place, we need to introduce a control plane. The control plane propagates all the network configuration to the data plane. Why is it useful? Envoy subscribes to the control plane for configuration updates: whenever a cluster changes, or routes or listeners are added, the control plane sends those updates to Envoy, which applies the new configuration dynamically without restarting.
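The push model in the excerpt above can be illustrated with a minimal in-memory sketch. This is not Envoy's xDS API: the `ControlPlane` and `Proxy` classes, their methods, and the sample resources are hypothetical names chosen only to show a subscriber receiving versioned updates and applying them without a restart.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Proxy:
    """Hypothetical stand-in for a data-plane proxy such as Envoy."""
    config: Dict[str, dict] = field(default_factory=dict)
    applied_versions: List[int] = field(default_factory=list)

    def on_update(self, version: int, resource_type: str, resources: dict) -> None:
        # Swap the resources in place; the process itself keeps running.
        self.config[resource_type] = resources
        self.applied_versions.append(version)


class ControlPlane:
    """Hypothetical central store that pushes every change to its subscribers."""

    def __init__(self) -> None:
        self._version = 0
        self._subscribers: List[Callable[[int, str, dict], None]] = []

    def subscribe(self, callback: Callable[[int, str, dict], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, resource_type: str, resources: dict) -> int:
        # Each change gets a new version and is sent to all subscribers.
        self._version += 1
        for notify in self._subscribers:
            notify(self._version, resource_type, resources)
        return self._version


control_plane = ControlPlane()
proxy = Proxy()
control_plane.subscribe(proxy.on_update)

# Illustrative resources only; real xDS resources are protobuf messages.
control_plane.publish("clusters", {"backend": {"endpoints": ["10.0.0.1:8080"]}})
control_plane.publish("routes", {"default": {"prefix": "/", "cluster": "backend"}})
print(proxy.applied_versions)
```

The sketch leaves out acknowledgements, transport, and error handling, all of which a real control plane would need.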

Renaming Node Name without Resetting kubelet Environment

Goal: Rename any node in a Kubernetes cluster. No need to reset the whole kubelet environment, as most approaches require. No need to drain any Pods running on the Node. Bootstrap Process of kubelet. Doc reference: https://kubernetes.io/docs/reference/access-authn-authz/kubelet-tls-bootstrapping/ Chinese introduction: https://cloud.tencent.com/developer/article/1656007 The kubelet process starts. It tries to find the kubeconfig file specified by the argument --kubeconfig=xxx; if that is not found, it tries to find the bootstrap-kubeconfig file specified by the argument --bootstrap-kubeconfig=xxx instead.
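The lookup order described in the excerpt above can be sketched as follows. This is a simplified model, not the kubelet's actual code: the function name `select_kubelet_credentials` and the file paths are hypothetical, and the sketch only checks whether each file exists.

```python
import os
import tempfile
from typing import Optional, Tuple


def select_kubelet_credentials(
    kubeconfig: str, bootstrap_kubeconfig: Optional[str]
) -> Tuple[str, str]:
    """Return (source, path) for the file the startup would use.

    Mirrors the order in the text: prefer --kubeconfig, and fall back to
    --bootstrap-kubeconfig only when the first file is missing.
    """
    if os.path.isfile(kubeconfig):
        return ("kubeconfig", kubeconfig)
    if bootstrap_kubeconfig and os.path.isfile(bootstrap_kubeconfig):
        return ("bootstrap-kubeconfig", bootstrap_kubeconfig)
    raise FileNotFoundError("neither kubeconfig nor bootstrap-kubeconfig exists")


with tempfile.TemporaryDirectory() as workdir:
    # Hypothetical file names, used only for this demonstration.
    kubeconfig_path = os.path.join(workdir, "kubelet.conf")
    bootstrap_path = os.path.join(workdir, "bootstrap-kubelet.conf")

    # Only the bootstrap file exists, so the fallback is chosen.
    with open(bootstrap_path, "w") as handle:
        handle.write("# placeholder\n")
    first_choice = select_kubelet_credentials(kubeconfig_path, bootstrap_path)

    # Once the kubeconfig exists, it takes precedence.
    with open(kubeconfig_path, "w") as handle:
        handle.write("# placeholder\n")
    second_choice = select_kubelet_credentials(kubeconfig_path, bootstrap_path)

print(first_choice[0], second_choice[0])
```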

【Envoy-03】Securing Envoy Proxy

1. Envoy Threat Model. Refer to: threat_model. Threat modeling is: identifying and enumerating threats and vulnerabilities; devising mitigations; prioritising residual risks; escalating the most important risks. Why Threat Model? Identify security flaws early; save money and avoid time-consuming redesigns; focus your security requirements; identify complex risks and data flows for critical assets. 2. Configuration Best Practices. Refer to: best_practices/edge. An example of running Envoy with a secure config: # https://github.com/solo-io/hoot/blob/master/03-security/edge.yaml admin: # access log to admin interface access_log_path: "/tmp/envoy_admin.

Something you might need to know when developing a CNI plugin

Introduction: CNI (Container Network Interface) is dedicated to providing network solutions for Kubernetes containers. There are tons of CNI plugins for Kubernetes networking on the market, some of them open-source projects (e.g. flannel, calico, cilium). Besides, the CNI project officially provides some sample plugins for end users. How does kubelet interact with CNI? Implemented by Dockershim: in earlier versions of Kubernetes (1.23 or lower), if the container runtime is set to docker, the CNI plugin is called in dockershim#cni.

What's inside Nvidia Container Toolkit?

Architecture Overview The NVIDIA container stack is architected so that it can be targeted to support any container runtime in the ecosystem. The components of the stack include: The NVIDIA Container Runtime (nvidia-container-runtime) The NVIDIA Container Runtime Hook (nvidia-container-toolkit / nvidia-container-runtime-hook) The NVIDIA Container Library and CLI (libnvidia-container1, nvidia-container-cli) The components of the NVIDIA container stack are packaged as the NVIDIA Container Toolkit. How these components are used depends on the container runtime being used.

【Envoy-02】Monitoring, Performance, and Troubleshooting

1. Envoy Observability. Concept: Mechanisms to observe Envoy’s state; debugging and monitoring Envoy. Overview: Admin interface, stats, config dump, clusters, log level, debug logs, access logs, metrics collection, tracing. 2. Admin Interface. /stats: histogram metrics and the current status of Envoy (e.g. how many requests, how many succeeded, how many failed) /config_dump: dump the current internal Envoy configuration /clusters: actual membership of each cluster /logging: Envoy logs # https://github.com/solo-io/hoot/blob/master/02-observe/stats.yaml admin: access_log_path: /dev/stdout address: socket_address: { address: 127.

【Envoy-01】Architecture Overview and Fundamentals

I’ve been re-learning Envoy recently since it’s a powerful L4/L7 proxy widely used in multiple open-source projects (e.g. Istio, Cilium, Envoy Gateway). Back in 2019, when I first got to know Service Mesh, the first open-source Service Mesh project I got involved in was Istio, which was already using Envoy as its L4/L7 proxy for traffic management. At that time, I wasn’t interested in Envoy, presumably because it’s written in C++, which is considered a ‘the deeper you get, the harder it gets’ language.

【Troubleshooting】Reusable CPUs from initContainer were not being honored

1. Description: Seen in an early version of Kubernetes, v1.18. Related Commit: Fix a bug whereby reusable CPUs and devices were not being honored #93289 Related PR: Fix a bug whereby reusable CPUs and devices were not being honored #93189; Refactor the algorithm used to decide CPU assignments in the CPUManager #102014 Previously, it was possible for reusable CPUs and reusable devices (i.e. those previously consumed by init containers) to not be reused by subsequent init containers or app containers if the TopologyManager was enabled.

【Troubleshooting】Summary of kube-apiserver troubleshooting techniques (analysis of logs and caching principles)

1. Related background: When troubleshooting apiserver issues, monitoring first identifies the nodes that may have performance bottlenecks. The next step is to analyze the apiserver logs on those nodes in more detail. 2. APIServer log analysis skills. Trace log. Log printing conditions: When the total request time exceeds the threshold (default 500ms), apiserver prints the trace log, and for each step of the trace it calculates a per-step duration threshold
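The "log only when slow" behaviour described in the excerpt above can be sketched as follows. This is a simplified model, not the apiserver's trace implementation: the `Trace` and `FakeClock` classes are hypothetical, and because the excerpt does not show how the per-step threshold is derived, it is passed in here as an assumed parameter. Only the 500 ms default comes from the text.

```python
from typing import Callable, List, Optional, Tuple


class FakeClock:
    """Deterministic millisecond clock, used so the demo does not sleep."""

    def __init__(self) -> None:
        self.now_ms = 0

    def __call__(self) -> int:
        return self.now_ms

    def advance(self, delta_ms: int) -> None:
        self.now_ms += delta_ms


class Trace:
    """Hypothetical request trace that records the duration of each step."""

    def __init__(self, name: str, clock: Callable[[], int]) -> None:
        self.name = name
        self._clock = clock
        self._start = clock()
        self._last = self._start
        self.steps: List[Tuple[str, int]] = []

    def step(self, message: str) -> None:
        now = self._clock()
        self.steps.append((message, now - self._last))
        self._last = now

    def log_if_long(self, threshold_ms: int = 500, step_threshold_ms: int = 0) -> Optional[str]:
        total = self._clock() - self._start
        if total <= threshold_ms:
            return None  # fast request: nothing is printed
        lines = [f'Trace "{self.name}" (total time: {total}ms):']
        for message, duration in self.steps:
            # Assumed rule: only steps slower than the step threshold are shown.
            if duration > step_threshold_ms:
                lines.append(f"  step: {message} ({duration}ms)")
        return "\n".join(lines)


clock = FakeClock()
trace = Trace("List pods", clock)
clock.advance(100)
trace.step("decode request")
clock.advance(600)
trace.step("read from storage")
report = trace.log_if_long(threshold_ms=500, step_threshold_ms=200)
print(report)

fast_trace = Trace("Get pod", clock)
clock.advance(50)
fast_trace.step("read from cache")
fast_report = fast_trace.log_if_long()
```

Filtering steps by their own threshold keeps the log focused on the slow step, which is usually the one worth investigating.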