Bachelor's Degree, Electronic Engineering, South China University of Technology
Go programming; multi-year Kubernetes cloud platform development experience (e.g. CRD Operator, Scheduler Plugin, Device Plugin, CNI).
Linux shell scripting; large-scale Kubernetes cluster management. Familiar with Kubernetes and container troubleshooting, debugging, and source code analysis.
Familiar with integrating and using the Prometheus and Grafana observability toolkits.
Blog: lianghao208.top
GitHub: github.com/lianghao208
CNCF KubeCon Presentations:
- 2021: Promoting Kubernetes to the Edge with SuperEdge
- 2023: Deep Dive Kwok
Kubernetes Community Member and Contributor
Publications: Istio Service Mesh Advanced Practical
Senior software engineer skilled in developing and managing large-scale Kubernetes platforms. Kubernetes community member and contributor to multiple CNCF open source projects.
- Development: Proficient in Go, with multi-year experience in Kubernetes cloud platform development (e.g. CRD Operator, Scheduler Plugin, Device Plugin, CNI).
- SRE: Familiar with Linux, with experience managing large-scale Kubernetes clusters. Capable of troubleshooting complex Kubernetes issues. Experienced in improving the stability and sustainability of Kubernetes clusters.
- Technical Influence:
Blog: http://lianghao208.top
GitHub: https://github.com/lianghao208
Open source contributions:
- Kubernetes Community Member/Contributor
- SuperEdge Maintainer (a CNCF sandbox open source edge computing project)
- Kstone Contributor (an open source etcd management project)
CNCF KubeCon Presentations:
- 2021: Promoting Kubernetes to the Edge with SuperEdge
- 2023: Deep Dive Kwok
Publications: Istio Service Mesh Advanced Practical
Responsible for the development of the company’s GPU Kubernetes platform, supporting containerized deployment of large-scale training and inference services within the company, and improving cluster stability and utilization.
Responsible for the company’s TKE (Tencent Kubernetes Engine) cluster operation, maintenance and development on the online/offline container platform.
Responsible for building the Kubernetes operations platform (monitoring, cluster disaster recovery, cluster auto-scaling, CI/CD).
Responsible for supporting a large number of self-developed businesses (Tencent Meeting, QQ Photo Album, Tencent Advertising, AI for Honor of Kings, etc.) on the cloud.
Developed the company’s machine learning cloud platform for LLM pre-training and inference tasks. Achieved the migration of containers totaling millions of cores to the cloud.
Responsible for the company’s OpenShift (Kubernetes) cluster operation, maintenance and development, establishment of monitoring systems, DevOps platform CI/CD tool development, etc.
Machine Learning Kubernetes Platform: The company’s cloud-native GPU training and inference platform serves GPU workloads such as Hunyuan large language models, advertising recommendation, and AI for Honor of Kings. Technologies involved: device plugin development, gang scheduling development, cross-machine high-performance RDMA network support, GPU virtualization, and a DCGM-based GPU monitoring system.
CPU Online and Offline Kubernetes Platform
Built a CPU online and offline Kubernetes platform spanning over one million CPU cores, over 200k nodes, and over 70 clusters in total. The platform supports the migration of massive online and offline services to the cloud, including Tencent video transcoding, information security, and Tencent Meeting, helping the company meet its cloud migration goals and improve overall node resource utilization.
Large-scale etcd management platform: kstone, Tencent’s open source etcd cluster management platform, supports etcd data visualization, automatic data backup and recovery, monitoring visualization, automated inspection, cluster risk assessment, and other functions that address the pain points of etcd cluster operation and maintenance.
Participated in developing the community version of kstone, Tencent’s open source etcd cluster management platform; mainly responsible for platform security authentication, extension of cluster inspection plug-ins, architecture optimization, etc. The kstone platform has also been rolled out within the company, where it currently manages more than 70 etcd clusters and integrates functions such as automatic etcd cluster backup and automatic alerting on inspection anomalies to improve the efficiency of platform operation and maintenance.
Cover letter:
I am proficient in the Go programming language, with multi-year experience in Kubernetes development. I am an open source enthusiast who has contributed to multiple Kubernetes projects and become a Kubernetes community member. I have experience managing large-scale Kubernetes clusters, covering not only CPU nodes but also GPU nodes.