1 Environment Preparation and System Baseline
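Disabling swap for kubelet includes commenting out the swap entry in /etc/fstab, and that step can be scripted. The following is a minimal sketch that assumes a standard whitespace-separated fstab whose third field is the filesystem type. comment_swap_entries is an illustrative name, not part of any tool. It only edits the file, so sudo swapoff -a is still needed for the running system.

```shell
#!/usr/bin/env bash
# Comment out active swap entries in an fstab-style file (default: /etc/fstab).
# Only edits the file; `swapoff -a` is still needed to turn swap off right now.
comment_swap_entries() {
  local fstab="${1:-/etc/fstab}"
  # Keep an untouched copy next to the original.
  cp -- "$fstab" "$fstab.bak" || return 1
  # Field 3 of an fstab line is the filesystem type: prefix '#' to lines whose
  # type is "swap" and that are not already comments; copy everything else as-is.
  awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
    "$fstab.bak" > "$fstab"
}
```

To run it against the real file as root: sudo bash -c "$(declare -f comment_swap_entries); comment_swap_entries". Pointing it at a copy first is a cheap way to review the result. The original is kept as /etc/fstab.bak.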
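Disable swap: run sudo swapoff -a, then comment out or remove the swap entry in /etc/fstab so swap is not re-enabled after a reboot (by default kubelet refuses to start while swap is on).
2 Install the Container Runtime and Kubernetes Components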
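Option A, containerd:
sudo apt update
sudo apt install -y containerd
sudo mkdir -p /etc/containerd && containerd config default | sudo tee /etc/containerd/config.toml > /dev/null
sudo systemctl restart containerd && sudo systemctl enable containerd
Option B, Docker Engine. This requires Docker's official apt repository, and since Kubernetes v1.24 the kubelet no longer talks to Docker directly, so cri-dockerd is also needed. Do not combine it with Option A: Docker's containerd.io package conflicts with Ubuntu's containerd package.
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io
sudo systemctl start docker && sudo systemctl enable docker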
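Since v1.22, kubeadm configures the kubelet with the systemd cgroup driver by default. However, the file produced by containerd config default leaves runc on cgroupfs (SystemdCgroup = false), and this mismatch can show up as control-plane pods that keep restarting. Below is a minimal sketch of the fix. It assumes the containerd 1.x default config layout, which already contains a SystemdCgroup line. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Switch runc to the systemd cgroup driver in a containerd config file
# (default: /etc/containerd/config.toml). Assumes the file came from
# `containerd config default` and so already has a "SystemdCgroup = ..." line.
enable_systemd_cgroup() {
  local cfg="${1:-/etc/containerd/config.toml}"
  if ! grep -q 'SystemdCgroup = ' "$cfg"; then
    echo "enable_systemd_cgroup: no SystemdCgroup key in $cfg, edit it by hand" >&2
    return 1
  fi
  # Flip only the value and keep the original indentation.
  sed -i 's/\(SystemdCgroup = \)false/\1true/' "$cfg"
}
```

Run it as root on the real file, then restart containerd with sudo systemctl restart containerd so the change takes effect.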
sudo apt install -y apt-transport-https ca-certificates curl gnupg
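The legacy apt.kubernetes.io (packages.cloud.google.com) repository is deprecated and frozen. Use the community-owned pkgs.k8s.io repository instead. It is versioned per minor release, and v1.30 below is only an example:
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.30/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.30/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list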
sudo apt update
sudo apt install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
sudo systemctl enable --now kubelet
3 Initialize the Control Plane and Network Plugin
sudo kubeadm init \
--apiserver-advertise-address <MASTER_IP> \
--control-plane-endpoint <MASTER_IP>:6443 \
--pod-network-cidr 10.244.0.0/16 \
--service-cidr 10.100.0.0/16 \
--image-repository registry.aliyuncs.com/google_containers
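The pod CIDR (10.244.0.0/16) and service CIDR (10.100.0.0/16) passed to kubeadm init must not overlap with each other or with the subnet the nodes themselves use. The following is a quick pre-flight check in plain bash. The function names are illustrative, and the node subnet 192.168.1.0/24 in the usage line is a hypothetical value, so substitute your own.

```shell
#!/usr/bin/env bash
# Convert a dotted IPv4 address ("a.b.c.d") to a 32-bit integer.
# 10# forces base 10, so octets such as "08" are not read as octal.
ip_to_int() {
  local IFS=.
  local -a o
  read -r -a o <<< "$1"
  echo $(( (10#${o[0]} << 24) | (10#${o[1]} << 16) | (10#${o[2]} << 8) | 10#${o[3]} ))
}

# Succeed (exit 0) if two IPv4 CIDR blocks overlap. CIDR blocks are either
# nested or disjoint, so comparing both network addresses under the shorter
# of the two prefix lengths is enough.
cidrs_overlap() {
  local a_ip="${1%/*}" a_len="${1#*/}"
  local b_ip="${2%/*}" b_len="${2#*/}"
  local len=$(( a_len < b_len ? a_len : b_len ))
  local mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  local a b
  a=$(ip_to_int "$a_ip") || return 2
  b=$(ip_to_int "$b_ip") || return 2
  (( (a & mask) == (b & mask) ))
}

# Check pod CIDR, service CIDR and node subnet pairwise.
check_cluster_cidrs() {
  local pod="$1" svc="$2" node="$3"
  if cidrs_overlap "$pod" "$svc" || cidrs_overlap "$pod" "$node" \
     || cidrs_overlap "$svc" "$node"; then
    echo "CIDR overlap among $pod, $svc, $node" >&2
    return 1
  fi
  echo "ok"
}
```

Usage: check_cluster_cidrs 10.244.0.0/16 10.100.0.0/16 192.168.1.0/24 prints ok. A non-zero exit means one of the ranges should be changed before running kubeadm init.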
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
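Install one network plugin, not both. Flannel's default manifest uses 10.244.0.0/16, which matches the --pod-network-cidr above:
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
Or Calico (check that its IP pool, CALICO_IPV4POOL_CIDR in the manifest, agrees with --pod-network-cidr):
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Then verify with kubectl cluster-info and kubectl get nodes. Once the network plugin pods are running, the control-plane node should show Ready.
4 Join Worker Nodes and Routine Operations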
On each worker, run the kubeadm join ... command printed at the end of kubeadm init, for example:
sudo kubeadm join <MASTER_IP>:6443 \
--token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
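The join token printed by kubeadm init expires (after 24 hours by default). Running kubeadm token create --print-join-command on the control plane prints a fresh, complete join command. If only the <HASH> part is needed, it is the SHA-256 of the cluster CA's DER-encoded public key, and openssl can compute it as shown below. This sketch assumes the default RSA CA key at /etc/kubernetes/pki/ca.crt. The function name is illustrative.

```shell
#!/usr/bin/env bash
# Print the value for `kubeadm join --discovery-token-ca-cert-hash`:
# "sha256:" + hex SHA-256 of the CA certificate's DER-encoded public key
# (SubjectPublicKeyInfo). Assumes an RSA CA key, kubeadm's default.
ca_cert_hash() {
  local ca="${1:-/etc/kubernetes/pki/ca.crt}"
  local hex
  # pipefail makes a failure in any openssl stage fail the substitution,
  # instead of silently hashing empty input.
  hex=$(set -o pipefail
        openssl x509 -pubkey -in "$ca" \
          | openssl rsa -pubin -outform der 2>/dev/null \
          | openssl dgst -sha256 -hex \
          | sed 's/^.* //') || return 1
  echo "sha256:${hex}"
}
```

On the control plane, sudo bash -c "$(declare -f ca_cert_hash); ca_cert_hash" prints the value to paste after --discovery-token-ca-cert-hash.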
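After the workers have joined, go back to the master and run kubectl get nodes to check their status. Label nodes as needed to steer scheduling, for example kubectl label node <node> disktype=ssd, or a GPU-related key such as nvidia.com/gpu.
5 Common Issues and Optimization Tips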
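Image pull failures (for example when registry.k8s.io is unreachable): pass --image-repository registry.aliyuncs.com/google_containers to kubeadm, or configure a private registry or registry mirror.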
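The control-plane images can also be pre-pulled before kubeadm init, so that registry problems show up early and separately from cluster initialization. The sketch below wraps kubeadm config images list and kubeadm config images pull. The function name is hypothetical, and the default repository is simply the mirror used in this article.

```shell
#!/usr/bin/env bash
# Pre-pull control-plane images from a (mirror) repository before `kubeadm init`.
# First argument: image repository; defaults to the mirror used in this guide.
prepull_images() {
  local repo="${1:-registry.aliyuncs.com/google_containers}"
  # Show which image references kubeadm will use with this repository.
  kubeadm config images list --image-repository "$repo" || return 1
  # Pull them through the configured container runtime.
  sudo kubeadm config images pull --image-repository "$repo"
}
```

If the pull step fails here, the problem is registry access rather than kubeadm init itself, which narrows down troubleshooting.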