Common Failures and Troubleshooting Steps for Kubernetes on Debian
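Before troubleshooting, confirm the node's basic environment:
- Update the system to avoid package conflicts: sudo apt update && sudo apt upgrade -y
- Load the required kernel modules (sudo modprobe overlay && sudo modprobe br_netfilter) and set net.bridge.bridge-nf-call-iptables=1 and net.ipv4.ip_forward=1 through sysctl. Write both settings to /etc/sysctl.d/99-kubernetes.conf and apply them with sudo sysctl --system (a bare sysctl -p reads only /etc/sysctl.conf, so pass the file path if you use -p).
- Disable swap: sudo swapoff -a turns it off until the next reboot; to disable it permanently, comment out the lines containing swap in /etc/fstab.

Symptom: the kubelet service fails to start, its status shows inactive (dead), or its log reports errors such as failed to start container runtime.
- Check the service status: systemctl status kubelet (shows whether it is running and any error message).
- Read the log: journalctl -u kubelet -f (follows the live log to pin down the specific error, such as an image pull failure or a certificate problem).
- Restart the service: systemctl restart kubelet (clears transient failures).

Symptom: kubeadm join reports an error such as token expired, failure to fetch a ConfigMap, or connection refused.
- Generate a new token on the master: kubeadm token create --print-join-command prints a fresh join command.
- Check the master: kubectl get nodes should show it as Ready, and kube-apiserver should be healthy (systemctl status kube-apiserver if it runs as a systemd service; on kubeadm clusters it usually runs as a static Pod, so check kubectl get pods -n kube-system instead).
- Check connectivity: the worker must be able to ping the master's IP and reach TCP port 6443, which the API Server uses. ping only tests ICMP, so test the port separately, for example with nc -zv <master-ip> 6443.

For Pods that fail to pull their image or fail to start:
- Check the image field in the Pod spec (for example, whether nginx:latest is spelled correctly).
- If you use a private registry, add its certificate under /etc/docker/certs.d/<registry-domain>/ on the worker node and restart Docker (systemctl restart docker).
- Read the Pod log (kubectl logs <pod-name> -n <namespace>) and look for application errors.
- Adjust the resource requests and limits (resources.requests and resources.limits).

Symptom: network connectivity problems, for example a ClusterIP that cannot be reached from inside the cluster or a NodePort that cannot be reached from outside.
- Check the network plugin: kubectl get pods -n kube-system (the Flannel, Calico, or other plugin Pods should be Running).
- Reinstall the plugin: delete it first (for example, kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml), then apply the same manifest again.
- Confirm that br_netfilter is loaded: lsmod | grep br_netfilter
- Confirm the bridge setting: sysctl net.bridge.bridge-nf-call-iptables (must be 1).

Symptom: kubectl get svc shows the Service and its ports as normal, but the Service cannot be reached through NodeIP:NodePort or through its ClusterIP.
- Check the Service type: a ClusterIP Service is reachable only from inside the cluster. For external access, change it to NodePort (type: NodePort) or LoadBalancer (requires cloud provider support).
- Check the ports field: targetPort must match the port the application container listens on, port is the Service port, and nodePort is the optional external port (30000-32767).
- Check the firewall: open the node port (sudo ufw allow <node-port>).

Symptom: kubectl commands fail with an error such as x509: certificate signed by unknown authority, or the API Server cannot be reached.
- Renew the certificates: run kubeadm certs renew all, then restart the affected components (systemctl restart kubelet).
- As a temporary workaround, run kubectl with --insecure-skip-tls-verify=true or set insecure-skip-tls-verify to true in ~/.kube/config. This disables server certificate verification and is not recommended in production.

Symptom: Pods stay in Pending (cannot be scheduled), or the system frequently triggers the OOM Killer, which kills processes to free memory.
- Check node resources: kubectl describe node <node-name> (shows how the node's resources are allocated).
- Check Pod usage: kubectl top pod (shows per-Pod resource consumption; requires metrics-server).
- Free disk space (for example, old logs under /var/log/) or expand the disk.
- Adjust resources.requests (for example, memory: "512Mi" and cpu: "500m") so that Pods do not request more than they need.

Symptom: the kubelet log reports an error such as sysctl: cannot stat /proc/sys/net/bridge/bridge-nf-call-iptables, or the network plugin does not work.
- Load the modules manually: sudo modprobe br_netfilter && sudo modprobe overlay
- Load them at boot: add the module names to /etc/modules-load.d/kubernetes.conf (one module per line), then run sudo systemctl restart systemd-modules-load.

Symptom: kubeadm init or kubeadm join reports a version error, such as unsupported Docker version or a kubelet version that is incompatible with kube-apiserver.
- Remove the incompatible version (for example, sudo apt remove docker-ce) and install a version that meets the requirements.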
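
Several of the failures above trace back to the same three prerequisites: the overlay and br_netfilter modules, the two sysctl values, and disabled swap. The script below is a minimal sketch of how those checks could be combined into one read-only preflight. The function names (module_loaded, sysctl_is, swap_disabled, preflight) are hypothetical helpers made up for this illustration and are not part of kubeadm or any other tool. The optional path arguments exist only so the helpers can be pointed at sample files; by default they read /proc.

```shell
#!/usr/bin/env bash
# Read-only preflight sketch for a Debian Kubernetes node.
# All function names here are illustrative, not part of kubeadm or kubelet.

# Succeeds if the module appears in the modules listing
# (default /proc/modules, the same data lsmod prints).
# Caveat: a module built into the kernel is not listed there.
module_loaded() {
    local name="$1" listing="${2:-/proc/modules}"
    grep -q "^${name} " "$listing" 2>/dev/null
}

# Succeeds if the dotted sysctl key holds the expected value.
# The key is mapped to a file under the given root (default /proc/sys).
sysctl_is() {
    local key="$1" expected="$2" root="${3:-/proc/sys}"
    local path
    path="${root}/$(printf '%s' "$key" | tr . /)"
    [ -r "$path" ] && [ "$(tr -d '[:space:]' < "$path")" = "$expected" ]
}

# Succeeds if no swap area is active: /proc/swaps then holds only its header line.
swap_disabled() {
    local swaps="${1:-/proc/swaps}"
    [ -r "$swaps" ] && [ "$(wc -l < "$swaps")" -le 1 ]
}

# Runs every check, prints a hint for each failure, and returns 1 if any failed.
preflight() {
    local rc=0 m
    for m in overlay br_netfilter; do
        module_loaded "$m" || { echo "module not loaded: $m (try: sudo modprobe $m)"; rc=1; }
    done
    sysctl_is net.bridge.bridge-nf-call-iptables 1 \
        || { echo "net.bridge.bridge-nf-call-iptables is not 1"; rc=1; }
    sysctl_is net.ipv4.ip_forward 1 \
        || { echo "net.ipv4.ip_forward is not 1"; rc=1; }
    swap_disabled || { echo "swap is active (try: sudo swapoff -a)"; rc=1; }
    return "$rc"
}
```

To use it, source the file on the node and run preflight; it changes nothing on the system and only prints the checks that failed. If br_netfilter is not loaded, the bridge sysctl file does not exist, so both messages appear together, which matches the cannot stat error described above.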