RabbitMQ troubleshooting steps on CentOS

1. Quick triage and fault localization
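- Service status: systemctl status rabbitmq-server
- Recent system journal entries: journalctl -xe | tail -n 200
- Follow the node log: tail -f /var/log/rabbitmq/rabbit@<hostname>.log
- Management UI, local check: wget -O- http://localhost:15672 (should return the login page or HTTP 200)
- AMQP port, local check: telnet localhost 5672 or nc -vz localhost 5672
- Management UI, remote access: http://<server IP>:15672
- Listening ports: ss -lntp | egrep '5672|15672|4369'
- Running processes: ps -ef | grep rabbitmq; if necessary, kill -9 <PID> and then restart the service
- Management plugin state: rabbitmq-plugins list | grep management
- Node and cluster state: rabbitmqctl status, rabbitmqctl cluster_status

2. Common faults and fixes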
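Several startup failures covered in this section trace back to the host being unable to resolve its own hostname. As a minimal sketch, assuming a standard hosts-format file, the helper below checks whether a name has an entry. The function name hosts_has_entry is made up for this example and is not part of RabbitMQ or CentOS.

```shell
# hosts_has_entry <hostname> [hosts_file]
# Hypothetical helper: exits 0 if <hostname> appears as a name in the
# hosts-format file (default /etc/hosts), 1 otherwise.
hosts_has_entry() {
    local name="$1" file="${2:-/etc/hosts}"
    awk -v name="$name" '
        /^[[:space:]]*#/ { next }          # skip comment lines
        {
            sub(/#.*/, "")                 # drop trailing comments
            # Column 1 is the IP address; names start at column 2.
            for (i = 2; i <= NF; i++)
                if ($i == name) found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$file"
}
```

One way to use it is hosts_has_entry "$(hostname -s)" || echo "no /etc/hosts entry for this host". It only inspects the file; getent hosts "$(hostname -s)" exercises the full resolver path, including DNS.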
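- Service fails to start: look for error keywords in journalctl -xe and /var/log/rabbitmq/*.log (configuration syntax, permissions, dependencies, resource alarms, and so on).
- "Job for rabbitmq-server.service failed", with the log reporting that the node name cannot be resolved or that the epmd connection failed: add an entry for the current hostname to /etc/hosts (example: 192.168.1.10 <hostname>), confirm the machine can resolve its own hostname, then restart the service.
- Missing dependencies: install them (openssl-devel, socat), then reinstall or restart.
- Open the management port in the firewall: firewall-cmd --zone=public --add-port=15672/tcp --permanent && firewall-cmd --reload
- Enable the management plugin: rabbitmq-plugins enable rabbitmq_management
- Cluster nodes: complete the IP-to-hostname mappings for all nodes in /etc/hosts; if necessary, check that .erlang.cookie is identical across nodes and has the correct permissions (see below).
- Create an administrator user with full permissions on the default vhost: rabbitmqctl add_user admin <pwd> && rabbitmqctl set_user_tags admin administrator && rabbitmqctl set_permissions -p / admin ".*" ".*" ".*"

3. Frequently used commands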
- systemctl start|stop|restart|status rabbitmq-server
- rabbitmq-server -detached (start in the background)
- rabbitmqctl status, rabbitmqctl stop_app, rabbitmqctl start_app
- rabbitmq-plugins enable|disable rabbitmq_management
- rabbitmqctl add_user <u> <p>, rabbitmqctl set_user_tags <u> administrator
- rabbitmqctl set_permissions -p / <u> ".*" ".*" ".*"
- rabbitmqctl cluster_status, rabbitmqctl join_cluster <node>
- rabbitmqctl status | egrep 'memory|disk'
- rabbitmqctl set_vm_memory_high_watermark 0.4, rabbitmqctl set_disk_free_limit 500MB

4. Cluster and network checks
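The port checks in this section can be tedious to read by eye. As a rough sketch, the helper below takes `ss -lnt` output on standard input and prints each expected port that has no listening socket. The function name missing_ports is invented for this example, and it assumes the usual ss layout in which the local address is the fourth column.

```shell
# missing_ports <port>...
# Hypothetical helper: reads `ss -lnt` (or `ss -lntp`) output on stdin and
# prints every port from the argument list that is not in LISTEN state.
missing_ports() {
    awk -v want="$*" '
        BEGIN { n = split(want, ports, " ") }
        $1 == "LISTEN" {
            # Column 4 is the local address, e.g. 0.0.0.0:5672 or [::]:5672.
            p = $4
            sub(/.*:/, "", p)              # keep only the port number
            seen[p] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(ports[i] in seen)) print ports[i]
        }
    '
}
```

For example, ss -lnt | missing_ports 4369 5672 15672 25672 prints nothing when all four ports are listening. It reports only local listeners; reachability from a peer still needs a check such as nc -vz <peer_ip> 4369.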
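- Name resolution: on every node, add IP-to-hostname entries for all nodes to /etc/hosts so that resolution works in both directions; this avoids nodedown errors caused by node names drifting.
- Local listening ports: ss -lntp | egrep '4369|25672|5672|15672'
- Peer reachability: nc -vz <peer_ip> 4369, nc -vz <peer_ip> 25672
- Cluster state: run rabbitmqctl cluster_status, check running_nodes, and confirm that partitions is empty; if necessary, restart the nodes one at a time, in order.

5. Message loss and reliability checks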
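- Durability: declare with durable=true; publish messages with delivery_mode=2.
- Mirroring policy: rabbitmqctl set_policy ha-all "^ha\." '{"ha-mode":"all","ha-sync-mode":"automatic"}' (classic mirrored queues; this applies only to RabbitMQ versions before 4.0, which removed the feature)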
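To audit the durability setting on an existing broker, one option is to filter the output of rabbitmqctl list_queues name durable, which prints tab-separated columns. The sketch below is only an illustration: the function name non_durable_queues is made up here, and it assumes the durable column prints as true or false.

```shell
# non_durable_queues
# Hypothetical helper: reads tab-separated "name<TAB>durable" rows on stdin
# and prints the names of queues whose durable flag is false.
# Header and informational lines are ignored because they do not match.
non_durable_queues() {
    awk -F '\t' 'NF == 2 && $2 == "false" { print $1 }'
}
```

A possible invocation is rabbitmqctl -q list_queues name durable | non_durable_queues. Any queue it prints would lose its definition, and with it its messages, when the broker restarts.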