Optimizing Load Balancing with Nginx Logs on Debian
I. Key Log Fields and Collection Configuration
http {
    # Combined-style access log extended with the upstream address and timings
    log_format lb '$remote_addr - $remote_user [$time_local] '
                  '"$request" $status $body_bytes_sent '
                  '"$http_referer" "$http_user_agent" '
                  '"$http_x_forwarded_for" '
                  'upstream=$upstream_addr '
                  'rt=$request_time '
                  'urt=$upstream_response_time';
    access_log /var/log/nginx/access.log lb;
    error_log /var/log/nginx/error.log warn;
    upstream backend {
        # Send each new request to the backend with the fewest active connections
        least_conn;
        # A backend is marked unavailable for 30s after 3 failed attempts within 30s
        server 10.0.1.11:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.12:8080 max_fails=3 fail_timeout=30s;
        server 10.0.1.13:8080 max_fails=3 fail_timeout=30s;
    }
    server {
        listen 80;
        location / {
            proxy_pass http://backend;
            # Pass the original host and client address on to the backend
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
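To analyze these logs outside of shell one-liners, each line has to be split back into named fields. The following is a minimal Python sketch that mirrors the `lb` format above. The regex group names, the `parse_lb_line` helper, and the sample line are illustrative assumptions and are not part of the original configuration.

```python
import re

# Regex mirroring the `lb` log_format; group names are illustrative.
LB_LINE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)" "(?P<xff>[^"]*)" '
    r'upstream=(?P<upstream_addr>.*?) rt=(?P<request_time>\S+) '
    r'urt=(?P<upstream_response_time>.*)$'
)


def _times(raw):
    # nginx writes "-" when no time was recorded, and separates several
    # attempts with commas (retries) or colons (internal redirects).
    return [None if t.strip() == "-" else float(t) for t in re.split(r"[,:]", raw)]


def parse_lb_line(line):
    """Return a dict of fields for one `lb` log line, or None if it does not match."""
    m = LB_LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    rec["request_time"] = float(rec["request_time"])
    # Several upstream addresses appear when a request was retried.
    rec["upstream_addr"] = re.split(r",\s*|\s+:\s+", rec["upstream_addr"])
    rec["upstream_response_time"] = _times(rec["upstream_response_time"])
    return rec


# Hypothetical sample line in the `lb` format
sample = ('203.0.113.7 - - [10/Oct/2024:13:55:36 +0800] "GET /api/items HTTP/1.1" '
          '200 512 "-" "curl/8.5.0" "-" upstream=10.0.1.11:8080 rt=0.120 urt=0.118')
print(parse_lb_line(sample))
```

Keeping `upstream_addr` and `upstream_response_time` as lists preserves the per-attempt detail for retried requests instead of silently averaging it away.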
stream {
    # JSON access log for TCP/UDP sessions handled by the stream module
    log_format streamlb escape=json
        '{'
        '"timestamp":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"protocol":"$protocol",'
        '"status":"$status",'
        '"bytes_sent":"$bytes_sent",'
        '"bytes_received":"$bytes_received",'
        '"session_time":"$session_time",'
        '"upstream_addr":"$upstream_addr",'
        '"upstream_connect_time":"$upstream_connect_time"'
        '}';
    access_log /var/log/nginx/stream-access.log streamlb;
    # Alternatively, forward the log to Graylog or rsyslog over syslog
    # access_log syslog:server=10.10.0.10:514 streamlb;
    upstream graylog_servers {
        server 10.10.0.11:9000;
        # Used only when the primary server is unavailable
        server 10.10.0.12:9000 backup;
    }
    server {
        listen 9000;
        proxy_pass graylog_servers;
    }
}
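Because the stream log is written as one JSON object per line, it can be aggregated without a custom parser. The sketch below averages `upstream_connect_time` per upstream, assuming the log is read from the file rather than from syslog (which would add a prefix to each line). The function name `connect_time_by_upstream` is hypothetical.

```python
import json
from collections import defaultdict


def connect_time_by_upstream(lines):
    """Average upstream_connect_time (seconds) per upstream from streamlb JSON lines."""
    total = defaultdict(float)
    count = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        addr = rec.get("upstream_addr", "")
        try:
            value = float(rec.get("upstream_connect_time", ""))
        except (TypeError, ValueError):
            continue  # "-" or a comma-separated list after a retry
        total[addr] += value
        count[addr] += 1
    return {addr: total[addr] / count[addr] for addr in total}
```

A typical call would pass an open file object, for example `connect_time_by_upstream(open("/var/log/nginx/stream-access.log"))`. A rising connect time for one upstream usually points at the network path or the listen queue of that server rather than at request processing.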
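# Typically placed in /etc/logrotate.d/nginx on Debian
/var/log/nginx/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    create 640 root adm
    # Run postrotate once for all matched logs instead of once per file
    sharedscripts
    postrotate
        invoke-rc.d nginx rotate >/dev/null 2>&1
    endscript
}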
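With `compress` and `delaycompress`, a week of history is spread over the live log, one uncompressed `.1` file, and several `.gz` files. The following sketch reads them as one stream, oldest first. It assumes logrotate's default numeric suffixes (no `dateext`); the helper name `iter_log_lines` is hypothetical.

```python
import glob
import gzip


def iter_log_lines(base="/var/log/nginx/access.log"):
    """Yield lines from the live log and its rotated copies, oldest first."""

    def age(path):
        # access.log -> 0, access.log.1 -> 1, access.log.2.gz -> 2
        suffix = path[len(base):].strip(".").split(".")[0]
        return int(suffix) if suffix.isdigit() else 0

    paths = sorted(glob.glob(glob.escape(base) + "*"), key=age, reverse=True)
    for path in paths:
        # Rotated files older than one cycle are gzip-compressed.
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as fh:
            yield from fh
```

Reading the files lazily keeps memory use flat even when the rotated logs are large.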
II. A Log-Driven Optimization Loop
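# Count 5xx responses ($status is field 9 in the lb format for a normal "METHOD URI PROTOCOL" request line)
awk '$9 ~ /^5[0-9][0-9]$/ {n++} END {print n+0}' /var/log/nginx/access.log
# Top 10 slowest requests by upstream_response_time (the trailing urt= field; lines with urt=- sort last)
awk '{t=$NF; sub(/^urt=/, "", t); print t, $0}' /var/log/nginx/access.log | sort -k1,1 -nr | head -10
# Average upstream response time per backend, slowest first (retried requests with several upstreams are skipped)
awk 'NF > 2 && $(NF-2) ~ /^upstream=/ && $NF ~ /^urt=[0-9.]+$/ {
       a = substr($(NF-2), 10); s[a] += substr($NF, 5); c[a]++ }
     END { for (h in s) printf "%s %.3f\n", h, s[h]/c[h] }' \
  /var/log/nginx/access.log | sort -k2 -nr | head
# Top 10 busiest minutes (the key is dd/Mon/yyyy:HH:MM, cut from the [$time_local] field)
awk '{t=substr($4,2,17); a[t]++} END {for (i in a) print i,a[i]}' \
  /var/log/nginx/access.log | sort -k2 -nr | head
# Generate an HTML report with GoAccess
goaccess /var/log/nginx/access.log -o /var/www/report.html --log-format=COMBINED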
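The one-liners above each answer a single question. When the same numbers are needed together, a small script can compute them in one pass. This is a sketch under the assumption that lines follow the `lb` format and that each request hit exactly one upstream; the `TAIL` regex and the `summarize` function are illustrative names.

```python
import re
from collections import defaultdict

# Matches the status after the quoted request and the trailing upstream fields.
# Retried requests (several upstreams) and lines with urt=- do not match.
TAIL = re.compile(
    r'" (?P<status>\d{3}) \d+ ".*upstream=(?P<addr>\S+) rt=\S+ urt=(?P<urt>[0-9.]+)$'
)


def summarize(lines):
    """Per-backend request count, 5xx ratio, and mean/p95 upstream response time."""
    times = defaultdict(list)
    errors = defaultdict(int)
    for line in lines:
        m = TAIL.search(line.rstrip("\n"))
        if not m:
            continue
        addr = m.group("addr")
        times[addr].append(float(m.group("urt")))
        if m.group("status").startswith("5"):
            errors[addr] += 1
    report = {}
    for addr, values in times.items():
        values.sort()
        n = len(values)
        # Nearest-rank 95th percentile, computed with integer arithmetic
        p95 = values[(95 * n + 99) // 100 - 1]
        report[addr] = {
            "requests": n,
            "error_rate": errors[addr] / n,
            "mean_urt": sum(values) / n,
            "p95_urt": p95,
        }
    return report
```

Comparing `p95_urt` across backends tends to expose a struggling node earlier than the mean does, because a few very slow requests barely move the average.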
III. Common Scenarios and Log Criteria
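| Scenario | Log criterion | Optimization |
|---|---|---|
| Uneven backend performance | $upstream_response_time for one $upstream_addr stays consistently higher than for its peers | Lower that backend's weight or take it out of rotation temporarily; investigate slow queries and slow endpoints |
| Many long-lived connections or slow requests | $request_time and $upstream_response_time are close to each other but both high, so the time is spent in the backend rather than on the client side | Use least_conn; enable upstream keepalive; optimize backend processing |
| Error bursts | 5xx responses in access.log, or upstream connect timeouts in error.log, spike within a short window | Lower max_fails or lengthen fail_timeout so failing backends are ejected sooner and stay out longer; scale out the backends or apply rate limiting |
| Session persistence required | Login state depends on the client IP or a specific identifier | Use ip_hash, or introduce sticky sessions or a centralized session store (such as Redis) |
| Uneven load at peak | The per-minute request curve shows sharp spikes while some backends sit idle | Tune weight and least_conn; add server nodes; enable caching or a CDN |

These criteria rely on the log fields defined earlier; they help locate bottlenecks quickly and guide parameter tuning.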
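The first criterion in the table can be turned into a simple check. The sketch below flags backends whose mean upstream time is well above the fleet median and proposes integer weights. Both the 1.5x threshold and the inverse-latency weighting are illustrative heuristics assumed for this example, not values from the original text, and the function names are hypothetical.

```python
from statistics import median


def flag_slow_backends(mean_urt, factor=1.5):
    """Return backends whose mean upstream time exceeds factor x the fleet median."""
    if not mean_urt:
        return []
    baseline = median(mean_urt.values())
    return sorted(a for a, t in mean_urt.items() if t > factor * baseline)


def suggest_weights(mean_urt, max_weight=10):
    """Illustrative heuristic: weight inversely proportional to mean latency."""
    # Clamp to 1 ms (nginx timing resolution) to avoid division by zero.
    clamped = {a: max(t, 0.001) for a, t in mean_urt.items()}
    fastest = min(clamped.values())
    # nginx weights are positive integers, so never go below 1.
    return {a: max(1, round(max_weight * fastest / t)) for a, t in clamped.items()}


# Hypothetical per-backend means, in seconds
means = {"10.0.1.11:8080": 0.1, "10.0.1.12:8080": 0.1, "10.0.1.13:8080": 0.5}
print(flag_slow_backends(means))
for addr, weight in suggest_weights(means).items():
    print(f"server {addr} weight={weight} max_fails=3 fail_timeout=30s;")
```

Any weight suggested this way is best treated as a starting point: apply it, reload nginx, and confirm in the next log window that the response times have converged before adjusting further.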
IV. Operations and Compliance Notes