Monitoring and Alerting for Nginx on Debian

I. Basic Monitoring: Nginx's Built-in Module

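1. stub_status Module (Basic Performance Monitoring)
Nginx's built-in stub_status module exposes a few basic counters: active connections, the cumulative numbers of accepted connections, handled connections, and client requests, and the number of connections currently reading, writing, and waiting. It does not report rates directly. To get a rate such as requests per second, take two samples and divide the difference by the interval. Configuration steps: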

Add the following to the server block of your Nginx site configuration (for example /etc/nginx/sites-available/default):

```
location /stub_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;  # allow local access only
    deny all;
}
```

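Test the configuration with sudo nginx -t, then restart Nginx: sudo systemctl restart nginx.
On the server itself, run curl http://127.0.0.1/stub_status to view the status page. Because of the allow/deny rules above, requests from any other address get 403 Forbidden. To view the page from another machine, add an allow line for that client's address.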
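The stub_status page is a short block of plain text. Below is a minimal sketch for turning it into key=value pairs for scripts or simple checks. It assumes the default four-line output layout, and the function name parse_stub_status is hypothetical:

```shell
#!/usr/bin/env bash
# parse_stub_status (hypothetical helper): reads the stub_status page on stdin
# and prints one key=value pair per counter. Assumed input layout:
#   Active connections: 2
#   server accepts handled requests
#    10 10 25
#   Reading: 0 Writing: 1 Waiting: 1
parse_stub_status() {
  awk '
    /^Active connections:/ { print "active=" $3 }
    /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ {
      print "accepts=" $1
      print "handled=" $2
      print "requests=" $3
    }
    /^Reading:/ {
      print "reading=" $2
      print "writing=" $4
      print "waiting=" $6
    }
  '
}
```

Usage on the server: curl -s http://127.0.0.1/stub_status | parse_stub_status. Since "requests" is a cumulative counter, two calls a known number of seconds apart give the request rate.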

II. Advanced Monitoring: Prometheus + Grafana (Metric Visualization and Alerting)

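1. Install and Configure the Nginx Exporter
The Nginx Exporter (nginx-prometheus-exporter) reads the stub_status page and converts its counters into metrics that Prometheus can scrape. The stub_status location from Part I must therefore be in place. Steps: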

Download and install the exporter:

```
wget https://github.com/nginxinc/nginx-prometheus-exporter/releases/download/v0.11.0/nginx-prometheus-exporter_0.11.0_linux_amd64.deb
sudo dpkg -i nginx-prometheus-exporter_0.11.0_linux_amd64.deb
```

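Start the exporter (it listens on port 9113 by default):

```
sudo systemctl start nginx-prometheus-exporter
```

By default the exporter scrapes http://127.0.0.1:8080/stub_status. That does not match the stub_status location configured in Part I, which is served on port 80. Use the --nginx.scrape-uri flag to point the exporter at the right URL. For example, when running it by hand: nginx-prometheus-exporter --nginx.scrape-uri=http://127.0.0.1/stub_status. Where to set this flag for the systemd service depends on how the package defines the unit.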
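To confirm that the exporter can actually reach stub_status, check the nginx_up gauge on its /metrics page. A value of 1 means the last scrape of Nginx succeeded and 0 means it failed. The following is a minimal sketch that reads the metrics text on stdin. Both function names are hypothetical:

```shell
#!/usr/bin/env bash
# nginx_up_value (hypothetical helper): prints the value of the nginx_up gauge
# from Prometheus text-format metrics on stdin; prints nothing if it is absent.
# Comment lines (# HELP / # TYPE) are skipped because their first field is "#".
nginx_up_value() {
  awk '$1 == "nginx_up" { print $2 }'
}

# exporter_healthy (hypothetical helper): succeeds only when nginx_up is exactly 1.
exporter_healthy() {
  [ "$(nginx_up_value)" = "1" ]
}
```

Usage: curl -s http://localhost:9113/metrics | exporter_healthy && echo "exporter OK". A non-zero exit status here usually means the scrape URI does not match the stub_status location.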

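2. Configure Prometheus to Scrape the Metrics
Edit the Prometheus configuration file (/etc/prometheus/prometheus.yml) and add an Nginx job under the existing scrape_configs section: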

```
scrape_configs:
  - job_name: 'nginx'
    scrape_interval: 10s  # how often to scrape
    static_configs:
      - targets: ['localhost:9113']  # exporter address
```

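Optionally validate the file with promtool check config /etc/prometheus/prometheus.yml, then restart Prometheus: sudo systemctl restart prometheus.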

3. Grafana Visualization and Alerting

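Install Grafana: Grafana is not in Debian's own repositories, so first add Grafana's APT repository (apt.grafana.com). Then run sudo apt install grafana and sudo systemctl enable --now grafana-server, and open http://localhost:3000. The default login is admin/admin, and you are asked to change the password on first login.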
Add Prometheus as a data source: go to Configuration > Data Sources (Connections > Data sources in newer Grafana versions), choose Prometheus, and set the URL to http://localhost:9090.
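Import an Nginx dashboard: click + > Import (Dashboards > New > Import in newer versions), then enter the ID of an Nginx dashboard published on grafana.com (for example, ID 6686).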
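Set up alert rules: go to Alerting > Alert rules and click New alert rule. Choose an Nginx metric such as nginx_up (0 means the exporter could not reach Nginx) or nginx_http_requests_total, and set a threshold. nginx_http_requests_total is a cumulative counter, so the threshold must apply to its rate. For example, "more than 1000 requests per minute" becomes rate(nginx_http_requests_total[5m]) * 60 > 1000. Finally, configure a notification channel (contact point) such as email or Slack.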
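Because nginx_http_requests_total only ever counts up, a threshold like "1000 requests per minute" has to be compared against the counter's rate of change, not its raw value. The following is a minimal sketch of that arithmetic. It uses a hypothetical helper that takes two counter samples and the number of seconds between them:

```shell
#!/usr/bin/env bash
# requests_per_minute (hypothetical helper): converts two samples of a
# cumulative request counter into an average per-minute rate.
#   $1 = earlier counter value, $2 = later counter value, $3 = seconds between samples
# If the counter went down (e.g. Nginx restarted), this sketch reports 0;
# Prometheus rate() handles such counter resets more carefully.
requests_per_minute() {
  awk -v prev="$1" -v curr="$2" -v secs="$3" 'BEGIN {
    if (secs <= 0 || curr < prev) { print 0; exit }
    printf "%.0f\n", (curr - prev) / secs * 60
  }'
}
```

For example, a counter that moves from 1000 to 1600 over 30 seconds gives requests_per_minute 1000 1600 30, which prints 1200. That value would trip a 1000-per-minute threshold.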

III. Log Monitoring and Anomaly Alerting

1. Configure the Nginx Log Format
In /etc/nginx/nginx.conf, define a named log format to keep the access log consistent and easier to analyze:

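```
# These lines go inside the existing http block; change the existing
# access_log/error_log lines rather than adding second copies.
http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;
    error_log /var/log/nginx/error.log warn;  # log level warn and above
}
```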

Test with sudo nginx -t, then restart Nginx to apply the changes.

2. Common Log Analysis Tools

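grep/awk: quickly pull out errors such as 404 and 500 responses. Examples:

```
# Count 404 responses (field 9 is the status code in the combined/main format)
awk '$9 == 404 {count++} END {print count+0}' /var/log/nginx/access.log
# Follow the error log in real time
tail -f /var/log/nginx/error.log | grep -i "error\|fail"
```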

GoAccess: a real-time, visual log analyzer. Install it with sudo apt install goaccess, then run:
```
goaccess /var/log/nginx/access.log -o report.html --log-format=COMBINED
```
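Open report.html to view the statistics, including status codes and referrers. The command above writes a one-off snapshot. For a report that keeps updating, add --real-time-html.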
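ELK Stack (Elasticsearch + Logstash + Kibana): suited to large-scale log management. Logstash collects the Nginx logs and stores them in Elasticsearch, and Kibana visualizes them and can be used to set up alerts.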
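The field-9 awk approach extends to a per-status breakdown and a 5xx error ratio, which are the figures a log-based 4xx/5xx alert would watch. The following is a minimal sketch that assumes the combined/main log format, where the status code is the ninth whitespace-separated field. Both function names are hypothetical:

```shell
#!/usr/bin/env bash
# status_counts (hypothetical): reads access-log lines on stdin and prints
# "<status> <count>" for each HTTP status code, sorted by status code.
# Lines whose field 9 is not a three-digit number are skipped.
status_counts() {
  awk '$9 ~ /^[0-9][0-9][0-9]$/ { c[$9]++ }
       END { for (s in c) print s, c[s] }' | sort -n
}

# error_ratio_5xx (hypothetical): prints the share of 5xx responses as a
# percentage with one decimal place (0.0 when there are no valid lines).
error_ratio_5xx() {
  awk '$9 ~ /^[0-9][0-9][0-9]$/ { total++; if ($9 ~ /^5/) err++ }
       END { printf "%.1f\n", (total ? err * 100 / total : 0) }'
}
```

Usage: status_counts < /var/log/nginx/access.log, or tail -n 10000 /var/log/nginx/access.log | error_ratio_5xx to look only at recent traffic. The character-class pattern is used instead of {3} because older mawk builds, Debian's default awk, do not support interval expressions.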

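3. Automated Alerting Script
Write a shell script that periodically checks the error log and sends an alert (for example, by email). Example: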

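```
#!/bin/bash
# Alert when the Nginx error log has more than ALERT_THRESHOLD entries at
# level error or worse within the last 5 minutes.
# Requires a working "mail" command (e.g. from the mailutils package).
ERROR_LOG="/var/log/nginx/error.log"
ALERT_THRESHOLD=5  # alert when there are more than 5 errors in the window
# error.log timestamps look like "2024/01/15 10:23:45", so use the same format
SINCE=$(date -d "5 minutes ago" +"%Y/%m/%d %H:%M:%S")
# This format sorts lexicographically, so a string comparison selects recent lines
ERROR_COUNT=$(awk -v since="$SINCE" '$1 ~ /^[0-9]/ && ($1 " " $2) >= since && /\[(error|crit|alert|emerg)\]/ {n++} END {print n+0}' "$ERROR_LOG")

if [ "$ERROR_COUNT" -gt "$ALERT_THRESHOLD" ]; then
    echo "Nginx error log anomaly: $ERROR_COUNT errors since $SINCE" | mail -s "Nginx alert" admin@example.com
fi
```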

Make the script executable with chmod +x, then add a cron job that runs it every 5 minutes:

```
crontab -e
# Add the following line
*/5 * * * * /path/to/alert_script.sh
```

4. Third-Party Monitoring Tools

Fail2Ban: watches the Nginx access log and automatically bans abusive IPs, such as clients that trigger many 404 responses. Install:
```
sudo apt install fail2ban
```
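Every jail needs a filter that defines what counts as a failure. Fail2Ban's stock filters do not cover plain 404 responses, so the rule below assumes a custom filter. The name nginx-404 is hypothetical. Add the jail to /etc/fail2ban/jail.local and create the filter file:
```
# /etc/fail2ban/jail.local: ban after 3 matches within 10 minutes, for 10 minutes
[nginx-404]
enabled  = true
port     = http,https
filter   = nginx-404
logpath  = /var/log/nginx/access.log
maxretry = 3
findtime = 600
bantime  = 600
# /etc/fail2ban/filter.d/nginx-404.conf (hypothetical custom filter)
[Definition]
failregex = ^<HOST> -.*"[A-Z]+ [^"]*" 404
```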
Restart Fail2Ban: sudo systemctl restart fail2ban.
Zabbix/Nagios: enterprise monitoring systems that support Nginx status checks, resource monitoring (CPU, memory), and custom alert rules.
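Before enabling a 404-based ban rule, it can help to see which clients would trip it. The following is a minimal sketch that lists client addresses with at least a given number of 404 responses in the supplied log lines. The function name is hypothetical, and the default threshold of 3 is an assumption chosen to mirror maxretry = 3 in the Fail2Ban example. Unlike Fail2Ban, it ignores the time window:

```shell
#!/usr/bin/env bash
# frequent_404_clients (hypothetical): reads access-log lines on stdin and
# prints "<client-ip> <404-count>" for clients with at least $1 404s
# (default 3), busiest first. In the combined/main format, field 1 is the
# client address and field 9 the status code. No time window is applied.
frequent_404_clients() {
  local min="${1:-3}"
  awk -v min="$min" '$9 == 404 { c[$1]++ }
    END { for (ip in c) if (c[ip] >= min) print ip, c[ip] }' | sort -k2,2nr -k1,1
}
```

Usage: frequent_404_clients 10 < /var/log/nginx/access.log lists clients with ten or more 404s in the current log file.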

IV. Key Metrics to Monitor

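Performance: request volume (nginx_http_requests_total), active connections (nginx_connections_active), and request latency. The stub_status-based exporter does not report latency. One option is to log $request_time in the access log and analyze it there.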
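Errors: the 4xx/5xx error rate and the number of error-log entries (counted with the log tools above). stub_status carries no status codes, so the exporter's nginx_http_requests_total has no status label. Error rates have to come from the access logs or from a different exporter.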
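Resources: CPU usage of the Nginx processes and their memory usage (htop). Use top -p $(pgrep -d, -x nginx) to include the master and all worker processes, because the PID file (/var/run/nginx.pid) holds only the master's PID.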
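Because Nginx runs one master and several worker processes, looking at a single process can understate the total footprint. The following is a minimal sketch that sums CPU and resident memory over all processes named nginx, reading ps output on stdin. The helper name is hypothetical:

```shell
#!/usr/bin/env bash
# nginx_usage_totals (hypothetical): reads lines of "<%cpu> <rss-KiB>" on stdin,
# as produced by `ps -C nginx -o pcpu= -o rss=`, and prints the totals.
nginx_usage_totals() {
  awk '{ cpu += $1; rss += $2 }
       END { printf "cpu_percent=%.1f rss_kib=%d\n", cpu, rss }'
}
```

Usage: ps -C nginx -o pcpu= -o rss= | nginx_usage_totals. The pcpu column from ps is CPU time averaged over each process's lifetime, not an instantaneous reading. For current load, top or a Prometheus process exporter is the better source.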

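Together, these methods cover Nginx monitoring on Debian end to end, from basic performance counters through log analysis to alerting. Choose the combination of tools that fits your environment.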
