How to monitor JS applications on Ubuntu

A hands-on guide to monitoring JS applications on Ubuntu

1. Monitoring goals and overall architecture

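Logging: capture console output and uncaught exceptions, and write them by level to files and the system log so they can be searched and alerted on.
Process and availability: process liveness, automatic restart, and whether the port is reachable and responding.
Metrics and performance: request latency, throughput, error rate, memory/CPU, GC, event loop, and so on.
Error tracking: aggregate production exception stack traces, group them by environment/version, and attach trace context.
Visualization and alerting: Grafana dashboards, Prometheus for collection, and Alertmanager or system tools for sending alerts.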
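
Several of the signals listed above (memory, CPU, event loop delay) can be read directly from Node's standard library, which helps when judging what a metrics library is reporting. The following is a minimal sketch, not a replacement for a real exporter; the function names and the field names in the returned object are illustrative assumptions.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Start sampling event loop delay; resolution is the sampling interval in ms.
function startEventLoopMonitor(resolutionMs = 20) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  return histogram;
}

// Collect a point-in-time snapshot of process-level signals.
// The field names (rssMb, heapUsedMb, ...) are illustrative, not a standard schema.
function snapshotRuntime(histogram) {
  const mem = process.memoryUsage(); // values in bytes
  const cpu = process.cpuUsage();    // values in microseconds since process start
  const toMb = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  return {
    rssMb: toMb(mem.rss),
    heapUsedMb: toMb(mem.heapUsed),
    cpuUserMs: Math.round(cpu.user / 1000),
    cpuSystemMs: Math.round(cpu.system / 1000),
    // The histogram reports nanoseconds; convert to milliseconds.
    eventLoopDelayP99Ms: histogram ? histogram.percentile(99) / 1e6 : null,
    uptimeSec: Math.round(process.uptime()),
  };
}
```

Calling snapshotRuntime(histogram) on an interval and logging the result gives a rough baseline; a steadily climbing rssMb or eventLoopDelayP99Ms is usually the first hint of a leak or of blocking work on the main thread.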

2. Quick implementation steps

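Process and logs
- Manage the app with PM2 and centralize its logs: install with sudo npm i -g pm2; start with pm2 start app.js -n my-app --log /var/log/my-app/app.log; follow logs live with pm2 logs my-app (it streams by default); watch resource usage with pm2 monit; enable start on boot with pm2 startup (run the command it prints), then pm2 save.
- Manage the app with systemd: create /etc/systemd/system/my-app.service with ExecStart=/usr/bin/node /path/app.js and Restart=always, then start it with sudo systemctl start my-app and follow its logs with sudo journalctl -u my-app -f.
Live log viewing and rotation
- Follow a log live: tail -f /var/log/my-app/app.log; poll the most recent lines periodically: watch -n 5 tail -n 20 /var/log/my-app/app.log.
- Log rotation and reports: configure logrotate to rotate daily and compress; use logwatch to generate daily/weekly reports and send them on a schedule with cron.
Metrics and visualization
- Add prom-client to the Node.js app to expose /metrics, scrape it with Prometheus, and connect Grafana for dashboards and threshold alerts.
Error tracking
- Integrate Sentry (or New Relic/Datadog) to collect production exception stack traces and context for fast diagnosis.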

3. Key configuration examples

PM2 logging and startup

sudo npm i -g pm2
pm2 start app.js -n my-app --log /var/log/my-app/app.log
pm2 logs my-app
pm2 monit
pm2 startup && pm2 save

systemd service

sudo tee /etc/systemd/system/my-app.service >/dev/null <<'EOF'
[Unit]
Description=Node.js App
After=network.target

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/opt/my-app
ExecStart=/usr/bin/node /opt/my-app/app.js
Restart=always
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now my-app
sudo journalctl -u my-app -f

Winston structured logging

npm i winston

// logger.js
const { createLogger, format, transports } = require('winston');
const logger = createLogger({
  level: 'info',
  format: format.combine(format.timestamp(), format.json()),
  transports: [
    new transports.File({ filename: 'error.log', level: 'error' }),
    new transports.File({ filename: 'combined.log' }),
    new transports.Console({ format: format.simple() })
  ]
});
module.exports = logger;

Sentry error tracking

npm i @sentry/node

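// app.js
const Sentry = require('@sentry/node');
Sentry.init({ dsn: process.env.SENTRY_DSN, environment: 'production' });
// Global fallback for errors nothing else handled
process.on('unhandledRejection', e => Sentry.captureException(e));
process.on('uncaughtException', e => {
  Sentry.captureException(e);
  // Flush pending events (up to 2 s) before exiting, otherwise the report may be lost
  Sentry.flush(2000).finally(() => process.exit(1));
});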

Prometheus metrics endpoint

npm i prom-client

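// metrics.js
const client = require('prom-client');
const http = require('http');
const register = client.register;

// Default process metrics: memory, CPU, GC, event loop lag
client.collectDefaultMetrics();

const httpReqDur = new client.Histogram({
  name: 'http_request_duration_ms',
  help: 'Duration of HTTP requests in ms',
  labelNames: ['method', 'route', 'code'],
  buckets: [5, 15, 50, 100, 200, 300, 400, 500]
});

const server = http.createServer(async (req, res) => {
  // Serve the metrics themselves; register.metrics() returns a Promise in prom-client 13+
  if (req.url === '/metrics') {
    res.setHeader('Content-Type', register.contentType);
    res.end(await register.metrics());
    return;
  }
  // startTimer() reports seconds, so measure milliseconds by hand to match the buckets
  const start = process.hrtime.bigint();
  res.on('finish', () => {
    const ms = Number(process.hrtime.bigint() - start) / 1e6;
    // In a real app, label with the route pattern rather than the raw URL to keep cardinality bounded
    httpReqDur.observe({ method: req.method, route: req.url, code: res.statusCode }, ms);
  });
  res.end('OK');
});

server.listen(3000, () => console.log('Metrics on :3000/metrics'));
// Example Prometheus scrape config (YAML, in prometheus.yml):
// scrape_configs:
//   - job_name: 'nodejs'
//     static_configs:
//       - targets: ['localhost:3000']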

logrotate example

sudo tee /etc/logrotate.d/my-app >/dev/null <<'EOF'
/var/log/my-app/*.log {
  daily
  missingok
  rotate 14
  compress
  delaycompress
  notifempty
  # The app keeps its log file open and the unit above defines no ExecReload,
  # so truncate in place instead of asking the service to reopen its logs
  copytruncate
}
EOF

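4. Visualization and alerting

Prometheus + Grafana: deploy Prometheus and Grafana, add Prometheus as a data source in Grafana, import or build a Node.js dashboard, and set threshold alerts on metrics such as http_request_duration_ms and process memory/CPU (using Alertmanager or an external webhook).
System and application overview: deploy NetData and open http://:19999 in a browser to see real-time host and application metrics; well suited to quick troubleshooting and capacity assessment.
Availability monitoring: deploy Uptime Kuma to monitor HTTP(s)/TCP/Ping and more; it supports multiple notification channels and status pages, covering what resource metrics alone miss.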
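
To make concrete what an availability check does, here is a minimal sketch of a single HTTP probe using only Node's standard library. It is not how Uptime Kuma is implemented; checkHttp, its default timeout, and the "2xx/3xx means up" rule are assumptions for illustration, and it handles plain http:// URLs only.

```javascript
const http = require('http');

// Probe an HTTP endpoint once.
// Resolves (never rejects) with { up, status, latencyMs }.
function checkHttp(url, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const started = Date.now();
    const done = (up, status) =>
      resolve({ up, status, latencyMs: Date.now() - started });

    // agent: false opens a fresh connection per probe, so a reused
    // keep-alive socket cannot hide a listener that has gone away.
    const req = http.get(url, { timeout: timeoutMs, agent: false }, (res) => {
      res.resume(); // discard the body so the socket is released
      done(res.statusCode >= 200 && res.statusCode < 400, res.statusCode);
    });

    // The 'timeout' event only signals idleness; the request must be destroyed explicitly.
    req.on('timeout', () => req.destroy(new Error('timeout')));
    req.on('error', () => done(false, null));
  });
}
```

Run on a schedule, for example from setInterval or a cron job, a probe like this catches the case where the process is alive and resource graphs look normal but the port no longer answers.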

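5. Troubleshooting and optimization tips

During development: run node --inspect app.js and use chrome://inspect in Chrome DevTools for CPU/memory sampling and flame graph analysis; pair it with autocannon/wrk/Artillery for load and stability testing.
At runtime: instrument the code and expose /metrics, then use Prometheus/Grafana to watch trends in P95/P99 latency, throughput, and error rate, tied to alerts.
Resource bottlenecks: use htop/vmstat/iostat/sar (iostat and sar come from the sysstat package) to investigate CPU, memory, disk I/O, and other system-level bottlenecks, and optimize queries, caching, and concurrency strategy where needed.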
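
As a reference for what P95/P99 latency, throughput, and error rate mean, the sketch below computes them from raw request records. The record shape ({ durationMs, status }) and the rule that any 5xx status counts as an error are assumptions for illustration. Prometheus estimates quantiles from histogram buckets rather than raw samples, so dashboard values will approximate what this computes.

```javascript
// Nearest-rank percentile: the smallest sample such that
// at least p percent of all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Summarize one observation window of requests.
// Each request is { durationMs, status }; windowSec is the window length in seconds.
function summarize(requests, windowSec) {
  const durations = requests.map((r) => r.durationMs);
  const errors = requests.filter((r) => r.status >= 500).length;
  return {
    p95Ms: percentile(durations, 95),
    p99Ms: percentile(durations, 99),
    throughputRps: requests.length / windowSec,
    errorRate: requests.length ? errors / requests.length : 0,
  };
}
```

The gap between the mean and P99 is often the more useful signal: a service whose average looks healthy can still have a slow tail that only the high percentiles reveal.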
