CentOS 上 SQL Server 性能监控实战指南
一 监控体系与分层
二 快速检查与常用 SQL
sqlcmd -S <host_or_ip>,<port> -U <user> -P <pwd> -Q "SELECT @@SERVERNAME, @@VERSION, SERVERPROPERTY('ProductVersion') AS ProductVersion"SELECT
r.session_id, r.status, r.command, r.wait_type, r.wait_time, r.cpu_time,
r.logical_reads, r.reads, r.writes,
s.loginame, s.host_name, s.program_name,
SUBSTRING(t.text, (r.statement_start_offset/2)+1,
(CASE WHEN r.statement_end_offset = -1 THEN DATALENGTH(t.text)
ELSE r.statement_end_offset/2 - r.statement_start_offset/2 + 1 END)) AS stmt_text
FROM sys.dm_exec_requests AS r
JOIN sys.sysprocesses AS s ON r.session_id = s.spid
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
ORDER BY r.cpu_time DESC;
SELECT
migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
'CREATE INDEX IX_' + OBJECT_NAME(mid.object_id) + '_'
+ REPLACE(REPLACE(REPLACE(ISNULL(mid.equality_columns,''),', ','_'),'[',''),']','') + '_'
+ REPLACE(REPLACE(REPLACE(ISNULL(mid.inequality_columns,''),', ','_'),'[',''),']','')
+ ' ON ' + mid.statement + ' (' + ISNULL(mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL(mid.inequality_columns, '') + ')'
+ ISNULL(' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
mid.statement, mid.equality_columns, mid.inequality_columns, mid.included_columns
FROM sys.dm_db_missing_index_details AS mid
INNER JOIN sys.dm_db_missing_index_groups AS mig ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats AS migs ON mig.index_group_handle = migs.group_handle
WHERE migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) > 10
ORDER BY improvement_measure DESC;
DBCC SQLPERF(LOGSPACE);
SELECT
(SUM(single_pages_kb) + SUM(multi_pages_kb)) / 1024.0 AS BufferPoolMB,
(SUM(virtual_memory_committed_kb) + SUM(shared_memory_committed_kb)) / 1024.0 AS MemCommittedMB
FROM sys.dm_os_memory_clerks WITH (NOLOCK)
WHERE type LIKE 'MEMORYCLERK_SQLBUFFERPOOL%';
以上 SQL 依赖 DMVs 与 DBCC,适合在 sqlcmd 或 SSMS 中定期执行并留存结果。
三 长期监控与可视化
四 关键指标与告警阈值示例
| 指标 | 含义 | 建议阈值或关注点 |
|---|---|---|
| Page Life Expectancy | 页在缓冲池中的平均生存时间 | 持续低于 300 秒 可能表示内存压力 |
| Buffer Cache Hit Ratio | 缓冲池命中率 | 长期低于 95% 需关注内存与 I/O |
| Batch Requests/s | 每秒批处理请求数 | 结合基线观察突增/骤降 |
| Compiles/s vs Recompiles/s | 编译与重编译 | 重编译比例高可能意味着计划不稳定 |
| Lock Waits / Average Wait Time | 锁等待与平均等待时间 | 持续升高提示阻塞/争用 |
| Log Flush Wait / Log Bytes Flushed/s | 日志刷新等待与吞吐 | 等待高或日志吞吐异常需查日志设备 |
| User Connections | 当前连接数 | 接近最大连接或异常波动需核查应用 |
| Disk Read/Write Latency | 存储延迟 | 超过 10–20 ms(视硬件而定)需排查 I/O |
| CPU Utilization | 服务器 CPU | 持续 >80% 且 SQL 为热点需优化查询/索引 |
| Failed Connections | 失败连接数 | 突增可能意味着认证/网络/服务问题 |
| 上述指标可通过 DMVs、Extended Events、Query Store、DBCC SQLPERF 与系统工具获取,并在 Prometheus/Grafana 或 Zabbix 中设置告警。 |
五 排障流程与优化建议