Identification approach and prerequisites
Identification based on Tomcat access logs
<Valve className="org.apache.catalina.valves.AccessLogValve"
directory="logs" prefix="localhost_access_log" suffix=".txt"
pattern="%h %l %u %t &quot;%r&quot; %s %b %D" />
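Here %D makes it easy to set a threshold in milliseconds (for example, >800 ms). Tomcat 10.1 and later report %D in microseconds, so scale the threshold accordingly on those versions.
# Single pass over the log files (this variant assumes each line carries a QTime:<ms> field)
grep 'QTime' /var/log/tomcat/localhost_access_log.*.txt | \
awk -F'QTime:' '{if ($2+0 > 800) print $0}'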
# If using %D (milliseconds)
awk '$NF+0 > 800 {print}' /var/log/tomcat/localhost_access_log.*.txt
# If using %T (seconds; the threshold converts to 0.8)
awk '$NF+0 > 0.8 {print}' /var/log/tomcat/localhost_access_log.*.txt
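The threshold filter above can be wrapped in a small reusable function. The following is a minimal sketch, assuming the access-log pattern ends with %D in milliseconds; the function names `slow_requests` and `sample_log` and the sample lines are made up for illustration and are not part of Tomcat.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print access-log lines whose last field (%D, assumed
# to be milliseconds) exceeds a threshold. Reads log lines from stdin.
slow_requests() {
  local threshold_ms="${1:-800}"
  # Skip lines whose last field is not a plain integer (e.g. truncated lines).
  awk -v limit="$threshold_ms" '$NF ~ /^[0-9]+$/ && $NF + 0 > limit + 0'
}

# Made-up sample lines in the '%h %l %u %t "%r" %s %b %D' layout.
sample_log() {
  cat <<'EOF'
10.0.0.1 - - [12/Mar/2024:10:15:30 +0800] "GET /api/orders?id=1 HTTP/1.1" 200 512 1250
10.0.0.2 - - [12/Mar/2024:10:15:31 +0800] "GET /api/users HTTP/1.1" 200 128 35
10.0.0.3 - - [12/Mar/2024:10:15:32 +0800] "POST /api/orders HTTP/1.1" 500 - 900
EOF
}

# Usage: prints the 1250 ms and 900 ms requests, drops the 35 ms one.
sample_log | slow_requests 800
```

Against real logs the same function can be fed with `cat /var/log/tomcat/localhost_access_log.*.txt | slow_requests 800`. Passing the threshold as an argument keeps the awk program unchanged when the limit is tuned.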
# Previous clock hour, aggregated by URL: max duration (ms), count, URL (date -d requires GNU date)
grep "$(date -d '1 hour ago' '+%d/%b/%Y:%H')" /var/log/tomcat/localhost_access_log.*.txt | \
awk '{n=split($0, part, "QTime:"); t=(n>1 ? part[2]+0 : 0); if (t>800) {url=$7; sub(/\?.*/, "", url); cnt[url]++; if (t>max[url]) max[url]=t}}
END {for (u in cnt) print max[u], cnt[u], u}' | \
sort -nr | head
# Live monitoring (strftime requires gawk)
tail -f /var/log/tomcat/localhost_access_log.*.txt | \
awk -F'QTime:' '{if(($2+0) > 800) print strftime("%F %T"), $0}'
Identification based on database and application logs
slow_query_log=1
slow_query_log_file=/var/log/mysql/mysql-slow.log
long_query_time=1
log_queries_not_using_indexes=1
pt-query-digest /var/log/mysql/mysql-slow.log --limit 10
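When pt-query-digest is not installed, a rough first look at the slow log is still possible with awk. The following is a minimal sketch, assuming the standard slow-log header line `# Query_time: ... Lock_time: ...`; the function names `slow_log_summary` and `sample_slow_log` and the sample entries are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize Query_time values from a MySQL slow log
# read on stdin. Prints the entry count, average and maximum in seconds.
slow_log_summary() {
  awk '/^# Query_time:/ {
         t = $3 + 0            # third field is the Query_time value
         n++; sum += t
         if (t > max) max = t
       }
       END { if (n) printf "entries=%d avg=%.3f max=%.3f\n", n, sum / n, max }'
}

# Made-up sample entries in the slow-log layout.
sample_slow_log() {
  cat <<'EOF'
# Time: 2024-03-12T10:15:30.123456Z
# User@Host: app[app] @ localhost []  Id:    12
# Query_time: 2.500000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 500000
SET timestamp=1710238530;
SELECT * FROM orders WHERE status = 'open';
# Time: 2024-03-12T10:16:02.654321Z
# User@Host: app[app] @ localhost []  Id:    15
# Query_time: 1.500000  Lock_time: 0.000050 Rows_sent: 1  Rows_examined: 120000
SET timestamp=1710238562;
SELECT COUNT(*) FROM users;
EOF
}

# Usage: prints one summary line for the two sample entries.
sample_slow_log | slow_log_summary
```

This only gives aggregate timings; it does not group by query fingerprint the way pt-query-digest does, so it is a quick sanity check rather than a replacement.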
Identifying and handling "slowness" caused by connection pool problems
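# Live monitoring
tail -f /var/log/tomcat/catalina.out | \
grep -Ei "connection pool|getconnectiontimeout|cannot get connection|connection leak"
# Previous clock hour (assumes Tomcat's default "dd-MMM-yyyy HH:mm:ss.SSS" timestamps; adjust the date format if yours differ)
grep -Ei "connection pool" /var/log/tomcat/catalina.out | \
grep "$(date -d '1 hour ago' '+%d-%b-%Y %H:')"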
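To see whether pool errors cluster at particular times, the matches can be counted per hour. The following is a minimal sketch, assuming catalina.out lines start with a `dd-MMM-yyyy HH:mm:ss.SSS` timestamp; the function names `pool_errors_per_hour` and `sample_catalina` and the sample log messages are made up for illustration and will differ from what a real pool implementation logs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count connection-pool related log lines per hour.
# Reads catalina.out style lines on stdin; assumes field 1 is the date and
# field 2 the time (HH:mm:ss.SSS).
pool_errors_per_hour() {
  grep -Ei 'connection pool|getconnectiontimeout|cannot get connection|connection leak' |
    awk '{ split($2, hms, ":"); count[$1 " " hms[1]]++ }
         END { for (h in count) print h ":00", count[h] }' |
    sort
}

# Made-up sample lines; real message text depends on the pool in use.
sample_catalina() {
  cat <<'EOF'
12-Mar-2024 10:15:30.123 SEVERE [http-nio-8080-exec-1] com.example.Dao.query Cannot get connection from pool
12-Mar-2024 10:40:02.456 WARNING [http-nio-8080-exec-7] com.example.Dao.query Connection pool exhausted
12-Mar-2024 11:05:11.789 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 1234 ms
12-Mar-2024 11:20:45.000 SEVERE [http-nio-8080-exec-3] com.example.Dao.query Cannot get connection from pool
EOF
}

# Usage: prints one "date hour:00 count" line per hour that had matches.
sample_catalina | pool_errors_per_hour
```

Against a real log, `pool_errors_per_hour < /var/log/tomcat/catalina.out` gives the same breakdown. The plain `sort` orders hours correctly only within a single day, which is usually enough for a quick look.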
<Resource name="jdbc/mydb" auth="Container" type="javax.sql.DataSource"
maxTotal="200" maxIdle="50" maxWaitMillis="3000"
removeAbandonedOnBorrow="true" removeAbandonedTimeout="300" logAbandoned="true"
testOnBorrow="true" validationQuery="SELECT 1" validationQueryTimeout="5"
minIdle="10" timeBetweenEvictionRunsMillis="60000"
minEvictableIdleTimeMillis="300000" />
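After adjusting the settings, restart Tomcat and watch how the active/idle/waiting metrics change through Tomcat Manager or JMX.
Automation and long-term governance recommendations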