Apache log traffic statistics can be gathered in the following ways:
Basic command-line statistics
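Count requests per client IP, busiest first:
awk '{print $1}' access.log | sort | uniq -c | sort -nr
Count the requests made on a given date. Apache writes timestamps in the form [06/Aug/2025:14:30:00 +0800], so the search pattern must use that day/month/year layout:
grep "06/Aug/2025" access.log | wc -l
List the 10 most requested URLs:
awk '{print $7}' access.log | sort | uniq -c | sort -nr | head -10
Tool-based analysis
apachetop: displays requests, responses, and other traffic data in real time.
mod_status: once enabled, real-time status such as access counts can be viewed at http://<server IP>/server-status
Script-based analysis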
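For scripting against live data rather than the log file, mod_status also serves a machine-readable plain-text variant of the status page at /server-status?auto, made up of "Key: value" lines. The following is a minimal sketch of reading that output. The helper names parse_server_status and fetch_server_status and the sample text are illustrative assumptions, and the fields actually returned depend on the Apache version and on whether ExtendedStatus is on.

```python
from urllib.request import urlopen


def parse_server_status(text):
    """Parse the 'Key: value' lines of a server-status?auto response into a dict."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # skip lines that are not 'Key: value' pairs
        value = value.strip()
        # Convert numeric values where possible; leave everything else as text
        try:
            status[key] = int(value)
        except ValueError:
            try:
                status[key] = float(value)
            except ValueError:
                status[key] = value
    return status


def fetch_server_status(base_url):
    """Fetch and parse <base_url>/server-status?auto (mod_status must be enabled)."""
    with urlopen(f"{base_url}/server-status?auto", timeout=5) as resp:
        return parse_server_status(resp.read().decode("utf-8", errors="replace"))


# Illustrative sample only; real output has more fields and different values
sample = """Total Accesses: 1520
Total kBytes: 20480
ReqPerSec: 2.5
BusyWorkers: 3
IdleWorkers: 7
"""
stats = parse_server_status(sample)
print(stats["Total Accesses"], stats["ReqPerSec"], stats["BusyWorkers"])
```

Polling fetch_server_status at a fixed interval and differencing "Total Accesses" between polls is one way to estimate the current request rate, assuming the server exposes that field.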
import re
import pandas as pd

# Matches the leading fields of the Common/Combined Log Format:
# client IP, identity, user, [timestamp], "METHOD path protocol", status
log_pattern = re.compile(
    r'(?P<ip>\d+\.\d+\.\d+\.\d+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d+)'
)

# Parse each line, skipping any that do not match the pattern
with open('access.log') as f:
    logs = [m.groupdict() for line in f if (m := log_pattern.match(line))]

# Convert to a DataFrame for analysis
df = pd.DataFrame(logs)
df['datetime'] = pd.to_datetime(df['datetime'], format='%d/%b/%Y:%H:%M:%S %z')
hourly_traffic = df.resample('h', on='datetime').size()  # requests per hour
print(hourly_traffic)
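The per-IP and per-URL counts from the command-line section can also be reproduced in a script without pandas. The following is a minimal sketch using collections.Counter. The function name top_values and the sample log lines are hypothetical, and the pattern assumes the Common/Combined Log Format.

```python
import re
from collections import Counter

LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<datetime>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<path>\S+) (?P<protocol>[^"]+)" (?P<status>\d+)'
)


def top_values(lines, field, n=10):
    """Return the n most common values of one named field across the log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # silently skip lines that do not fit the pattern
            counts[match.group(field)] += 1
    return counts.most_common(n)


# Made-up sample lines for illustration; for a real file, pass the open
# file object instead, e.g. top_values(open("access.log"), "ip")
sample_lines = [
    '192.0.2.1 - - [06/Aug/2025:10:00:01 +0800] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.2 - - [06/Aug/2025:10:00:02 +0800] "GET /about.html HTTP/1.1" 200 256',
    '192.0.2.1 - - [06/Aug/2025:10:05:00 +0800] "GET /index.html HTTP/1.1" 404 128',
]
print(top_values(sample_lines, "ip"))
print(top_values(sample_lines, "path", n=1))
```

Because the function accepts any iterable of lines, it processes a log file one line at a time and does not need to hold the whole file in memory.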
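Notes
The commands and the script assume Apache's standard Common or Combined Log Format, in which field 1 is the client IP and field 7 is the request path; a custom LogFormat will need different field numbers and a different pattern.
These methods can be chosen as needed, from simple commands to script-based analysis that can feed more complex visualization, to cover the traffic-statistics needs of different scenarios.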