Improving Text Extraction and Search Efficiency on Linux
1. Core tools and efficient options
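Fixed-string search (the pattern is taken literally, which is faster than the default regular-expression matching): grep -F "ERROR" app.log
Narrow down the files first (lists .log files modified within the last 7 days that contain "timeout"): find /var/log -type f -name "*.log" -mtime -7 -exec grep -l "timeout" {} +
Search compressed logs without decompressing them first: zgrep -A5 -B2 "503" /var/log/nginx/access.log.*.gz
Extract fields: cut -d',' -f1,3 data.csv; awk -F',' '$2 > 100 {print $1,$3}' data.csv
Aggregate inside awk to cut down on extra pipeline stages and child processes (the count is printed first so that sort -nr ranks by frequency rather than by the key): grep "ERROR" app.log | awk '{ip[$1]++} END {for (i in ip) print ip[i], i}' | sort -nr | head
2. Speed-up strategies and parallelization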
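As a minimal sketch of the parallel search idea in this section, the snippet below wraps find and xargs in a small function and runs it on throwaway files. The function name parallel_grep, the sample file names, and their contents are hypothetical and not part of the original text; the default of 4 jobs is likewise an assumption to tune per machine.

```shell
# Hypothetical helper (not from the original text): list the *.log files
# under a directory that contain a fixed string, using several greps at once.
# Usage: parallel_grep DIR STRING [JOBS]
parallel_grep() {
  local dir="$1"
  local needle="$2"
  local jobs="${3:-4}"   # assumed default; tune to the machine
  # -print0 / -0 keep file names with spaces intact.
  # -n 100 hands each grep up to 100 files; -P runs up to $jobs greps at once.
  # -l prints only names of matching files; -F disables regex parsing.
  find "$dir" -type f -name '*.log' -print0 |
    xargs -0 -P "$jobs" -n 100 grep -lF -- "$needle" |
    sort   # workers finish in arbitrary order, so sort for stable output
}

# Demo on throwaway data (file names and contents are made up).
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
printf 'GET /a 200\nupstream timeout\n' > "$demo_dir/app one.log"
printf 'GET /b 200\n'                  > "$demo_dir/app two.log"
printf 'timeout\n'                     > "$demo_dir/notes.txt"   # not a .log file

parallel_grep "$demo_dir" "timeout" 8
```

Only "app one.log" should be listed: the second log has no match and notes.txt is filtered out by -name. Parallelism tends to pay off when there are many files; on a handful of small files the process start-up cost can outweigh the gain.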
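Parallel search with xargs (-P sets the number of concurrent processes, -n the number of files per grep invocation; -print0 and -0 keep file names containing spaces intact): find /var/log -name "*.log" -print0 | xargs -0 -P 8 -n 100 grep -F "timeout"
Parallel search with GNU parallel (in bash, ** recurses into subdirectories only after shopt -s globstar): parallel -j 8 grep -F "timeout" ::: /var/log/**/*.log
Keep patterns specific and avoid overusing .* in regular expressions.
3. Efficient command templates for common scenarios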
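Many of the templates in this section share one shape: filter lines, pull out a field, aggregate, and rank. A minimal sketch of that shape on made-up data follows. The function names top_clients and unique_ips and the sample log layout (client address first, status code last) are assumptions for illustration, not something the original text specifies.

```shell
# Hypothetical sample data; a real access log will differ in layout.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
10.0.0.1 - - "GET / HTTP/1.1" 200
10.0.0.2 - - "GET /api HTTP/1.1" 503
10.0.0.1 - - "GET /api HTTP/1.1" 503
10.0.0.3 - - "GET /health HTTP/1.1" 200
10.0.0.1 - - "GET /api HTTP/1.1" 503
EOF

# top_clients FILE STATUS: count requests per client (field 1) whose last
# field equals STATUS, most frequent first. The count is printed first so
# that a numeric sort on column 1 ranks by frequency.
top_clients() {
  awk -v status="$2" '$NF == status { n[$1]++ }
       END { for (ip in n) print n[ip], ip }' "$1" |
    sort -k1,1nr -k2,2
}

# unique_ips FILE: every distinct dotted-quad that appears in FILE.
# (The \b anchors used in the template are left out here for portability.)
unique_ips() {
  grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort -u
}

top_clients "$log" 503
unique_ips "$log"
```

Filtering and counting happen in a single awk process, so the file is read once; only the small aggregated result is passed on to sort.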
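Page through the context around 503 responses in rotated, compressed logs: zgrep -A5 -B2 "503" /var/log/nginx/access.log.*.gz | less -S
Count "timeout" lines by first field across logs modified in the last 7 days (-h suppresses the file-name prefix so that $1 is the first field of the log line; the count is printed first so that sort -nr ranks by frequency): find /var/log -type f -name "*.log" -mtime -7 -exec grep -hF "timeout" {} + | awk '{ip[$1]++} END {for (i in ip) print ip[i], i}' | sort -nr | head
List configuration files that contain a given directive: find . -type f -name "*.conf" -exec grep -l "Listen 443" {} +
Count lines containing the whole word ERROR (-c counts matching lines, so no pipe to wc -l is needed): grep -cw "ERROR" app.log
Extract unique IPv4-like addresses: grep -oE "\b([0-9]{1,3}\.){3}[0-9]{1,3}\b" access.log | sort -u
Drop DEBUG and INFO lines, then keep the ERROR lines: grep -vE "DEBUG|INFO" app.log | grep -F "ERROR"
4. Advanced tools and index-based acceleration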
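ripgrep, restricted to one file type: rg "TODO" --type js
Refresh the file-name index with updatedb, find files quickly with locate, then grep their contents (suited to directories that are searched frequently; updatedb usually needs root privileges): updatedb && locate "*.log" | xargs grep -l "pattern"
5. Avoiding common performance pitfalls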
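Avoid cat file | grep pattern; use grep pattern file directly, which saves one unnecessary process and a pipe.
Avoid unnecessary .* in patterns to reduce backtracking cost.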
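The first pitfall above, and the literal matching of -F from section 1, can be checked on a small made-up file. This is only a sketch: the sample lines are hypothetical, and it demonstrates that the results agree, not how much time is saved.

```shell
# Hypothetical sample file; contents are made up for illustration.
f=$(mktemp)
trap 'rm -f "$f"' EXIT
printf '%s\n' 'ERROR disk full' 'INFO a.b ok' 'INFO axb ok' 'ERROR a.b failed' > "$f"

# Same matches, one process fewer: grep reads the file itself.
via_cat=$(cat "$f" | grep -c 'ERROR')
direct=$(grep -c 'ERROR' "$f")
echo "cat|grep: $via_cat  direct: $direct"

# As a regex, "a.b" also matches "axb" because "." matches any character.
regex_hits=$(grep -c 'a.b' "$f")
# With -F the pattern is a literal string, so only "a.b" matches.
fixed_hits=$(grep -cF 'a.b' "$f")
echo "regex: $regex_hits  fixed: $fixed_hits"
```

Both ERROR counts come out as 2, while the regex form counts 3 lines and the fixed-string form 2. Besides being cheaper to match, -F therefore also guards against accidental metacharacters in the search string.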