Elasticsearch Log Analysis and Monitoring: A Practical Guide
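Log collection is the foundation of analysis and monitoring: application, system, and network logs from different sources need to be aggregated into Elasticsearch. Commonly used tools include: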
Filebeat (example filebeat.yml):
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/nginx/*.log
output.elasticsearch:
  hosts: ["localhost:9200"]
Logstash: supports parsing (for example, with the grok %{COMBINEDAPACHELOG} pattern) and filtering (for example, removing sensitive fields). Example configuration (parse Nginx logs and output to ES):
input {
  file {
    path => "/var/log/nginx/*.log"
    start_position => "beginning"
  }
}
filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "nginx-logs-%{+YYYY.MM.dd}"
  }
}
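To make the grok step less abstract, here is a minimal Python sketch of roughly what the pipeline above does to one log line. The regular expression is a simplification written for this example, not grok's actual %{COMBINEDAPACHELOG} definition, and the names parse_line, daily_index, and the sample line are hypothetical. Note also that Logstash builds the index date from the event's @timestamp (in UTC), which matches the log's own time only if a date filter sets it; this sketch assumes it does.

```python
import re
from datetime import datetime, timezone

# Simplified stand-in for the fields %{COMBINEDAPACHELOG} extracts.
# This is NOT grok's real pattern, only an approximation for illustration.
COMBINED_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = COMBINED_LOG.match(line)
    return match.groupdict() if match else None


def daily_index(fields, prefix="nginx-logs-"):
    """Build a daily index name in the style of nginx-logs-%{+YYYY.MM.dd}.

    Assumes the log timestamp is used as the event time and that month
    abbreviations are English (strptime's %b is locale-dependent).
    """
    ts = datetime.strptime(fields["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    return prefix + ts.astimezone(timezone.utc).strftime("%Y.%m.%d")


# Hypothetical sample line in combined log format.
sample = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"'
)

fields = parse_line(sample)
print(fields["clientip"], fields["response"], daily_index(fields))
```

Lines that do not match return None here; in Logstash, a failed grok match tags the event with _grokparsefailure instead of dropping it.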
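Direct API writes: documents can also be indexed through the REST API (for example, curl -XPOST 'localhost:9200/logs/_doc' -H 'Content-Type: application/json' -d '{"message":"test log"}').

Sound index design improves log storage efficiency and query speed. Key points include: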
Index template (for example, give nginx-logs-* indices 3 shards and 1 replica, and map the timestamp field as the date type):
PUT _index_template/nginx_logs_template
{
  "index_patterns": ["nginx-logs-*"],
  "template": {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {"properties": {"timestamp": {"type": "date"}, "message": {"type": "text"}}}
  }
}
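As a rough illustration of which indices the template above would apply to, the following sketch checks index names against index_patterns. It approximates Elasticsearch's wildcard matching with Python's fnmatch, which is an assumption of this example (fnmatch also understands ? and [...], which index patterns do not); the function name template_applies is hypothetical.

```python
from fnmatch import fnmatchcase


def template_applies(index_name, index_patterns):
    """Return True if index_name matches any pattern (only * is intended)."""
    return any(fnmatchcase(index_name, pattern) for pattern in index_patterns)


patterns = ["nginx-logs-*"]
print(template_applies("nginx-logs-2023.10.10", patterns))  # daily Nginx index
print(template_applies("app-logs-2023.10.10", patterns))    # unrelated index
```

A new index only picks up the template's settings and mappings at creation time, so indices created before the template was installed keep their old configuration.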
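Index lifecycle management (ILM) policy (for example, roll over after 7 days or 50 GB, raise the replica count to 2 in the warm phase, and delete after 30 days). Note that min_age is set on the phase itself, not inside the delete action:
PUT _ilm/policy/nginx_logs_policy
{
  "policy": {
    "phases": {
      "hot": {"actions": {"rollover": {"max_age": "7d", "max_size": "50gb"}}},
      "warm": {"actions": {"allocate": {"number_of_replicas": 2}}},
      "delete": {"min_age": "30d", "actions": {"delete": {}}}
    }
  }
}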
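The decision logic behind this policy can be sketched in a few lines of Python. This is a simplified model, not how ILM is implemented: it assumes rollover fires as soon as either condition is met and that the delete phase's min_age is counted from the rollover time. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloverConditions:
    """Thresholds mirroring max_age: 7d and max_size: 50gb in the policy."""
    max_age_days: float = 7
    max_size_gb: float = 50


def should_roll_over(age_days, size_gb, cond=RolloverConditions()):
    """Rollover triggers when ANY configured condition is satisfied."""
    return age_days >= cond.max_age_days or size_gb >= cond.max_size_gb


def should_delete(days_since_rollover, min_age_days=30):
    """The delete phase starts once min_age has elapsed since rollover."""
    return days_since_rollover >= min_age_days


print(should_roll_over(age_days=2, size_gb=61))   # size limit reached
print(should_roll_over(age_days=2, size_gb=10))   # neither limit reached
print(should_delete(days_since_rollover=31))
```

In a real cluster ILM evaluates these conditions periodically rather than continuously, so an index can exceed a threshold slightly before the action runs.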

With Elasticsearch's query DSL and aggregations, logs can be analyzed in depth and recurring patterns identified:
Query (for example, find logs whose host is server01):
GET /nginx-logs-*/_search
{
  "query": {"match": {"host": "server01"}}
}
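Aggregation (for example, count logs per hour; recent Elasticsearch versions use calendar_interval in place of the deprecated interval parameter):
GET /nginx-logs-*/_search
{
  "aggs": {"logs_over_time": {"date_histogram": {"field": "timestamp", "calendar_interval": "hour"}}}
}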
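To show what an hourly date_histogram computes, here is a small Python sketch that truncates each timestamp to the start of its hour and counts documents per bucket. It illustrates the bucketing idea only; the function name hourly_counts and the sample timestamps are made up for this example, and unlike a real aggregation it only emits non-empty buckets.

```python
from collections import Counter
from datetime import datetime


def hourly_counts(timestamps):
    """Count ISO-8601 timestamps per hour bucket, keyed by the bucket start."""
    buckets = Counter()
    for ts in timestamps:
        dt = datetime.fromisoformat(ts)
        # Truncate to the start of the hour, as an hourly histogram would.
        bucket_start = dt.replace(minute=0, second=0, microsecond=0)
        buckets[bucket_start.isoformat()] += 1
    return dict(sorted(buckets.items()))


# Hypothetical log timestamps.
logs = ["2023-10-10T13:05:00", "2023-10-10T13:59:59", "2023-10-10T14:00:00"]
for bucket, count in hourly_counts(logs).items():
    print(bucket, count)
```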
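
Monitoring is key to preventing failures and should cover cluster health, resource usage, index performance, and related dimensions: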
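Built-in APIs: cluster health (_cluster/health returns green/yellow/red), node list (_cat/nodes?v), and index information (_cat/indices?v).
Hot threads: _nodes/hot_threads shows the stack traces of the threads consuming the most CPU.
Stack Monitoring: set xpack.monitoring.collection.enabled: true in elasticsearch.yml, then open "Stack Monitoring" in Kibana to view the data.

Alerting is an extension of monitoring: set thresholds on key metrics so that operations staff are notified promptly: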
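Alert rules (for example, "indexing latency of nginx-logs-* exceeds 1 second") can notify through email, Slack, PagerDuty, and other channels.
Prometheus-style rules (for example, elasticsearch_cluster_status{status="red"} == 1) can send alerts through Alertmanager.
cluster.health.status: GREEN = healthy, YELLOW = one or more replica shards unassigned, RED = one or more primary shards unassigned.
Node count from _cat/nodes: watch for lost nodes, which can make data unavailable.
jvm.memory.heap.used.percent: heap usage; alert above 75%, critical above 85%.
fs.disk.avail: remaining disk space; below 20% is the low watermark, below 10% is critical.
indices.indexing.index_time_avg: average indexing time; above 1 s on SSD or 3 s on HDD calls for optimization.
indices.search.query_time_avg: average search time; above 500 ms affects user experience.
thread_pool.write.rejected: write thread pool rejections; any value above 0 indicates excessive write pressure.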
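
The thresholds listed above can be turned into a simple check, sketched below. The dictionary keys (cluster_status, heap_used_percent, disk_avail_percent, write_rejected) are hypothetical names chosen for this example and would need to be mapped from whatever your monitoring source actually returns; the function only classifies values and does not query Elasticsearch.

```python
def evaluate(metrics):
    """Return (severity, metric_name) pairs for every breached threshold."""
    alerts = []

    # Cluster status: yellow means replicas unassigned, red means primaries.
    status = metrics.get("cluster_status")
    if status == "red":
        alerts.append(("critical", "cluster_status"))
    elif status == "yellow":
        alerts.append(("warning", "cluster_status"))

    # Heap usage: warn above 75%, critical above 85%.
    heap = metrics.get("heap_used_percent")
    if heap is not None:
        if heap > 85:
            alerts.append(("critical", "heap_used_percent"))
        elif heap > 75:
            alerts.append(("warning", "heap_used_percent"))

    # Free disk space: warn below 20%, critical below 10%.
    disk = metrics.get("disk_avail_percent")
    if disk is not None:
        if disk < 10:
            alerts.append(("critical", "disk_avail_percent"))
        elif disk < 20:
            alerts.append(("warning", "disk_avail_percent"))

    # Any write rejection signals write pressure.
    if metrics.get("write_rejected", 0) > 0:
        alerts.append(("warning", "write_rejected"))

    return alerts


# Hypothetical snapshot of one node's metrics.
snapshot = {
    "cluster_status": "green",
    "heap_used_percent": 80,
    "disk_avail_percent": 8,
    "write_rejected": 0,
}
for severity, name in evaluate(snapshot):
    print(severity, name)
```

Keeping the classification separate from metric collection and notification makes it easy to test the thresholds in isolation before wiring them to a real alert channel.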