Monitoring ZooKeeper with Prometheus

末蓝、 2022-04-15 06:38

#### ZooKeeper monitoring ####

Git repository: [https://github.com/jiankunking/zookeeper\_exporter][https_github.com_jiankunking_zookeeper_exporter]

Exporter download: [https://github.com/carlpett/zookeeper\_exporter/releases/download/v1.0.2/zookeeper\_exporter][https_github.com_carlpett_zookeeper_exporter_releases_download_v1.0.2_zookeeper_exporter]

    [sss@keeper01 ~]$ /usr/local/bin/zookeeper_exporter --help
    Usage of /usr/local/bin/zookeeper_exporter:
      -bind-addr string
            bind address for the metrics server (default ":9141")
      -log-level string
            log level (default "info")
      -metrics-path string
            path to metrics endpoint (default "/metrics")
      -reset-on-scrape
            should a reset command be sent to zookeeper on each scrape (default true)
      -version
            show version and exit
      -zookeeper string
            host:port for zookeeper socket (default "localhost:2181")

    [sss@keeper01 ~]$ /usr/local/bin/zookeeper_exporter

    [sss@keeper01 ~]$ curl localhost:9141/metrics
    # HELP go_gc_duration_seconds A summary of the GC invocation durations.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 3.1513e-05
    go_gc_duration_seconds{quantile="0.25"} 4.2555e-05
    go_gc_duration_seconds{quantile="0.5"} 4.9278e-05
    go_gc_duration_seconds{quantile="0.75"} 8.2042e-05
    go_gc_duration_seconds{quantile="1"} 0.000212331
    go_gc_duration_seconds_sum 0.002286821
    go_gc_duration_seconds_count 31
    # HELP go_goroutines Number of goroutines that currently exist.
    # TYPE go_goroutines gauge
    go_goroutines 14
    ……

ZooKeeper provides four-letter commands (The Four Letter Words) for querying the current state of a ZooKeeper service and related information. Which commands are available?
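<table>
<thead>
<tr> <th>ZooKeeper four-letter command</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>conf</td> <td>Prints the configuration.</td> </tr>
<tr> <td>cons</td> <td>Lists full connection/session details for every client connected to this server, including the number of packets received/sent, session id, operation latencies, the last operation performed, and so on.</td> </tr>
<tr> <td>crst</td> <td>Resets the connection/session statistics for all connections.</td> </tr>
<tr> <td>dump</td> <td>Lists the outstanding sessions and ephemeral nodes. This command only works on the leader.</td> </tr>
<tr> <td>envi</td> <td>Prints details about the serving environment.</td> </tr>
<tr> <td>reqs</td> <td>Lists outstanding (unprocessed) requests.</td> </tr>
<tr> <td>ruok</td> <td>"Are you ok": tests whether the server is running in a non-error state. If it is, the server replies "imok"; otherwise it does not respond at all.</td> </tr>
<tr> <td>stat</td> <td>Outputs brief performance details and the list of connected clients.</td> </tr>
<tr> <td>srst</td> <td>Resets the server statistics.</td> </tr>
<tr> <td>srvr</td> <td>Lists full details for the server.</td> </tr>
<tr> <td>wchs</td> <td>Lists brief information on the server's watches.</td> </tr>
<tr> <td>wchc</td> <td>Lists detailed information on the server's watches, by session. The output is a list of sessions with their associated watches.</td> </tr>
<tr> <td>wchp</td> <td>Lists detailed information on the server's watches, by path. The output is a list of paths with their associated sessions.</td> </tr>
<tr> <td>mntr</td> <td>Outputs a list of variables that can be used to monitor the health of the cluster.</td> </tr>
</tbody>
</table>

From a client, these commands can be sent to ZooKeeper with telnet or nc. The most common example:

    echo mntr | nc ip 2181

###### Metric names and descriptions ######

<table>
<thead>
<tr> <th>Metric name</th> <th>Description</th> </tr>
</thead>
<tbody>
<tr> <td>zk_version</td> <td>Version</td> </tr>
<tr> <td>zk_avg_latency</td> <td>Average response latency</td> </tr>
<tr> <td>zk_max_latency</td> <td>Maximum response latency</td> </tr>
<tr> <td>zk_min_latency</td> <td>Minimum response latency</td> </tr>
<tr> <td>zk_packets_received</td> <td>Packets received</td> </tr>
<tr> <td>zk_packets_sent</td> <td>Packets sent</td> </tr>
<tr> <td>zk_num_alive_connections</td> <td>Number of alive connections</td> </tr>
<tr> <td>zk_outstanding_requests</td> <td>Number of outstanding (queued) requests</td> </tr>
<tr> <td>zk_server_state</td> <td>Leader/follower state</td> </tr>
<tr> <td>zk_znode_count</td> <td>Number of znodes</td> </tr>
<tr> <td>zk_watch_count</td> <td>Number of watches</td> </tr>
<tr> <td>zk_ephemerals_count</td> <td>Number of ephemeral nodes</td> </tr>
<tr> <td>zk_approximate_data_size</td> <td>Approximate total data size</td> </tr>
<tr> <td>zk_open_file_descriptor_count</td> <td>Number of open file descriptors</td> </tr>
<tr> <td>zk_max_file_descriptor_count</td> <td>Maximum number of file descriptors</td> </tr>
<tr> <td>Leader-only metrics</td> <td></td> </tr>
<tr> <td>zk_followers</td> <td>Number of followers</td> </tr>
<tr> <td>zk_synced_followers</td> <td>Number of synced followers</td> </tr>
<tr> <td>zk_pending_syncs</td> <td>Number of pending sync operations</td> </tr>
</tbody>
</table>

##### Metrics that need thresholds #####

- zk\_outstanding\_requests: number of outstanding (queued) requests
- zk\_pending\_syncs: number of pending sync operations
- zk\_avg\_latency: average response latency
- zk\_open\_file\_descriptor\_count: number of open file descriptors
- zk\_max\_file\_descriptor\_count: maximum number of file descriptors
- zk\_up: 1
- zk\_server\_state: leader/follower state
- zk\_num\_alive\_connections: number of alive connections

##### Rule file (for reference only): #####

    groups:
    - name: zookeeperStatsAlert
      rules:
      - alert: Too many outstanding requests
        expr: avg(zk_outstanding_requests) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Too many outstanding requests"
      - alert: Too many pending syncs
        expr: avg(zk_pending_syncs) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: "Too many pending syncs"
      - alert: Average response latency too high
        expr: avg(zk_avg_latency) by (instance) > 10
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: 'Average response latency too high'
      - alert: Open file descriptors above 85% of the maximum
        expr: zk_open_file_descriptor_count > zk_max_file_descriptor_count * 0.85
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: 'Open file descriptor count exceeds 85% of the configured maximum'
      - alert: ZooKeeper server down
        expr: zk_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: 'ZooKeeper server down'
      - alert: ZooKeeper leader lost
        # absent() returns 1 only when no matching series exists, so the
        # original "!= 1" comparison could never fire; "== 1" is used instead.
        expr: absent(zk_server_state{state="leader"}) == 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }}"
          description: 'ZooKeeper leader lost'

##### Grafana dashboard: #####

A ZooKeeper monitoring dashboard has been shared on Grafana: search for Zookeeper Exporter Overview, or import it by dashboard ID 9236.

![Grafana ZooKeeper dashboard][watermark_type_ZmFuZ3poZW5naGVpdGk_shadow_10_text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI1OTM0NDAx_size_16_color_FFFFFF_t_70]

#### Kafka monitoring ####

Git repository: [https://github.com/danielqsj/kafka\_exporter][https_github.com_danielqsj_kafka_exporter]

Download: [https://github.com/danielqsj/kafka\_exporter/releases/download/v1.2.0/kafka\_exporter-1.2.0.linux-amd64.tar.gz][https_github.com_danielqsj_kafka_exporter_releases_download_v1.2.0_kafka_exporter-1.2.0.linux-amd64.tar.gz]

Start:

    kafka_exporter --kafka.server=kafka:9092 [--kafka.server=another-server ...]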
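
Once an exporter is running, its endpoint can be checked the same way as the ZooKeeper one (`curl localhost:9141/metrics`). The following is only a minimal Python sketch of that check, not part of either exporter: the helper names `parse_metrics` and `fetch_metrics` are made up for illustration, the parser is deliberately simplified (it ignores timestamps), and the kafka\_exporter address in the comment assumes its default listen port is 9308, which should be verified against your own setup.

```python
import urllib.request

# Sample lines in the Prometheus text format, taken from the exporter output shown earlier.
SAMPLE = """\
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 14
go_gc_duration_seconds{quantile="0.5"} 4.9278e-05
"""


def parse_metrics(text):
    """Parse Prometheus text-format output into a {series: value} dict.

    Simplified on purpose: comment lines (# HELP / # TYPE) are skipped
    and optional trailing timestamps are not handled.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field; everything before
        # it is the metric name plus its optional {labels}.
        series, _, value = line.rpartition(" ")
        samples[series.strip()] = float(value)
    return samples


def fetch_metrics(url, timeout=5):
    """Fetch the raw metrics page from an exporter (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Against a live exporter one might call, for example (address is an assumption):
#   parse_metrics(fetch_metrics("http://localhost:9308/metrics"))
samples = parse_metrics(SAMPLE)
print(samples["go_goroutines"])  # prints 14.0
```

Keeping the parser separate from the HTTP call makes it possible to try it on pasted output first and only then point it at a real endpoint.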

[https_github.com_jiankunking_zookeeper_exporter]: https://github.com/jiankunking/zookeeper_exporter
[https_github.com_carlpett_zookeeper_exporter_releases_download_v1.0.2_zookeeper_exporter]: https://github.com/carlpett/zookeeper_exporter/releases/download/v1.0.2/zookeeper_exporter
[watermark_type_ZmFuZ3poZW5naGVpdGk_shadow_10_text_aHR0cHM6Ly9ibG9nLmNzZG4ubmV0L3FxXzI1OTM0NDAx_size_16_color_FFFFFF_t_70]: /images/20220415/331db7f976bc4e67ad8c17e03c5197aa.png
[https_github.com_danielqsj_kafka_exporter]: https://github.com/danielqsj/kafka_exporter
[https_github.com_danielqsj_kafka_exporter_releases_download_v1.2.0_kafka_exporter-1.2.0.linux-amd64.tar.gz]: https://github.com/danielqsj/kafka_exporter/releases/download/v1.2.0/kafka_exporter-1.2.0.linux-amd64.tar.gz