We are running PMM v1.17.0, and Prometheus is causing huge CPU and memory usage (200% CPU and 100% RAM), which brings PMM down. We are running PMM on a VM with 2 vCPUs and 7.5 GB RAM, monitoring about 25 servers. PMM is started with the following command:
docker run -d -it --volumes-from pmm-data --name pmm-server -e QUERIES_RETENTION=1095 -p 80:80 -e METRICS_RESOLUTION=3s --restart always percona/pmm-server:1

prometheus.log is filled with entries like the following:
level=warn ts=2020-01-30T10:27:12.8156514Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"
level=warn ts=2020-01-30T10:27:26.464361371Z caller=scrape.go:945 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.223:42002/metrics-mr msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
level=warn ts=2020-01-30T10:27:27.81316996Z caller=scrape.go:942 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="Error on ingesting out-of-order samples" num_dropped=2
level=warn ts=2020-01-30T10:27:27.813257165Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"
level=warn ts=2020-01-30T10:27:41.462420708Z caller=scrape.go:945 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.223:42002/metrics-mr msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
level=warn ts=2020-01-30T10:27:42.813356387Z caller=scrape.go:942 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="Error on ingesting out-of-order samples" num_dropped=2
level=warn ts=2020-01-30T10:27:42.813441108Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"
level=warn ts=2020-01-30T10:27:56.463798729Z caller=scrape.go:945 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.223:42002/metrics-mr msg="Error on ingesting samples with different value but same timestamp" num_dropped=1
level=warn ts=2020-01-30T10:27:57.82083775Z caller=scrape.go:942 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="Error on ingesting out-of-order samples" num_dropped=2
level=warn ts=2020-01-30T10:27:57.820912309Z caller=scrape.go:713 component="scrape manager" scrape_pool=mysql-mr target=https://10.40.4.21:42002/metrics-mr msg="append failed" err="out of order sample"

Can anyone tell me why Prometheus is causing this? What parameters do we need to add or change?
Posted on 2021-06-10 20:54:09
How many servers are you monitoring? A PMM server with that spec can handle 4-8 monitored servers if they are not too busy, and closer to 4 if they are busy and send a lot of queries to PMM's QAN. It also depends on your data retention: if you increase retention beyond the default, you will need to add more RAM and CPU to the host.
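The capacity point above can be roughed out with back-of-envelope arithmetic. A minimal sketch, where the exporter count per server and the series count per exporter are assumptions for illustration, not measured values:

```shell
# Rough scrape-load estimate for 25 servers at METRICS_RESOLUTION=3s.
# exporters_per_server and series_per_exporter are assumed, not measured.
servers=25
exporters_per_server=3
series_per_exporter=1000
interval_s=3
samples_per_sec=$(( servers * exporters_per_server * series_per_exporter / interval_s ))
echo "${samples_per_sec} samples/s"
# Raising the scrape interval (e.g. METRICS_RESOLUTION=5s) cuts
# ingestion load proportionally; lowering QUERIES_RETENTION mainly
# shrinks disk usage rather than steady-state CPU.
```

Under these assumptions the host is ingesting tens of thousands of samples per second, which is a lot for 2 vCPUs, so coarsening the resolution is the cheapest lever.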
Posted on 2020-02-01 19:29:36
100% memory -- you are probably swapping, which is terrible for performance. Lower innodb_buffer_pool_size somewhat to avoid swapping.
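A rule-of-thumb sizing sketch for a host that is swapping. All numbers here are assumptions chosen for illustration; substitute the host's real memory budget:

```shell
# Size the buffer pool to ~70% of what remains after OS/agent overhead,
# so the host keeps headroom and does not swap. Numbers are assumptions.
total_ram_mb=7680     # 7.5 GB host
reserve_mb=2048       # OS + monitoring agents (assumption)
pool_mb=$(( (total_ram_mb - reserve_mb) * 70 / 100 ))
echo "innodb_buffer_pool_size = ${pool_mb}M"
# Put the resulting value under [mysqld] in my.cnf and restart, or use
# SET GLOBAL innodb_buffer_pool_size (MySQL 5.7.5+ can resize online).
```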
200% CPU -- poorly formed indexes and/or queries. Please provide some of the queries and the SHOW CREATE TABLE output; there may be a quick fix.
"Out of order" and "different value but same timestamp" -- either a bug in the collection mechanism or a bug in Percona.
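A common trigger for these warnings is the same exporter being scraped by two jobs (for example, a client registered twice), so Prometheus receives samples for the same series with identical or reordered timestamps. A toy illustration of spotting a duplicated target, using a made-up target list rather than the server's real configuration:

```shell
# Duplicate scrape targets lead to "same timestamp" / "out of order"
# rejections. Hypothetical target list for illustration only.
targets="10.40.4.21:42002
10.40.4.223:42002
10.40.4.21:42002"
dupes=$(printf '%s\n' "$targets" | sort | uniq -d)
echo "duplicate targets: ${dupes}"
```

On a live PMM 1.x server, the Prometheus targets page (served under the /prometheus path of the PMM server, if the default layout applies) shows the actual scrape target list to check against.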
https://dba.stackexchange.com/questions/258517