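I want to be able to specify all of my rules for exporters such as prometheus-blackbox-exporter, so I added them to rules-mine.yaml and deployed the chart with:
helm upgrade --install -n monitoring blackbox -f values.yaml -f rules-mine.yaml .
I don't see any of the rules listed at http://localhost:9090/rules, and nothing seems to be evaluated, since there are no alerts. I need to do everything as IaC and deploy it in an automated way through Terraform. How can I add the rules efficiently?
The rules-mine.yaml file contains: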
prometheusRule:
  enabled: true
  namespace: monitoring
  additionalLabels:
    team: foxtrot_blackbox
    environment: production
    cluster: cluster
    namespace: namespace_x
  namespace: "monitoring"
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 0m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed (instance {{`{{`}} $labels.instance {{`}}`}})
        description: "Probe failed\n VALUE = {{`{{`}} $value {{`}}`}}"
    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe (instance {{`{{`}} $labels.instance {{`}}`}})
        description: "Blackbox probe took more than 1s to complete\n VALUE = {{`{{`}} $value {{`}}`}}"

Thanks for your help.
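
Posted on 2021-12-20 05:45:43
A colleague found that this is entirely possible. The problem appears to be related to the quoting used in the original implementation. Below is what we are using now, and it works, so I am posting it here in the hope that it will be useful to others.
In summary: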
{{`{{`}} $labels.instance {{`}}`}} == BAD
{{`{{$labels.instance}}`}} == GOOD

prometheusRule:
  enabled: true
  additionalLabels:
    client: ${client_id}
    cluster: ${cluster}
    environment: ${environment}
    grafana: ${grafana_url}
  rules:
    - alert: BlackboxProbeFailed
      expr: probe_success == 0
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: Blackbox probe failed for {{`{{$labels.instance}}`}}
        description: Probe failed VALUE = {{`{{$value}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxProbeFailed
    - alert: BlackboxSlowProbe
      expr: avg_over_time(probe_duration_seconds[1m]) > 1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: Blackbox slow probe for {{`{{$labels.instance}}`}}
        description: Blackbox probe took more than 1s to complete VALUE = {{`{{$value|humanizeDuration}}`}}
        dashboard_url: https://${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{`{{$labels.instance}}`}}
        runbook_url: ${wiki_url}/BlackboxSlowProbe

Please ignore any missing variables, etc.
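
To make the effect of the escaping concrete, here is a rough sketch of what the first rule might look like once it reaches Prometheus. It assumes the chart runs prometheusRule.rules through Helm's tpl function (which would explain why the backtick-quoted form is needed, but is not confirmed in this thread) and that Terraform has already substituted the ${...} placeholders. The hostnames grafana.example.com and wiki.example.com are made-up values for illustration only.

```yaml
# Hypothetical rendered rule; key order and quoting may differ in real chart output.
# grafana.example.com and wiki.example.com are placeholder values, not from the post.
- alert: BlackboxProbeFailed
  expr: probe_success == 0
  for: 1m
  labels:
    severity: critical
  annotations:
    # The Helm escape is gone; only the Prometheus template remains.
    summary: Blackbox probe failed for {{$labels.instance}}
    description: Probe failed VALUE = {{$value}}
    dashboard_url: https://grafana.example.com/d/blackbox/blackbox-exporter?var-instance={{$labels.instance}}
    runbook_url: https://wiki.example.com/BlackboxProbeFailed
```

The point of the sketch is that Terraform consumes ${...}, Helm consumes the outer {{` ... `}}, and only the inner {{$labels.instance}} is left for Prometheus to expand at alert time.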
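
Posted on 2021-11-18 00:53:58
The best approach I have found so far is to add the exporter rules to the kube-prometheus-stack values.yaml file (in practice I created a separate rules.yaml file) and pass it to helm:
helm upgrade --install -n monitoring prometheus --create-namespace -f values-mine.yaml -f rules-mine.yaml prometheus-community/kube-prometheus-stack
All of the rules are then picked up the way I want, and this seems like a good solution. I would still prefer to have them grouped with the exporter, though. If I find a way to do that, I will post again.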
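
As a rough sketch of what this approach produces: assuming the chart renders each key under additionalPrometheusRulesMap (see the values file below) as its own PrometheusRule object, the blackbox entry would end up in the cluster looking something like the following. The metadata name is a placeholder, since the chart generates the real one, and the sketch omits the labels the chart adds.

```yaml
# Hypothetical rendered object for the company.blackbox.rules entry.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-company.blackbox.rules   # placeholder; the chart generates the real name
  namespace: monitoring
spec:
  groups:
    - name: company.blackbox.rules
      rules:
        - alert: BlackboxProbeFailed
          expr: probe_success == 0
          for: 1m
          labels:
            severity: critical
          annotations:
            # No Helm escaping here, assuming the chart copies this map
            # verbatim instead of passing it through tpl.
            summary: Blackbox probe failed (instance {{ $labels.instance }})
```

If the rules still do not show up at http://localhost:9090/rules, running kubectl get prometheusrules -n monitoring is a quick way to check whether the objects were created at all.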
additionalPrometheusRulesMap:
  prometheus.rules:
    groups:
      - name: company.prometheus.rules
        rules:
          - alert: PrometheusNotificationsBacklog
            expr: min_over_time(prometheus_notifications_queue_length[10m]) > 0
            for: 0m
            labels:
              severity: warning
            annotations:
              summary: Prometheus notifications backlog (instance {{ $labels.instance }})
              # Double-quoted so that \n is read as a newline rather than a literal backslash-n
              description: "The Prometheus notification queue has not been empty for 10 minutes\nVALUE = {{ $value }}"
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
  company.blackbox.rules:
    groups:
      - name: company.blackbox.rules
        rules:
          - alert: BlackboxProbeFailed
            expr: probe_success == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: Blackbox probe failed (instance {{ $labels.instance }})
              description: "Probe failed\nVALUE = {{ $value }}"
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
          - alert: BlackboxSlowProbe
            expr: avg_over_time(probe_duration_seconds[1m]) > 1
            for: 3m
            labels:
              severity: warning
            annotations:
              summary: Blackbox slow probe (instance {{ $labels.instance }})
              description: "Blackbox probe took more than 1s to complete\nVALUE = {{ $value }}"
              dashboard_url: ${grafana_url}/d/blackbox/blackbox-exporter?var-instance={{ $labels.instance }}
              runbook_url: ${wiki_url}/{{ $labels.alertname }}
# etc....

Posted on 2021-10-29 15:32:25
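Are you sure you haven't made a typo in the label name: "environmment"? That will certainly not behave the way you expect, unless your sources are actually labeled that way.
Best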
https://stackoverflow.com/questions/69702163