I have deployed Prometheus on a Kubernetes cluster (EKS). I was able to scrape Prometheus and Traefik successfully with the following configuration:
scrape_configs:
  # A scrape configuration containing exactly one endpoint to scrape:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # Override the global default and scrape targets from this job every 5 seconds.
    scrape_interval: 5s
    static_configs:
      - targets: ['prometheus.kube-monitoring.svc.cluster.local:9090']
  - job_name: 'traefik'
    static_configs:
      - targets: ['traefik.kube-system.svc.cluster.local:8080']

However, node-exporter, deployed as a DaemonSet with the following definition, does not expose the node metrics.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-monitoring
spec:
  selector:
    matchLabels:
      app: node-exporter
  template:
    metadata:
      name: node-exporter
      labels:
        app: node-exporter
    spec:
      hostNetwork: true
      hostPID: true
      containers:
        - name: node-exporter
          image: prom/node-exporter:v0.18.1
          args:
            - "--path.procfs=/host/proc"
            - "--path.sysfs=/host/sys"
          ports:
            - containerPort: 9100
              hostPort: 9100
              name: scrape
          resources:
            requests:
              memory: 30Mi
              cpu: 100m
            limits:
              memory: 50Mi
              cpu: 200m
          volumeMounts:
            - name: proc
              readOnly: true
              mountPath: /host/proc
            - name: sys
              readOnly: true
              mountPath: /host/sys
      tolerations:
        - effect: NoSchedule
          operator: Exists
      volumes:
        - name: proc
          hostPath:
            path: /proc
        - name: sys
          hostPath:
            path: /sys

This is used with the following scrape_configs in Prometheus:
scrape_configs:
  - job_name: 'kubernetes-nodes'
    scheme: http
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.kube-monitoring.svc.cluster.local:9100
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics

I also tried running curl http://localhost:9100/metrics from one of the containers, but got:

curl: (7) Failed to connect to localhost port 9100: Connection refused
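What am I missing in my configuration?

After the suggestion to install Prometheus via Helm, I installed it on a test cluster and tried to compare my original configuration with the Helm-installed Prometheus.

The following pods are running: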
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-prometheus-oper-alertmanager-0   2/2     Running   0          4m33s
prometheus-grafana-66c7bcbf4b-mh42x                      2/2     Running   0          4m38s
prometheus-kube-state-metrics-7fbb4697c-kcskq            1/1     Running   0          4m38s
prometheus-prometheus-node-exporter-6bf9f                1/1     Running   0          4m38s
prometheus-prometheus-node-exporter-gbrzr                1/1     Running   0          4m38s
prometheus-prometheus-node-exporter-j6l9h                1/1     Running   0          4m38s
prometheus-prometheus-oper-operator-648f9ddc47-rxszj     1/1     Running   0          4m38s
prometheus-prometheus-prometheus-oper-prometheus-0       3/3     Running   0          4m23s

I could not find any configuration for node-exporter in /etc/prometheus/prometheus.yml inside the pod prometheus-prometheus-prometheus-oper-prometheus-0.
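Posted on 2019-07-10 15:39:16

The earlier suggestion to use Helm is a good one, and I would recommend Helm as well.

Regarding your question: the problem is that you are not scraping the nodes directly; you are using node-exporter for that. So role: node is incorrect, and you should use role: endpoints instead. For that to work, you also need to create a Service that covers all the pods of your DaemonSet.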
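As an illustration of the Service mentioned above, here is a minimal sketch for the DaemonSet from the question. The Service name and the choice of a headless Service are assumptions; the namespace, pod label, and port name are taken from the question's manifest.

```yaml
# Sketch only: a headless Service that selects the node-exporter DaemonSet pods.
apiVersion: v1
kind: Service
metadata:
  name: node-exporter            # assumed name, not from the original post
  namespace: kube-monitoring     # namespace of the DaemonSet in the question
  labels:
    app: node-exporter           # label a scrape job can filter on
spec:
  clusterIP: None                # assumption: headless, one endpoint per pod
  selector:
    app: node-exporter           # matches the pod template label in the question
  ports:
    - name: scrape               # same port name as in the DaemonSet
      port: 9100
      targetPort: 9100
```

With such a Service in place, a job using role: endpoints restricted to the kube-monitoring namespace could keep only targets where `__meta_kubernetes_service_label_app` is `node-exporter` and `__meta_kubernetes_endpoint_port_name` is `scrape`. This also assumes that the Prometheus service account is allowed to list and watch endpoints in that namespace.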
Here is a working example from my environment (installed by Helm):
- job_name: monitoring/kube-prometheus-exporter-node/0
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_app]
      separator: ;
      regex: exporter-node
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: job
      replacement: ${1}
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: metrics
      action: replace

Posted on 2019-07-10 04:04:09
How did you deploy Prometheus? Whenever I use the Helm chart (https://github.com/helm/charts/tree/master/stable/prometheus), node-exporter gets deployed along with it. Maybe that is an easier solution.
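For reference, the chart controls this through its values file. The fragment below is a sketch of the relevant setting as I understand it from the stable/prometheus chart; check it against the values.yaml of the chart version you install.

```yaml
# values.yaml fragment (sketch) for the stable/prometheus chart.
nodeExporter:
  enabled: true   # deploy node-exporter as a DaemonSet alongside Prometheus
```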
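Posted on 2021-02-16 10:23:45

I am stuck at a similar point. In my case node-exporter is not part of the Helm deployment, because we get node-exporter as an add-on from Tanzu grid (k8s cluster). So I created a ServiceMonitor, and now I can see it under service discovery with what looks like the correct count. But the Targets section shows 0/4. I cannot see the node metrics there, yet when I curl localhost:9100/metrics I do get data. I am missing some piece of the logic.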
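The post does not include the ServiceMonitor itself, so the following is only a sketch of what such an object typically looks like. Every name and label in it is an assumption, chosen to line up with the values in the scrape job quoted in this post (namespace `monitoring`, Service label `app: exporter-node`, port name `metrics`).

```yaml
# Sketch only: all names and labels are hypothetical.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: node-exporter            # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus          # hypothetical; must match the Prometheus resource's serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
      - monitoring               # namespace where the node-exporter Service lives (assumed)
  selector:
    matchLabels:
      app: exporter-node         # must equal a label on the Service itself
  endpoints:
    - port: metrics              # must equal the name of the Service port, not its number
      path: /metrics
      interval: 15s
```

When endpoints are discovered but none of them becomes an active target, one thing worth checking is whether the Service really carries the selected label and whether its port really has the selected name, since targets that fail either `keep` rule are dropped.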
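I checked the data from the Helm-deployed node-exporter and it looks the same, so what am I missing here?

Please ignore the indentation, since it gets lost when copy-pasting on a phone.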
- job_name: node-exporter
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
    - role: endpoints
      namespaces:
        names:
          - monitoring
  relabel_configs:
    - source_labels: [__meta_kubernetes_service_label_app]
      separator: ;
      regex: exporter-node
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_endpoint_port_name]
      separator: ;
      regex: metrics
      replacement: $1
      action: keep
    - source_labels: [__meta_kubernetes_namespace]
      separator: ;
      regex: (.*)
      target_label: namespace
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_pod_name]
      separator: ;
      regex: (.*)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: service
      replacement: $1
      action: replace
    - source_labels: [__meta_kubernetes_service_name]
      separator: ;
      regex: (.*)
      target_label: job
      replacement: ${1}
      action: replace
    - separator: ;
      regex: (.*)
      target_label: endpoint
      replacement: metrics
      action: replace

Source: https://stackoverflow.com/questions/56959545