
How do I set up a GCP Monitoring log-based alert in Terraform?

Stack Overflow user
Asked 2022-11-10 12:30:32

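I am trying to write a log-based alert policy with Terraform.

I want a near-real-time alert whenever a certain message appears in the logs. Specifically, I want to know when a Composer DAG fails.

I successfully set up a log-based alert in the console using the following query filter: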

resource.type="cloud_composer_environment"
severity="ERROR"
log_name="projects/my_project/logs/airflow-scheduler"
resource.labels.project_id="project-id"
textPayload=~"my_dag_name"

However, I am having trouble translating this log-based alert policy into Terraform as a "google_monitoring_alert_policy".

I tried adding the following filter to the Terraform google_monitoring_alert_policy:

filter = "resource.type=cloud_composer_environment AND resource.label.project_id=${var.project} AND log_name=projects/${var.project}/logs/airflow-scheduler AND severity=ERROR AND textPayload=~my_dag_name"

However, when running terraform apply, I get the following error:

build   10-Nov-2022 12:21:00    Error: Error creating AlertPolicy: googleapi: Error 400: Field alert_policy.conditions[0].condition_threshold.filter had an invalid value of "resource.type=cloud_composer_environment AND resource.labels.project_id=my_project AND log_name=projects/my_project/logs/airflow-scheduler AND severity=ERROR AND textPayload=my_dag_name": The lefthand side of each expression must be prefixed with one of {group, metadata, metric, project, resource}.

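So I have two questions:

  1. Is it possible to configure "log-based" alerts in Terraform at all?

  2. How do I set up an alert in Terraform that filters for a specific string in the log's 'textPayload' field?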

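1 Answer

Stack Overflow user

Answered 2022-11-10 14:13:43

As I understand it, you want to create a log based metric.

In that case, you first need to create this log based metric with Terraform. As the error message indicates, a condition_threshold filter only accepts expressions prefixed with group, metadata, metric, project, or resource, so log entry fields such as log_name, severity, and textPayload cannot be used there directly.

Example of metrics configured in a JSON file, logging_metrics.json: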

{
    "metrics": { 
        "composer_dags_tasks_bigquery_errors": {
            "name": "composer_dags_tasks_bigquery_errors",
            "filter": "severity=ERROR AND resource.type=\"cloud_composer_environment\" AND textPayload =~ \"{taskinstance.py:.*} ERROR -.*bigquery.googleapis.com/bigquery/v2/projects\"",
            "description": "Metric for Cloud Composer Bigquery tasks errors.",
            "metric_descriptor": {
                "metric_kind": "DELTA",
                "value_type": "INT64",
                "labels": [
                    {
                        "key": "task_id",
                        "value_type": "STRING",
                        "description": "Task ID of current Airflow task",
                        "extractor": "EXTRACT(labels.\"task-id\")"
                    },
                    {
                        "key": "execution_date",
                        "value_type": "STRING",
                        "description": "Execution date of the current Airflow task",
                        "extractor": "EXTRACT(labels.\"execution-date\")"
                    }
                ]
            }
        }
    }
}

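This metric filters the Composer logs for errors from BigQuery tasks. I used label extractors on the DAG task_id and the task execution_date so that the metric is unique per combination of these parameters.

Retrieve the metrics in a locals.tf file: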

locals {
  logging_metrics = jsondecode(file("${path.module}/resource/logging_metrics.json"))["metrics"]
}

resource "google_logging_metric" "logging_metrics" {
  for_each = local.logging_metrics
  project = var.project_id
  name = each.value["name"]
  filter = each.value["filter"]
  description = each.value["description"]
  metric_descriptor {
    metric_kind = each.value["metric_descriptor"]["metric_kind"]
    value_type = each.value["metric_descriptor"]["value_type"]

    dynamic "labels" {
      for_each = try(each.value["metric_descriptor"]["labels"], [])
      content {
        key = try(labels.value["key"], null)
        value_type = try(labels.value["value_type"], null)
        description = try(labels.value["description"], null)
      }
    }
  }

  label_extractors = {for label in try(each.value["metric_descriptor"]["labels"], []): label.key => label.extractor}
}
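
For a single metric, the for_each and dynamic blocks above expand to roughly the following static resource. This is only a sketch to show how the JSON entry maps onto the resource arguments; the resource name bigquery_errors is made up for illustration, and var.project_id is assumed to be declared elsewhere.

```hcl
# Hypothetical static equivalent of the one entry in logging_metrics.json.
resource "google_logging_metric" "bigquery_errors" {
  project     = var.project_id
  name        = "composer_dags_tasks_bigquery_errors"
  description = "Metric for Cloud Composer Bigquery tasks errors."
  filter      = "severity=ERROR AND resource.type=\"cloud_composer_environment\" AND textPayload =~ \"{taskinstance.py:.*} ERROR -.*bigquery.googleapis.com/bigquery/v2/projects\""

  metric_descriptor {
    metric_kind = "DELTA"
    value_type  = "INT64"

    # One labels block per entry of the JSON "labels" list.
    labels {
      key         = "task_id"
      value_type  = "STRING"
      description = "Task ID of current Airflow task"
    }

    labels {
      key         = "execution_date"
      value_type  = "STRING"
      description = "Execution date of the current Airflow task"
    }
  }

  # Map of label key => extractor, as built by the for expression above.
  label_extractors = {
    "task_id"        = "EXTRACT(labels.\"task-id\")"
    "execution_date" = "EXTRACT(labels.\"execution-date\")"
  }
}
```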

Then create the alert resource based on the log based metric above:

resource "google_monitoring_alert_policy" "alert_policy" {
  project = var.project_id
  display_name = "alert_name"
  combiner = "..."
  conditions {
    display_name = "alert_name"
    condition_threshold {
      filter = "metric.type=\"logging.googleapis.com/user/composer_dags_tasks_bigquery_errors\" AND resource.type=\"cloud_composer_environment\""
      ...........
}
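
The elided arguments depend on the alerting behavior you want. As a hedged sketch, a policy that fires as soon as the metric counts at least one matching log entry within a one-minute window might look like the following. The combiner, comparison, threshold, duration, aligner, and the resource name composer_bigquery_errors are assumptions for illustration, not values from the answer.

```hcl
resource "google_monitoring_alert_policy" "composer_bigquery_errors" {
  project      = var.project_id
  display_name = "alert_name"
  combiner     = "OR" # assumption: with a single condition, OR and AND behave the same

  conditions {
    display_name = "alert_name"

    condition_threshold {
      filter = "metric.type=\"logging.googleapis.com/user/composer_dags_tasks_bigquery_errors\" AND resource.type=\"cloud_composer_environment\""

      # Assumed values: trigger when the per-minute count is above zero.
      comparison      = "COMPARISON_GT"
      threshold_value = 0
      duration        = "0s"

      aggregations {
        alignment_period   = "60s"
        per_series_aligner = "ALIGN_SUM"
      }
    }
  }

  # Create the log based metric before the policy that references it.
  depends_on = [google_logging_metric.logging_metrics]
}
```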

The alert policy resource references the previously created log based metric via metric.type.
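
As for the first question, the Google provider's google_monitoring_alert_policy resource also offers a condition_matched_log block, which takes a Cloud Logging filter and can therefore match fields such as textPayload without an intermediate metric. The following is a minimal sketch rather than a tested configuration. It assumes a provider version that supports this block, and the resource name, display names, rate-limit period, and the variable notification_channel_id are all hypothetical.

```hcl
# Hypothetical variable: full name of an existing notification channel.
variable "notification_channel_id" {
  type = string
}

resource "google_monitoring_alert_policy" "dag_failure_log_alert" {
  project      = var.project_id
  display_name = "composer-dag-failure"
  combiner     = "OR"

  conditions {
    display_name = "my_dag_name error in airflow-scheduler log"

    # Logging filter syntax, mirroring the console filter from the question.
    condition_matched_log {
      filter = "resource.type=\"cloud_composer_environment\" AND severity=\"ERROR\" AND log_name=\"projects/${var.project_id}/logs/airflow-scheduler\" AND textPayload=~\"my_dag_name\""
    }
  }

  # Policies with a matched-log condition need a notification rate limit;
  # the 300s period here is an assumed value.
  alert_strategy {
    notification_rate_limit {
      period = "300s"
    }
  }

  notification_channels = [var.notification_channel_id]
}
```

Which of the two approaches fits better depends on whether you need metric-style thresholds and labels or simply a notification per matching log entry.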

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/74389049
