
How do I get the Glue crawler event state?

Stack Overflow user
Asked 2019-07-26 12:46:12
2 answers · 1.9K views · 0 followers · 0 votes

I am following this document, https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/, to set up an automatic trigger on a Lambda function when the crawler finishes. The event pattern I set up in CloudWatch is:

{
  "detail": {
    "crawlerName": [
      "reddit_movie"
    ],
    "state": [
      "Succeeded"
    ]
  },
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "source": [
    "aws.glue"
  ]
}

In CloudWatch I added a Lambda function as the target of this rule.

I triggered the crawler manually, but the Lambda was not triggered after the crawler finished. In the crawler logs I can see:

04:36:28
[6c8450a5-970a-4190-bd2b-829a82d67fdf] INFO : Table redditmovies_bb008c32d0d970f0465f47490123f749 in database video has been updated with new schema

04:36:30
[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Finished writing to Catalog

04:37:37
[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Crawler has finished running and is in state READY

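Does the log above mean that the crawler completed successfully? How can I find out why the crawler did not trigger the Lambda function?

And how do I debug this issue? Which log should I look at?

2 Answers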

Stack Overflow user

Answered 2019-07-27 10:14:16

The following works:

Cloudwatch Event Rule -

{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "detail": {
    "state": [
      "Succeeded"
    ]
  }
}
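
To see why a rule like this does or does not fire, it can help to check a sample event against the pattern by hand. The sketch below is a simplified local matcher, not the service's real matching engine: it only handles nested objects and lists of literal values. The `matches` helper and the trimmed-down `event` are hypothetical and written for illustration; a real crawler event carries more fields.

```python
def matches(pattern, event):
    """Return True if `event` satisfies `pattern` (literal-value rules only)."""
    for key, rule in pattern.items():
        if key not in event:
            return False
        value = event[key]
        if isinstance(rule, dict):
            # Nested pattern: the event field must be an object that matches too.
            if not isinstance(value, dict) or not matches(rule, value):
                return False
        elif value not in rule:
            # A list in the pattern means "any one of these literal values".
            return False
    return True


# The rule from this answer.
pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"state": ["Succeeded"]},
}

# Hypothetical, trimmed-down event for illustration only.
event = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "reddit_movie", "state": "Succeeded"},
}

# A narrower pattern that names a different crawler should not match.
narrow = {"detail": {"crawlerName": ["some_other_crawler"]}}

print(matches(pattern, event))
print(matches(narrow, event))
```

String values in a pattern are compared exactly and case-sensitively, so a crawler name or state that differs even slightly from the emitted event will not match. For a check against the real service, the boto3 `events` client offers `test_event_pattern`, which needs AWS credentials and a complete event envelope.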

Sample Lambda:

import boto3

# Glue client, created once and reused across invocations
glue = boto3.client('glue')

def lambda_handler(event, context):
    try:
        # Only handle events that carry a crawler name in their detail
        if event and 'detail' in event and event['detail'] and 'crawlerName' in event['detail']:
            crawler_name = event['detail']['crawlerName']
            print('Received event from crawlerName - {0}'.format(crawler_name))

            # Look up the crawler to find the database it writes to
            crawler = glue.get_crawler(Name=crawler_name)
            print('Received crawler from glue - {0}'.format(str(crawler)))

            database = crawler['Crawler']['DatabaseName']
    except Exception as e:
        print('Error handling events from crawler. Details - {0}'.format(e))
        raise
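
Before wiring the rule to the function, the handler logic can be exercised locally with a hand-built event. The following is a minimal sketch under stated assumptions: `handle_crawler_event` and `FakeGlue` are hypothetical names introduced here, the stub stands in for the real boto3 Glue client so no AWS access is needed, and the sample event contains only the fields the handler reads. The crawler and database names are taken from the question.

```python
class FakeGlue:
    """Stand-in for boto3.client('glue'); returns a canned get_crawler response."""

    def get_crawler(self, Name):
        return {'Crawler': {'Name': Name, 'DatabaseName': 'video'}}


def handle_crawler_event(event, glue):
    """Return the crawler's database name, or None if the event has no crawlerName."""
    detail = (event or {}).get('detail') or {}
    if 'crawlerName' not in detail:
        return None
    crawler = glue.get_crawler(Name=detail['crawlerName'])
    return crawler['Crawler']['DatabaseName']


# Hypothetical, trimmed-down event for illustration only.
sample_event = {
    'source': 'aws.glue',
    'detail-type': 'Glue Crawler State Change',
    'detail': {'crawlerName': 'reddit_movie', 'state': 'Succeeded'},
}

database = handle_crawler_event(sample_event, FakeGlue())
print('Database for crawler: {0}'.format(database))
```

Passing the client in as an argument keeps the logic testable; in the deployed function the same helper could be called with the real `boto3.client('glue')`.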


Votes: 1

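Stack Overflow user

Answered 2019-10-22 11:03:20

At first I used the link https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/, but it did not work. I found that this is because the Lambda code in the Python script is not indented correctly if you paste it directly. Please check your Lambda.

Python Lambda as copied from the link:
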
import boto3
client = boto3.client('glue')

def lambda_handler(event, context):
response = client.start_job_run(JobName = 'MyTestJob')

It needs to be fixed as follows:

import boto3
client = boto3.client('glue')

def lambda_handler(event, context):
  response = client.start_job_run(JobName = 'MyTestJob')
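
The difference between the two versions can be checked without deploying anything. The sketch below only parses each snippet with Python's built-in `compile`, so nothing is executed and boto3 does not need to be installed; the `compiles` helper is a hypothetical name used for illustration.

```python
# The snippet as pasted: the function body has lost its indentation.
broken = (
    "import boto3\n"
    "client = boto3.client('glue')\n"
    "\n"
    "def lambda_handler(event, context):\n"
    "response = client.start_job_run(JobName = 'MyTestJob')\n"
)

# The corrected snippet: the body is indented under the def line.
fixed = (
    "import boto3\n"
    "client = boto3.client('glue')\n"
    "\n"
    "def lambda_handler(event, context):\n"
    "  response = client.start_job_run(JobName = 'MyTestJob')\n"
)


def compiles(source):
    """Return True if `source` parses as Python, False on a syntax error."""
    try:
        compile(source, '<lambda_function>', 'exec')
        return True
    except SyntaxError as err:  # IndentationError is a subclass of SyntaxError
        print('{0}: {1}'.format(type(err).__name__, err.msg))
        return False


print('pasted version compiles:', compiles(broken))
print('fixed version compiles:', compiles(fixed))
```

A function that fails to parse never runs its handler, so an error of this kind would be expected to surface in the function's own logs rather than in the crawler's.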
Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57213330
