I am following this document, https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/, to set up an automatic trigger on Lambda when a crawler finishes. The event pattern I set in CloudWatch is:
{
  "detail": {
    "crawlerName": [
      "reddit_movie"
    ],
    "state": [
      "Succeeded"
    ]
  },
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "source": [
    "aws.glue"
  ]
}
I added a Lambda function in CloudWatch as the target of this rule.
I triggered the crawler manually, but after the crawler finishes it does not trigger the Lambda. In the crawler log I can see:
04:36:28
[6c8450a5-970a-4190-bd2b-829a82d67fdf] INFO : Table redditmovies_bb008c32d0d970f0465f47490123f749 in database video has been updated with new schema
04:36:30
[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Finished writing to Catalog
04:37:37
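[6c8450a5-970a-4190-bd2b-829a82d67fdf] BENCHMARK : Crawler has finished running and is in state READY
Does the log above mean that the crawler completed successfully? How can I find out why the crawler did not trigger the Lambda function?
And how do I debug this problem? Which log should I look at?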
Posted on 2019-07-27 10:14:16
The following works -
CloudWatch Event Rule -
{
  "source": [
    "aws.glue"
  ],
  "detail-type": [
    "Glue Crawler State Change"
  ],
  "detail": {
    "state": [
      "Succeeded"
    ]
  }
}
Sample Lambda -
import boto3

# Glue client used to look up the crawler named in the event
glue = boto3.client('glue')

def lambda_handler(event, context):
    try:
        # Only handle events whose detail carries a crawler name
        if event and 'detail' in event and event['detail'] and 'crawlerName' in event['detail']:
            crawler_name = event['detail']['crawlerName']
            print('Received event from crawlerName - {0}'.format(crawler_name))
            crawler = glue.get_crawler(Name=crawler_name)
            print('Received crawler from glue - {0}'.format(str(crawler)))
            database = crawler['Crawler']['DatabaseName']
    except Exception as e:
        print('Error handling events from crawler. Details - {0}'.format(e))
        raise e
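One way to narrow down why a rule does not fire is to check the pattern against a sample event before involving the Lambda at all. The sketch below is a simplified local approximation of the matching, not the service's own matcher: it only handles exact-value lists and nested objects, and `matches`, `sample_event`, and the misspelled crawler name are hypothetical names and values introduced for illustration.

```python
def matches(pattern, event):
    """Return True if every field in `pattern` is satisfied by `event`.

    Simplified assumption: a list in the pattern means "the event value
    must equal one of these"; a dict means "recurse into the nested
    object". Other pattern operators (prefix, anything-but, numeric
    ranges, and so on) are not handled here.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True


# Hypothetical sample event; a real event carries more fields than this.
sample_event = {
    "source": "aws.glue",
    "detail-type": "Glue Crawler State Change",
    "detail": {"crawlerName": "reddit_movie", "state": "Succeeded"},
}

# Pattern from the question (filters on crawler name and state).
question_pattern = {
    "detail": {"crawlerName": ["reddit_movie"], "state": ["Succeeded"]},
    "detail-type": ["Glue Crawler State Change"],
    "source": ["aws.glue"],
}

# Pattern from this answer (filters on state only).
answer_pattern = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"state": ["Succeeded"]},
}

print(matches(question_pattern, sample_event))
print(matches(answer_pattern, sample_event))

# A crawler name that differs in spelling or case would fail the stricter pattern.
renamed_event = dict(sample_event, detail={"crawlerName": "Reddit_Movie", "state": "Succeeded"})
print(matches(question_pattern, renamed_event))
```

If both patterns match the event you expect, the pattern is probably not the problem, and the rule's target configuration and the Lambda code itself are the next things to check. For a check against the real matcher, the CloudWatch Events / EventBridge API also provides `TestEventPattern` (`test_event_pattern` in boto3), which needs AWS credentials and a complete event envelope.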

Posted on 2019-10-22 11:03:20
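At first I used the link https://aws.amazon.com/premiumsupport/knowledge-center/start-glue-job-run-end/, but it did not work. I found that this is because the indentation of the Lambda in the Python script is incorrect if you paste it directly. Please check your Lambda.
Python Lambda as copied from the link: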
import boto3
client = boto3.client('glue')
def lambda_handler(event, context):
response = client.start_job_run(JobName = 'MyTestJob')
We need to fix it as follows:
import boto3
client = boto3.client('glue')
def lambda_handler(event, context):
    response = client.start_job_run(JobName = 'MyTestJob')
https://stackoverflow.com/questions/57213330
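To see why a handler pasted without indentation cannot run, the two versions can be parsed locally. This is a minimal sketch, assuming the pasted text lost the leading whitespace on the handler body; `indentation_ok` is a hypothetical helper name, and `compile()` only parses the source, so neither boto3 nor AWS credentials are needed.

```python
# Handler as pasted: the body of lambda_handler is not indented.
pasted = (
    "import boto3\n"
    "client = boto3.client('glue')\n"
    "def lambda_handler(event, context):\n"
    "response = client.start_job_run(JobName = 'MyTestJob')\n"
)

# Same handler with the body indented under the def line.
fixed = (
    "import boto3\n"
    "client = boto3.client('glue')\n"
    "def lambda_handler(event, context):\n"
    "    response = client.start_job_run(JobName = 'MyTestJob')\n"
)


def indentation_ok(source):
    """Return True if `source` parses without an IndentationError.

    compile() parses the text but never executes it, so the import of
    boto3 inside the source is not actually performed.
    """
    try:
        compile(source, "<lambda_function>", "exec")
        return True
    except IndentationError:
        return False


print(indentation_ok(pasted))  # False
print(indentation_ok(fixed))   # True
```

A handler that fails to parse reports an error when invoked, so the function's own CloudWatch log group is a reasonable place to look when the rule fires but the job never starts.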