
Spark Streaming: batch time and submission time differ by 50 minutes

Stack Overflow user
Asked 2019-07-19 16:54:14
Answers: 1, Views: 150, Followers: 0, Votes: 0

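Spark version is 2.2.0. Pseudocode:

Read data1 from Kafka with a 5-minute batch interval

Read data2 from Kafka with a 10-minute window and a 5-minute slide duration

Join data1 with data2 on some condition

Run some aggregations and write the result to MySQL

Question: the batch time is 15:00 and the submission time is 15:50, yet the processing time is less than 1 minute. What happened?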

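import org.apache.spark.streaming.Minutes
import org.apache.spark.streaming.kafka010.{KafkaUtils, LocationStrategies}

val shareDs = KafkaUtils.createDirectStream[String, String](streamContext, LocationStrategies.PreferBrokers, shareReqConsumer)

val shareResDs = KafkaUtils.createDirectStream[String, String](streamContext, LocationStrategies.PreferBrokers, shareResConsumer).window(Minutes(WindowTime), Minutes(StreamTime))

// Pseudocode: doSomeMap stands for the elided mapping to key/value pairs
shareDs.doSomeMap.join(shareResDs.doSomeMap).foreachRDD { rdd => /* do some things, then write to MySQL */ }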

Here are some logs:

19/07/22 11:20:00 INFO dstream.MappedDStream: Slicing from 1563765000000 ms to 1563765600000 ms (aligned to 1563765000000 ms and 1563765600000 ms)
19/07/22 11:20:00 INFO dstream.MappedDStream: Slicing from 1563765000000 ms to 1563765600000 ms (aligned to 1563765000000 ms and 1563765600000 ms)
19/07/22 11:20:00 INFO dstream.MappedDStream: Slicing from 1563765000000 ms to 1563765600000 ms (aligned to 1563765000000 ms and 1563765600000 ms)
19/07/22 11:20:00 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-6, groupId=dashboard] Revoking previously assigned partitions [topic_wh_sparkstream_afp_com_input_result-2, topic_wh_sparkstream_afp_com_input_result-1, topic_wh_sparkstream_afp_com_input_result-0]
19/07/22 11:20:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-6, groupId=dashboard] (Re-)joining group
19/07/22 11:25:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-6, groupId=dashboard] Successfully joined group with generation 820
19/07/22 11:25:00 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-6, groupId=dashboard] Setting newly assigned partitions [topic_wh_sparkstream_afp_com_input_result-2, topic_wh_sparkstream_afp_com_input_result-1, topic_wh_sparkstream_afp_com_input_result-0]
19/07/22 11:25:00 INFO dstream.MappedDStream: Slicing from 1563765000000 ms to 1563765600000 ms (aligned to 1563765000000 ms and 1563765600000 ms)
19/07/22 11:25:00 INFO dstream.MappedDStream: Slicing from 1563765000000 ms to 1563765600000 ms (aligned to 1563765000000 ms and 1563765600000 ms)
19/07/22 11:25:00 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-5, groupId=dashboard] Revoking previously assigned partitions [topic_wh_sparkstream_decision_report_result-1, topic_wh_sparkstream_decision_report_result-2, topic_wh_sparkstream_decision_report_result-0]
19/07/22 11:25:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-5, groupId=dashboard] (Re-)joining group
19/07/22 11:30:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-5, groupId=dashboard] Successfully joined group with generation 821
19/07/22 11:30:00 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-5, groupId=dashboard] Setting newly assigned partitions [topic_wh_sparkstream_decision_report_result-1, topic_wh_sparkstream_decision_report_result-2, topic_wh_sparkstream_decision_report_result-0]
19/07/22 11:30:00 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-4, groupId=dashboard] Revoking previously assigned partitions [topic_wh_sparkstream_echo_mixed_risk_record-1, topic_wh_sparkstream_echo_mixed_risk_record-2, topic_wh_sparkstream_echo_mixed_risk_record-0]
19/07/22 11:30:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-4, groupId=dashboard] (Re-)joining group
19/07/22 11:30:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-4, groupId=dashboard] Marking the coordinator 10.124.35.112:9092 (id: 2147483534 rack: null) dead
19/07/22 11:30:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-4, groupId=dashboard] Discovered group coordinator 10.124.35.112:9092 (id: 2147483534 rack: null)
19/07/22 11:30:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-4, groupId=dashboard] (Re-)joining group
19/07/22 11:35:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-4, groupId=dashboard] Successfully joined group with generation 822
19/07/22 11:35:00 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-4, groupId=dashboard] Setting newly assigned partitions [topic_wh_sparkstream_echo_mixed_risk_record-1, topic_wh_sparkstream_echo_mixed_risk_record-2, topic_wh_sparkstream_echo_mixed_risk_record-0]
19/07/22 11:35:00 INFO dstream.MappedDStream: Slicing from 1563765000000 ms to 1563765600000 ms (aligned to 1563765000000 ms and 1563765600000 ms)
19/07/22 11:35:00 INFO internals.ConsumerCoordinator: [Consumer clientId=consumer-3, groupId=dashboard] Revoking previously assigned partitions [topic_wh_sparkstream_echo_mixed_risk_result_detail-2, topic_wh_sparkstream_echo_mixed_risk_result_detail-1, topic_wh_sparkstream_echo_mixed_risk_result_detail-0, topic_wh_sparkstream_echo_behavior_features_result-0, topic_wh_sparkstream_echo_behavior_features_result-1, topic_wh_sparkstream_echo_behavior_features_result-2]
19/07/22 11:35:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-3, groupId=dashboard] (Re-)joining group
19/07/22 11:35:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-3, groupId=dashboard] Marking the coordinator 10.124.35.112:9092 (id: 2147483534 rack: null) dead
19/07/22 11:35:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-3, groupId=dashboard] Discovered group coordinator 10.124.35.112:9092 (id: 2147483534 rack: null)
19/07/22 11:35:00 INFO internals.AbstractCoordinator: [Consumer clientId=consumer-3, groupId=dashboard] (Re-)joining group

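At the window timestamps, the application only goes through Kafka partition reassignment (partitions are revoked and the consumers rejoin the group); no jobs are added.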
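
To make the repeated "Slicing from ... to ..." lines easier to read, the epoch-millisecond bounds can be decoded as in the minimal sketch below. The object name `SliceTimes` and the helper `toUtc` are hypothetical; only the two timestamps come from the logs. The offset between UTC and the log timestamps is not stated in the question, so the sketch stays in UTC.

```scala
import java.time.{Instant, ZoneOffset}

// Hypothetical helper object; only the two epoch values are taken from the logs.
object SliceTimes {
  // Convert epoch milliseconds to a UTC date-time string.
  def toUtc(epochMs: Long): String =
    Instant.ofEpochMilli(epochMs).atOffset(ZoneOffset.UTC).toLocalDateTime.toString

  val sliceStartMs = 1563765000000L
  val sliceEndMs   = 1563765600000L

  val sliceStart: String = toUtc(sliceStartMs) // 2019-07-22T03:10 (UTC)
  val sliceEnd: String   = toUtc(sliceEndMs)   // 2019-07-22T03:20 (UTC)

  // Length of the slice in minutes; matches the 10-minute window in the question.
  val windowMinutes: Long = (sliceEndMs - sliceStartMs) / 60000L
}
```

The same 10-minute slice is logged at 11:20, 11:25 and 11:35, each time next to a consumer that takes five minutes to rejoin its group, which fits the observation that the batch stays stuck in scheduling rather than in processing.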


1 Answer

Stack Overflow user

Accepted answer

Answered 2019-08-30 11:07:55

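I solved this problem. When using Spark Streaming with Kafka, use a separate group_id for each stream, disable auto commit, and configure the Kafka consumer parameters appropriately, in particular the heartbeat interval, session timeout, request timeout, and max poll interval.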
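
As a rough illustration of the settings listed above, the sketch below builds one parameter map per stream. It is not the answerer's actual configuration: the object name, the helper `kafkaParamsFor`, the broker address, the group ids, and every numeric value are assumptions chosen only so that the heartbeat is shorter than the session timeout and the max poll interval is longer than the 5-minute batch and 10-minute window. The keys themselves are standard Kafka consumer configuration names.

```scala
// Hypothetical sketch; names and values are illustrative assumptions.
object KafkaParamsSketch {
  // Build consumer settings for one stream, keyed by its own group id.
  def kafkaParamsFor(groupId: String, brokers: String): Map[String, Object] =
    Map[String, Object](
      "bootstrap.servers"     -> brokers,
      "key.deserializer"      -> "org.apache.kafka.common.serialization.StringDeserializer",
      "value.deserializer"    -> "org.apache.kafka.common.serialization.StringDeserializer",
      "group.id"              -> groupId,
      // Offsets are not committed automatically.
      "enable.auto.commit"    -> Boolean.box(false),
      // Heartbeat kept at a third of the session timeout.
      "heartbeat.interval.ms" -> Int.box(20000),
      "session.timeout.ms"    -> Int.box(60000),
      // Longer than both the 5-minute batch and the 10-minute window.
      "max.poll.interval.ms"  -> Int.box(900000),
      // Assumption: kept above the poll interval, which some older clients expect.
      "request.timeout.ms"    -> Int.box(905000)
    )

  // One distinct (hypothetical) group id per stream, as the answer suggests.
  val shareReqParams: Map[String, Object] =
    kafkaParamsFor("dashboard-share-req", "broker-host:9092")
  val shareResParams: Map[String, Object] =
    kafkaParamsFor("dashboard-share-res", "broker-host:9092")
}
```

Each map could then be passed to `ConsumerStrategies.Subscribe` when creating the corresponding direct stream. In the logs from the question every consumer reports `groupId=dashboard`, so giving each stream its own group is the part of the answer that changes most visibly.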

Votes: 0
Original page content provided by Stack Overflow; the Chinese translation was supported by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/57108896
