Kafka broker takes a very long time to recover its indexes and is eventually shut down

Stack Overflow user
Asked 2019-10-10 03:16:13
Answers: 3 | Views: 2.3K | Followers: 0 | Votes: 1

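I have a 3-broker Kafka setup with no replication on Azure K8S, deployed with the Kafka 5.0.1 Helm chart (which uses the 5.0.1 image).

At some point (unfortunately I don't have the logs), one of the Kafka brokers crashed, and when it restarted it went into an endless, painful restart loop. It seems to try to recover some corrupted log entries, takes a very long time, and then goes down with SIGTERM. To make things worse, I can no longer fully consume from or produce to the affected topics. The logs are attached below, along with a monitoring screenshot showing Kafka slowly going through the log files and filling the disk cache.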

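Right now my log.retention.bytes is set to 180 [unit missing in source], and I would like to keep it that way without Kafka getting stuck in this endless loop. Suspecting this might be an issue with an old version, I searched for related Kafka keywords ("still starting up", "SIGTERM", "corrupted index file") but found nothing.

So I can't count on a newer version to fix this, and I don't want to rely on a small retention size either, since the problem could still show up whenever a large number of logs get corrupted.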

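So my question is: is there a way to do any or all of the following?

  • Prevent the SIGTERM from happening, so that Kafka can recover completely?
  • Allow consumption/production to resume on the unaffected partitions (it looks like only 4 of the 30 partitions have corrupted entries)?
  • Otherwise stop this madness from happening?

(If not, I will resort to: (a) upgrading Kafka; (b) shrinking log.retention.bytes by an order of magnitude; (c) turning on replicas, hoping that helps; (d) improving logging to find out what caused the crash in the first place.)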

Logs

Log from a run where log loading completes, but the cleanup + flush is interrupted:

[2019-10-10 00:05:36,562] INFO [ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-10-10 00:05:36,564] INFO [ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-10-10 00:05:36,564] INFO [ThrottledChannelReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-10-10 00:05:36,598] INFO Loading logs. (kafka.log.LogManager)
[2019-10-10 00:05:37,802] WARN [Log partition=my-topic-3, dir=/opt/kafka/data-0/logs] Found a corrupted index file corresponding to log file /opt/kafka/data-0/logs/my-topic-3/00000000000000031038.log due to Corrupt time index found, time index file (/opt/kafka/data-0/logs/my-topic-3/00000000000000031038.timeindex) has non-zero size but the last timestamp is 0 which is less than the first timestamp 1570449760949}, recovering segment and rebuilding index files... (kafka.log.Log)
...
[2019-10-10 00:42:27,037] INFO Logs loading complete in 2210438 ms. (kafka.log.LogManager)
[2019-10-10 00:42:27,052] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2019-10-10 00:42:27,054] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
[2019-10-10 00:42:27,057] INFO Starting the log cleaner (kafka.log.LogCleaner)
[2019-10-10 00:42:27,738] INFO Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler)
[2019-10-10 00:42:27,763] INFO Shutting down SupportedServerStartable (io.confluent.support.metrics.SupportedServerStartable)  

Log from a run where the loading itself is interrupted:

[2019-10-10 01:55:25,502] INFO [ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-10-10 01:55:25,502] INFO [ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-10-10 01:55:25,504] INFO [ThrottledChannelReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-10-10 01:55:25,549] INFO Loading logs. (kafka.log.LogManager)
[2019-10-10 01:55:27,123] WARN [Log partition=my-topic-3, dir=/opt/kafka/data-0/logs] Found a corrupted index file corresponding to log file /opt/kafka/data-0/logs/my-topic-3/00000000000000031038.log due to Corrupt time index found, time index file (/opt/kafka/data-0/logs/my-topic-3/00000000000000031038.timeindex) has non-zero size but the last timestamp is 0 which is less than the first timestamp 1570449760949}, recovering segment and rebuilding index files... (kafka.log.Log)
...
[2019-10-10 02:17:01,249] INFO [ProducerStateManager partition=my-topic-12] Loading producer state from snapshot file '/opt/kafka/data-0/logs/my-topic-12/00000000000000004443.snapshot' (kafka.log.ProducerStateManager)
[2019-10-10 02:17:07,090] INFO Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler)
[2019-10-10 02:17:07,093] INFO Shutting down SupportedServerStartable (io.confluent.support.metrics.SupportedServerStartable)
[2019-10-10 02:17:07,093] INFO Closing BaseMetricsReporter (io.confluent.support.metrics.BaseMetricsReporter)
[2019-10-10 02:17:07,093] INFO Waiting for metrics thread to exit (io.confluent.support.metrics.SupportedServerStartable)
[2019-10-10 02:17:07,093] INFO Shutting down KafkaServer (io.confluent.support.metrics.SupportedServerStartable)
[2019-10-10 02:17:07,097] INFO [KafkaServer id=2] shutting down (kafka.server.KafkaServer)
[2019-10-10 02:17:07,105] ERROR [KafkaServer id=2] Fatal error during KafkaServer shutdown. (kafka.server.KafkaServer)
java.lang.IllegalStateException: Kafka server is still starting up, cannot shut down!
    at kafka.server.KafkaServer.shutdown(KafkaServer.scala:560)
    at io.confluent.support.metrics.SupportedServerStartable.shutdown(SupportedServerStartable.java:147)
    at io.confluent.support.metrics.SupportedKafka$1.run(SupportedKafka.java:62)
[2019-10-10 02:17:07,110] ERROR Caught exception when trying to shut down KafkaServer. Exiting forcefully. (io.confluent.support.metrics.SupportedServerStartable)
java.lang.IllegalStateException: Kafka server is still starting up, cannot shut down!
    at kafka.server.KafkaServer.shutdown(KafkaServer.scala:560)
    at io.confluent.support.metrics.SupportedServerStartable.shutdown(SupportedServerStartable.java:147)
    at io.confluent.support.metrics.SupportedKafka$1.run(SupportedKafka.java:62)
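
To put numbers on the two excerpts above, here is a small sketch that measures how long each restart ran between "Loading logs." and the SIGTERM. The timestamps are copied from the log lines; the dictionary layout and function names are just for illustration.

```python
from datetime import datetime

# Timestamps copied from the "Loading logs." and "Terminating process due to
# signal SIGTERM" lines of the two excerpts.
RUNS = {
    "first restart": {
        "loading_started": "2019-10-10 00:05:36,598",
        "sigterm": "2019-10-10 00:42:27,738",
    },
    "second restart": {
        "loading_started": "2019-10-10 01:55:25,549",
        "sigterm": "2019-10-10 02:17:07,090",
    },
}


def parse_ts(text):
    """Parse a Kafka log timestamp such as '2019-10-10 00:05:36,598'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def seconds_until_sigterm(run):
    """Seconds between the start of log loading and the SIGTERM."""
    delta = parse_ts(run["sigterm"]) - parse_ts(run["loading_started"])
    return delta.total_seconds()


for name, run in RUNS.items():
    elapsed = seconds_until_sigterm(run)
    print(f"{name}: SIGTERM arrived {elapsed:.3f} s after 'Loading logs.'")
```

In the first run the broker got roughly 2211 s, which was just enough to finish loading (the log reports 2210438 ms). In the second run it was stopped after roughly 1302 s, while still loading.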

Monitoring

[Screenshot: Kafka slowly going through the log files while the disk cache fills up]

3 Answers

Stack Overflow user

Answered 2019-12-29 07:58:56

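I found your question while looking for a solution to a similar problem.

I wonder whether you ever solved it?

In the meantime, who is sending the SIGTERM? It may be Kubernetes or another orchestrator; you could modify the readiness probe to allow more attempts before the container gets killed.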
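
A rough way to size "more attempts" is sketched below. As far as I know, a failing readiness probe only takes the pod out of the Service endpoints, and it is a failing liveness (or startup) probe that makes the kubelet restart the container, so the budget here is for the liveness probe. The model ignores probe timeouts, and the 30 s delay and 10 s period are made-up example values, not taken from the chart. The 2211 s figure is the time the broker in the question ran before the SIGTERM in its first logged restart.

```python
import math


def seconds_before_restart(initial_delay, period, failure_threshold):
    """Approximate time from container start until a probe that never
    succeeds has failed `failure_threshold` times in a row.

    Simplified model: the first probe runs after `initial_delay` seconds and
    every later one `period` seconds after the previous; timeouts are ignored.
    """
    return initial_delay + period * (failure_threshold - 1)


def min_failure_threshold(startup_seconds, initial_delay, period):
    """Smallest failureThreshold whose budget covers `startup_seconds`."""
    if startup_seconds <= initial_delay:
        return 1
    return math.ceil((startup_seconds - initial_delay) / period) + 1


# Hypothetical probe settings; only the startup time comes from the question.
threshold = min_failure_threshold(startup_seconds=2211, initial_delay=30, period=10)
print(threshold)                                  # 220
print(seconds_before_restart(30, 10, threshold))  # 2220
```

Since recovery time grows with the amount of data to scan, it is probably worth leaving generous headroom over whatever this gives.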

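Also make sure your Xmx setting is lower than the resources allocated to the pod/container. Otherwise Kubernetes will kill the pod (if Kubernetes is what is doing the killing).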
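
A quick sanity check for the heap-versus-limit comparison could look like the sketch below. It only understands a subset of the suffixes (k/m/g for the JVM flag, Ki/Mi/Gi and k/M/G for the Kubernetes quantity), and the example values are invented for illustration.

```python
import re

# JVM size suffixes are binary multiples (k = 1024, and so on).
JVM_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}
# Kubernetes distinguishes binary (Ki, Mi, Gi) from decimal (k, M, G) suffixes.
K8S_UNITS = {
    "": 1,
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def jvm_heap_bytes(flag):
    """Convert a flag such as '-Xmx4g' to bytes."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError(f"unsupported heap flag: {flag!r}")
    return int(match.group(1)) * JVM_UNITS[match.group(2).lower()]


def k8s_memory_bytes(quantity):
    """Convert a memory quantity such as '6Gi' to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|k|M|G)?", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    return int(match.group(1)) * K8S_UNITS[match.group(2) or ""]


def heap_fits(xmx_flag, memory_limit):
    """True if the maximum heap is strictly below the container limit."""
    return jvm_heap_bytes(xmx_flag) < k8s_memory_bytes(memory_limit)


print(heap_fits("-Xmx4g", "6Gi"))  # True
print(heap_fits("-Xmx8g", "6Gi"))  # False
```

The JVM also uses memory outside the heap, so the heap should sit well below the limit rather than just under it.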

Votes: 1

Stack Overflow user

Answered 2020-07-28 08:02:44

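I ran into the same problem and solved it by increasing two values in Kafka (in the server.properties file):

zookeeper.connection.timeout.ms

zookeeper.session.timeout.ms

I raised both of them to 18000. Setting both to the same value seems to be pointless (at least according to https://kafka.apache.org/documentation/#zookeeper.connection.timeout.ms), but in any case it solved the problem for me.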
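
For anyone who wants to script that change, here is a minimal sketch that sets both keys in the text of a server.properties file, replacing an existing entry or appending a missing one. The sample file content, including the old value 6000, is invented; only the two key names and the value 18000 come from the answer above.

```python
def set_properties(text, updates):
    """Return `text` with each key in `updates` set to its new value.

    Existing uncommented `key=value` lines are replaced in place; keys that
    are not present are appended at the end.
    """
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = None
        if "=" in stripped and not stripped.startswith("#"):
            key = stripped.split("=", 1)[0].strip()
        if key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    for key, value in remaining.items():
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"


# Hypothetical file content for illustration.
sample = "broker.id=2\nzookeeper.connection.timeout.ms=6000\n"
updated = set_properties(sample, {
    "zookeeper.connection.timeout.ms": 18000,
    "zookeeper.session.timeout.ms": 18000,
})
print(updated)
```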

Votes: 1

Stack Overflow user

Answered 2022-10-25 11:54:37

I faced a similar issue while using a Kafka chart.

[2022-10-25 11:07:49,596] INFO Terminating process due to signal SIGTERM (org.apache.kafka.common.utils.LoggingSignalHandler)
[2022-10-25 11:07:49,605] INFO [KafkaServer id=0] shutting down (kafka.server.KafkaServer)
[2022-10-25 11:07:49,609] ERROR [KafkaServer id=0] Fatal error during KafkaServer shutdown. (kafka.server.KafkaServer)
java.lang.IllegalStateException: Kafka server is still starting up, cannot shut down!
    at kafka.server.KafkaServer.shutdown(KafkaServer.scala:705)
    at kafka.Kafka$.$anonfun$main$3(Kafka.scala:100)
    at kafka.utils.Exit$.$anonfun$addShutdownHook$1(Exit.scala:38)
    at java.base/java.lang.Thread.run(Thread.java:829)
[2022-10-25 11:07:49,611] ERROR Halting Kafka. (kafka.Kafka$)

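Increasing livenessProbe.initialDelaySeconds worked. The liveness probe was failing because Kafka was still loading snapshots of the existing topics.

But I don't understand why the signal issue happens!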
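
On the signal itself: when a liveness probe keeps failing, the kubelet restarts the container, and it does that by sending SIGTERM first, which would explain the "Terminating process due to signal SIGTERM" line. A hedged sketch for choosing the delay follows: take an observed startup time, add some headroom, and build the Helm override. The 25% headroom is an arbitrary choice, the 2211 s figure is borrowed from the question's first logged restart, and whether the chart exposes the setting under exactly this key depends on the chart.

```python
import math


def initial_delay_with_headroom(observed_startup_seconds, headroom=0.25):
    """Round the observed startup time up by a safety margin."""
    return math.ceil(observed_startup_seconds * (1 + headroom))


def helm_set_flag(key, value):
    """Format a `--set key=value` argument for `helm upgrade`."""
    return f"--set {key}={value}"


delay = initial_delay_with_headroom(2211)
print(helm_set_flag("livenessProbe.initialDelaySeconds", delay))
# --set livenessProbe.initialDelaySeconds=2764
```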

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/58314946
