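While setting up a simple Strimzi cluster on AKS, I ran into a problem when the broker needs to delete or clean up log files. The problem only occurs with the azurefile storage class; everything works fine with other classes.
Steps to reproduce, on a cluster with one broker and one ZooKeeper replica: deleting a topic triggers an error in the broker. The problem also appears when the log cleaner tries to run on a topic with the "compact" policy.
Strimzi cluster setup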
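For the compaction case, a topic along the following lines should be enough to get the log cleaner to run. This is only a sketch: the topic name `compacted-topic` is hypothetical and not from my original setup, and `cleanup.policy` is set through the `config` section of the KafkaTopic resource.

```yaml
# Hypothetical compacted topic for reproducing the log cleaner failure.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: compacted-topic            # assumed name, for illustration only
  labels:
    strimzi.io/cluster: test-cluster
spec:
  partitions: 1
  replicas: 1
  config:
    cleanup.policy: compact        # makes the log cleaner process this topic
```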
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: test-cluster
  namespace: kafka-test
spec:
  kafka:
    version: 3.0.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
      - name: external
        port: 9094
        type: nodeport
        tls: false
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.0"
      auto.create.topics.enable: "false"
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 2Gi
          deleteClaim: true
          class: azurefile
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 2Gi
      deleteClaim: true
      class: azurefile
  entityOperator:
    topicOperator: {}
    userOperator: {}

Strimzi topic setup
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: custom-topic
  labels:
    strimzi.io/cluster: test-cluster
spec:
  partitions: 1
  replicas: 1

In the initial state, the pods are running normally and the custom topic is in the Ready state.

After running kubectl delete kafkatopic custom-topic, the broker crashes with the following error log:
2022-07-19 10:00:16,211 ERROR Error while renaming dir for custom-topic-0 in log dir /var/lib/kafka/data-0/kafka-log0 (kafka.server.LogDirFailureChannel) [control-plane-kafka-request-handler-0]
java.nio.file.AccessDeniedException: /var/lib/kafka/data-0/kafka-log0/custom-topic-0 -> /var/lib/kafka/data-0/kafka-log0/custom-topic-0.20aa2754010549d58935ea4144c2f1f6-delete
at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:90)
at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
at java.base/sun.nio.fs.UnixCopyFile.move(UnixCopyFile.java:478)
at java.base/sun.nio.fs.UnixFileSystemProvider.move(UnixFileSystemProvider.java:267)
at java.base/java.nio.file.Files.move(Files.java:1422)
at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:932)
at kafka.log.Log.$anonfun$renameDir$2(Log.scala:699)
at kafka.log.Log.renameDir(Log.scala:2487)
at kafka.log.LogManager.asyncDelete(LogManager.scala:1036)
at kafka.log.LogManager.$anonfun$asyncDelete$3(LogManager.scala:1071)
at scala.Option.foreach(Option.scala:437)
at kafka.log.LogManager.$anonfun$asyncDelete$2(LogManager.scala:1069)
at kafka.log.LogManager.$anonfun$asyncDelete$2$adapted(LogManager.scala:1067)
at scala.collection.mutable.HashSet$Node.foreach(HashSet.scala:435)
at scala.collection.mutable.HashSet.foreach(HashSet.scala:361)
at kafka.log.LogManager.asyncDelete(LogManager.scala:1067)
at kafka.server.ReplicaManager.stopPartitions(ReplicaManager.scala:468)
at kafka.server.ReplicaManager.stopReplicas(ReplicaManager.scala:405)
at kafka.server.KafkaApis.handleStopReplicaRequest(KafkaApis.scala:291)
at kafka.server.KafkaApis.handle(KafkaApis.scala:174)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
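at java.base/java.lang.Thread.run(Thread.java:829)

The pod never recovers, and I had to delete and recreate the cluster. The error seems to occur only when using a Kubernetes storage class with the file.csi.azure.com provisioner. There is no problem with kubernetes.io/azure or disk.csi.azure.com. I tried creating a custom storage class with some explicit permissions set, but that did not work either.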
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: kafka-azurefile
provisioner: file.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: true
mountOptions:
  - dir_mode=0777
  - file_mode=0777
  - mfsymlinks
  - actimeo=30
  - uid=0
  - gid=0
parameters:
  skuName: Standard_LRS

Strimzi version: 0.27.1
Kubernetes version: 1.22.6
Posted on 2022-07-19 21:34:31
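Kafka needs block storage; it does not work with file storage. I think on Azure that would normally be Azure Disk storage. (Also, 2Gi of storage is relatively small for Kafka. I am not sure how far you will get before you run out of disk space.)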
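As a rough sketch of what that could look like, the storage class below uses the disk.csi.azure.com provisioner that the question already reports as working, and the storage sections of the Kafka resource point at it. The class name `kafka-azuredisk`, the SKU, and the volume sizes are assumptions chosen for illustration, not values from the question.

```yaml
# Hypothetical StorageClass backed by Azure Disk (block storage) instead of Azure Files.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: kafka-azuredisk            # assumed name, not from the question
provisioner: disk.csi.azure.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  skuName: StandardSSD_LRS         # assumed SKU; pick one that fits the workload
---
# Fragment of the Kafka resource: only the storage sections are shown.
spec:
  kafka:
    storage:
      type: jbod
      volumes:
        - id: 0
          type: persistent-claim
          size: 100Gi              # illustrative size, larger than the original 2Gi
          deleteClaim: true
          class: kafka-azuredisk
  zookeeper:
    storage:
      type: persistent-claim
      size: 10Gi                   # illustrative size
      deleteClaim: true
      class: kafka-azuredisk
```

The second document is a fragment, not a complete resource; it would be merged into the existing Kafka definition from the question.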
https://stackoverflow.com/questions/73035009