
Flink missing state value on k8s: recovering a job when the JobManager/TaskManager crashes

Stack Overflow user
Asked 2020-06-09 16:40:41
1 answer, 152 views, 0 followers, 0 votes

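We run a Flink job cluster (Deployment/Pod) on Kubernetes and deleted the JobManager and TaskManager pods (kubectl delete pod XXX). After the pods came back up healthy, the state was missing, even though the RocksDB and checkpoint file paths are mounted from a PVC. Is there a recommended way to restore the state once the pods are running again? I double-checked the code and found that checkpointing is not enabled. Is that the root cause of the job failing to recover?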

The environment is set up as follows:

// RocksDB state backend; the second argument enables incremental checkpoints
RocksDBStateBackend backend = new RocksDBStateBackend(checkPointDataUri + "/checkpoint", true);
// working directory for RocksDB's own files
backend.setDbStoragePath(checkPointDataUri + "/RocksDB");
backend.setNumberOfTransferingThreads(1);

// add state backend
env.setStateBackend((StateBackend) backend);

Can we enable checkpointing like this?

env.enableCheckpointing(1000); // checkpoint interval in ms
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500); // ms
env.getCheckpointConfig().setCheckpointTimeout(60000); // ms
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// keep the externalized checkpoint when the job is cancelled
env.getCheckpointConfig().enableExternalizedCheckpoints(CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

Here is the restart log:

2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,962 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,941 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,962 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,941 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,921 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,963 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,942 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,965 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,961 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,965 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,942 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}
2020-06-09 06:48:11,981 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Configuring application-defined state backend with job/cluster config
2020-06-09 06:48:11,944 INFO  org.apache.flink.streaming.runtime.tasks.StreamTask           - Using application-defined state backend: RocksDBStateBackend{checkpointStreamBackend=File State Backend (checkpoints: 'file:/opt/flink/data/ss-kpi-ewfj8/checkpoint', savepoints: 'null', asynchronous: UNDEFINED, fileStateThreshold: -1), localRocksDbDirectories=[/opt/flink/data/ss-kpi-ewfj8/RocksDB], enableIncrementalCheckpointing=TRUE, numberOfTransferingThreads=1}

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-06-15 15:59:00

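It makes no sense to store RocksDB and the checkpoints on the same filesystem. RocksDB should use the fastest local filesystem available; Kubernetes ephemeral storage is fine. Checkpoints, on the other hand, must be stored durably on some kind of distributed filesystem.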
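To make the point above concrete, here is a minimal sketch of a pre-flight check that flags the layout visible in the question's log, where the checkpoint URI and the RocksDB directory sit under the same local parent directory. The class `StatePlacementCheck` and its methods are hypothetical helpers, not part of Flink, and treating a `file:` URI as "local" is only a heuristic, since a mounted volume may or may not be durable and shared across pods.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical pre-flight check; not part of Flink. */
final class StatePlacementCheck {

    private StatePlacementCheck() {}

    /** True when the URI has no scheme or the "file" scheme, i.e. it names a path on the pod's own filesystem. */
    public static boolean isLocalFileUri(String checkpointUri) {
        String scheme = URI.create(checkpointUri).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    /** True when the checkpoint directory and the RocksDB directory have the same parent directory. */
    public static boolean sharesParentDirectory(String checkpointUri, String rocksDbDir) {
        if (!isLocalFileUri(checkpointUri)) {
            return false; // a remote scheme cannot share a local parent directory
        }
        Path checkpointPath = Paths.get(URI.create(checkpointUri).getPath()).normalize();
        Path rocksDbPath = Paths.get(rocksDbDir).normalize();
        Path checkpointParent = checkpointPath.getParent();
        return checkpointParent != null && checkpointParent.equals(rocksDbPath.getParent());
    }

    public static void main(String[] args) {
        // Paths taken from the restart log in the question.
        String checkpointUri = "file:/opt/flink/data/ss-kpi-ewfj8/checkpoint";
        String rocksDbDir = "/opt/flink/data/ss-kpi-ewfj8/RocksDB";
        if (sharesParentDirectory(checkpointUri, rocksDbDir)) {
            System.out.println("Checkpoints and RocksDB share a directory; consider separating them.");
        }
    }
}
```

With the paths from the log, `sharesParentDirectory` returns true. Following the answer, the checkpoint URI passed to `new RocksDBStateBackend(...)` would instead point at durable shared storage, and the path passed to `setDbStoragePath(...)` at a fast pod-local directory. The concrete values depend on the cluster and are not given in the thread.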

Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62278422
