
Google Cloud Dataflow jobs mysteriously failing

Asked by a Stack Overflow user on 2019-02-27 06:00:57
1 answer · 283 views · 0 followers · score 1

I have repeatedly tried to run a set of Google Cloud Dataflow jobs that worked fine until recently but now tend to crash. This error is the most puzzling, because I have no idea what code it refers to; it appears to be GCP-internal code.

My job ID is: 2019-02-26_13_27_30-16974532604317793751

I am running these jobs on n1-standard-96 instances.

For reference, the full trace:

  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py", line 642, in do_work
    work_executor.execute()
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py", line 156, in execute
    op.start()
  File "dataflow_worker/shuffle_operations.py", line 49, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    def start(self):
  File "dataflow_worker/shuffle_operations.py", line 50, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    with self.scoped_start_state:
  File "dataflow_worker/shuffle_operations.py", line 65, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    with self.scoped_process_state:
  File "dataflow_worker/shuffle_operations.py", line 66, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    with self.shuffle_source.reader() as reader:
  File "dataflow_worker/shuffle_operations.py", line 68, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
    for key_values in reader:
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 433, in __iter__
    for entry in entries_iterator:
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 272, in next
    return next(self.iterator)
  File "/usr/local/lib/python2.7/dist-packages/dataflow_worker/shuffle.py", line 230, in __iter__
    chunk, next_position = self.reader.Read(start_position, end_position)
  File "third_party/windmill/shuffle/python/shuffle_client.pyx", line 133, in shuffle_client.PyShuffleReader.Read
IOError: Shuffle read failed: DATA_LOSS: Missing last fragment of a large value.

1 Answer

Stack Overflow user (accepted answer)
Posted on 2019-03-05 08:56:39

Perhaps the input data is larger now and Dataflow cannot handle it?

My job had the same shuffle problem. When I switched to the optional "Shuffle service", it started working. You may want to give it a try. Just add the following to your job command:

--experiments shuffle_mode=service

Reference: see the "Using Cloud Dataflow Shuffle" section of this page.
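To illustrate where the flag goes, here is a minimal sketch of a Beam Python pipeline's option list with the Shuffle service experiment enabled. The project name, bucket, and region are placeholders, and the `PipelineOptions` usage shown in the comment assumes the Apache Beam SDK is installed:

```python
# Hypothetical option list for a Beam Python pipeline; the last entry
# opts the job in to the Dataflow Shuffle service, as suggested above.
pipeline_args = [
    "--runner=DataflowRunner",
    "--project=my-gcp-project",            # placeholder project
    "--region=us-central1",                # placeholder region
    "--temp_location=gs://my-bucket/tmp",  # placeholder bucket
    "--experiments=shuffle_mode=service",  # enable the Shuffle service
]

# With the Beam SDK installed, these arguments would be parsed as:
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(pipeline_args)
print(" ".join(pipeline_args))
```

The same flag can also be appended directly to an existing launch command, e.g. `python my_pipeline.py ... --experiments shuffle_mode=service`.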

Score: 1

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/54894842