Does Dask allow indexing with a Dask Series?

Stack Overflow user
Asked 2020-02-21 21:58:50
Answers: 1 | Views: 135 | Followers: 0 | Votes: 2

The behavior I'm seeing looks like a bug in Dask, but I want to make sure I'm not doing something wrong.

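I have a Dask DataFrame called labeled_texts. It contains a column called "text". I compute a Dask Series called label_rows that contains Boolean values and has the same length as labeled_texts. I use it to index into labeled_texts, and from the resulting smaller DataFrame I select the "text" column, as shown below.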

labeled_text[label_rows]["text"].compute()

When I run the line above, I get KeyError: 'text' from inside the Dask/Pandas code. However, the following commands work:

labeled_text[label_rows].compute()["text"]
labeled_text[label_rows.compute()]["text"]

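I think all three commands should produce the same result, and the first should not raise an error. Is that correct?

Unfortunately, I can't come up with a minimal reproduction that I can post here. The problem always occurs on one particular cluster, but the same code and data run fine on a different machine. (This further makes me think it's a Dask bug.)

Without a better reproduction scenario, I don't expect anyone to be able to solve this for me. I just want to make sure I'm not doing something wrong.

Here is the full stack trace.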

  Traceback (most recent call last):

  ...my code that ultimately calls compute()...

    File "/usr/local/lib/python3.6/site-packages/dask/base.py", line 175, in compute
      (result,) = compute(self, traverse=False, **kwargs)
    File "/usr/local/lib/python3.6/site-packages/dask/base.py", line 446, in compute
      results = schedule(dsk, keys, **kwargs)
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 2510, in get
      results = self.gather(packed, asynchronous=asynchronous, direct=direct)
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 1812, in gather
      asynchronous=asynchronous,
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 753, in sync
      self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
    File "/usr/local/lib/python3.6/site-packages/distributed/utils.py", line 337, in sync
      six.reraise(*error[0])
    File "/usr/local/lib/python3.6/site-packages/six.py", line 693, in reraise
      raise value
    File "/usr/local/lib/python3.6/site-packages/distributed/utils.py", line 322, in f
      result[0] = yield future
    File "/usr/local/lib/python3.6/site-packages/tornado/gen.py", line 1133, in run
      value = future.result()
    File "/usr/local/lib/python3.6/site-packages/distributed/client.py", line 1668, in _gather
      six.reraise(type(exception), exception, traceback)
    File "/usr/local/lib/python3.6/site-packages/six.py", line 692, in reraise
      raise value.with_traceback(tb)
    File "/usr/local/lib/python3.6/site-packages/dask/optimization.py", line 1059, in __call__
      return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
    File "/usr/local/lib/python3.6/site-packages/dask/core.py", line 149, in get
      result = _execute_task(task, cache)
    File "/usr/local/lib/python3.6/site-packages/dask/core.py", line 119, in _execute_task
      return func(*args2)
    File "/usr/local/lib/python3.6/site-packages/pandas/core/frame.py", line 2980, in __getitem__
      indexer = self.columns.get_loc(key)
    File "/usr/local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
      return self._engine.get_loc(self._maybe_cast_indexer(key))
    File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'text'

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-02-23 18:05:09

Nothing in particular stands out to me. As you suggest, I recommend trying to put together a minimal reproducer.

Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/60346755
