I have the following dask_cudf.core.DataFrame:
import pandas as pd
import numpy as np
import dask_cudf
import cudf
data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
ddf.compute()
I want to create lags 1 through 5 for the columns nor and unif. However, this is how I am creating them:
colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)
I can create the first and second lags, but nothing beyond that. When I run shift with a value greater than 2, I get the following error:
/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
175 try:
--> 176 yield
177 except Exception as e:
16 frames
cudf/_lib/copying.pyx in cudf._lib.copying.shift()
RuntimeError: parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py in raise_on_meta_error(funcname, udf)
195 )
196 msg = msg.format(f" in `{funcname}`" if funcname else "", repr(e), tb)
--> 197 raise ValueError(msg) from e
198
199
ValueError: Metadata inference failed in `shift`.
Original error is below:
------------------------
RuntimeError('parallel_for failed: cudaErrorInvalidConfiguration: invalid configuration argument')
Traceback:
---------
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/utils.py", line 176, in raise_on_meta_error
yield
File "/usr/local/lib/python3.7/site-packages/dask/dataframe/core.py", line 5833, in _emulate
return func(*_extract_meta(args, True), **_extract_meta(kwargs, True))
File "/usr/local/lib/python3.7/site-packages/dask/utils.py", line 1021, in __call__
return getattr(__obj, self.method)(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1788, in shift
return self._shift(periods)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1793, in _shift
zip(self._column_names, data_columns), self._index
File "/usr/local/lib/python3.7/site-packages/cudf/core/dataframe.py", line 818, in _from_data
out = super()._from_data(data, index)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 140, in _from_data
Frame.__init__(obj, data, index)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 78, in __init__
self._data = cudf.core.column_accessor.ColumnAccessor(data)
File "/usr/local/lib/python3.7/site-packages/cudf/core/column_accessor.py", line 121, in __init__
data = dict(data)
File "/usr/local/lib/python3.7/site-packages/cudf/core/frame.py", line 1791, in <genexpr>
data_columns = (col.shift(offset, fill_value) for col in self._columns)
File "/usr/local/lib/python3.7/site-packages/cudf/core/column/column.py", line 391, in shift
return libcudf.copying.shift(self, offset, fill_value)
File "cudf/_lib/copying.pyx", line 633, in cudf._lib.copying.shift
I can't seem to understand why this is happening.
Answered on 2022-07-02 00:06:17
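Thanks for the minimal reproducer; it works with one small change. Don't call .compute() on Dask too early. If you need to execute something in dask/dask_cudf and then keep processing, use .persist() instead.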
import pandas as pd
import numpy as np
import dask_cudf
import cudf
data = {"x":range(1,21), "nor":np.random.normal(2, 4, 20), "unif":np.random.uniform(size = 20)}
df = cudf.DataFrame(data)
ddf = dask_cudf.from_cudf(df, npartitions = 2)
colz = ["nor", "unif"]
ddf[[s + "_" + str(1) for s in colz]] = ddf[colz].shift(1)
ddf[[s + "_" + str(2) for s in colz]] = ddf[colz].shift(2)
ddf[[s + "_" + str(3) for s in colz]] = ddf[colz].shift(3)
ddf[[s + "_" + str(5) for s in colz]] = ddf[colz].shift(5)
ddf.compute()
Output:
x nor unif nor_1 unif_1 nor_2 unif_2 nor_3 unif_3 nor_5 unif_5
0 1 3.711132 0.021615 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
1 2 -2.465054 0.081927 3.711131915 0.021614727 <NA> <NA> <NA> <NA> <NA> <NA>
2 3 1.543548 0.481731 -2.465054359 0.081927168 3.711131915 0.021614727 <NA> <NA> <NA> <NA>
3 4 8.820771 0.040135 1.543548323 0.481731194 -2.465054359 0.081927168 3.711131915 0.021614727 <NA> <NA>
4 5 0.233656 0.135811 8.82077073 0.040135259 1.543548323 0.481731194 -2.465054359 0.081927168 <NA> <NA>
5 6 2.526556 0.360873 0.23365638 0.135810979 8.82077073 0.040135259 1.543548323 0.481731194 3.711131915 0.021614727
6 7 2.799205 0.383579 2.526555817 0.360873336 0.23365638 0.135810979 8.82077073 0.040135259 -2.465054359 0.081927168
7 8 5.960305 0.362417 2.799205226 0.383579063 2.526555817 0.360873336 0.23365638 0.135810979 1.543548323 0.481731194
8 9 1.878898 0.609364 5.960304782 0.362416925 2.799205226 0.383579063 2.526555817 0.360873336 8.82077073 0.040135259
9 10 1.217635 0.041408 1.878898482 0.609364119 5.960304782 0.362416925 2.799205226 0.383579063 0.23365638 0.135810979
10 11 0.580250 0.128405 1.21763458 0.04140812 1.878898482 0.609364119 5.960304782 0.362416925 2.526555817 0.360873336
11 12 4.907322 0.708164 0.580249571 0.128405085 1.21763458 0.04140812 1.878898482 0.609364119 2.799205226 0.383579063
12 13 6.591673 0.105310 4.907321929 0.708164063 0.580249571 0.128405085 1.21763458 0.04140812 5.960304782 0.362416925
13 14 -2.974896 0.587859 6.591673409 0.105310053 4.907321929 0.708164063 0.580249571 0.128405085 1.878898482 0.609364119
14 15 2.284847 0.978458 -2.974896021 0.587858754 6.591673409 0.105310053 4.907321929 0.708164063 1.21763458 0.04140812
15 16 -5.616458 0.114736 2.28484689 0.97845785 -2.974896021 0.587858754 6.591673409 0.105310053 0.580249571 0.128405085
16 17 -3.003533 0.279865 -5.616457873 0.114736009 2.28484689 0.97845785 -2.974896021 0.587858754 4.907321929 0.708164063
17 18 0.241106 0.923462 -3.003532592 0.279864688 -5.616457873 0.114736009 2.28484689 0.97845785 6.591673409 0.105310053
18 19 -2.100202 0.613850 0.241106056 0.923462497 -3.003532592 0.279864688 -5.616457873 0.114736009 -2.974896021 0.587858754
19 20 8.364832 0.929587 -2.100201941 0.613850209 0.241106056 0.923462497 -3.003532592 0.279864688 2.28484689 0.97845785
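Since the question asks for lags 1 through 5, the repeated assignments can be written as a loop. The following is only a minimal sketch: add_lags is a hypothetical helper name, and pandas stands in for cudf/dask_cudf so the example runs without a GPU. It assumes dask_cudf accepts the same per-column assignment and shift calls, which I have not verified on every version.

```python
import pandas as pd


def add_lags(frame, columns, max_lag):
    """Add columns named <col>_<k> holding `col` shifted down by k rows, for k = 1..max_lag."""
    for k in range(1, max_lag + 1):
        for col in columns:
            frame[f"{col}_{k}"] = frame[col].shift(k)
    return frame


# Deterministic stand-in data (the original post uses random draws).
df = pd.DataFrame(
    {
        "x": range(1, 21),
        "nor": [float(i) for i in range(20)],
        "unif": [i / 20 for i in range(20)],
    }
)

lagged = add_lags(df, ["nor", "unif"], max_lag=5)
print(lagged.columns.tolist())
```

With a lazy dask_cudf frame, the idea would be to build all the lag columns first and call .compute() (or .persist()) once at the end.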
https://stackoverflow.com/questions/72499158