I'm having trouble creating a dask Series from a dask array:
import numpy as np
import pandas as pd
import dask.array as da
import dask.dataframe as dd
# Note: 'periodo_atraso' appears twice in this literal; Python keeps only the last list.
_dict = {'doc_faturamento': ['546102424238', '946102424238', '777702424238'], 'data_vencimento': [20190307, 20190310, 20190311], 'data_pagamento': [20190227, 20190324, 22220202], 'periodo_atraso': [-8, 14, 74107], 'periodo_atraso': ['PA/PD', '8-14 días', 'INAD']}
_df = pd.DataFrame(data=_dict)
_df = dd.from_pandas(_df, npartitions=2)
# data_pagamento holds integers, so compare against an int rather than a date string
_peri = da.where(_df['data_pagamento'] == 22220202, 'INAD', _df['periodo_atraso'])
_peri_df = dd.from_dask_array(_peri)
_df['periodo_atraso'] = _peri

With this example, on the other hand, I get the correct result:
_test = da.from_array(np.arange(100000, 190000), chunks=1000)
_test_df = dd.from_dask_array(_test)

Thanks for your help!
Posted on 2020-05-24 01:53:52
It looks like you're calling da.where on a dask dataframe. I'd recommend using dd.DataFrame.where or dd.Series.where instead.
https://stackoverflow.com/questions/61805532