
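Q: Unable to save a pandas DataFrame to Parquet when cell values are lists of floats

Stack Overflow user

Asked 2021-03-25 22:04:22

1 answer · 1.6K views · 0 followers · 1 vote

I have a DataFrame with the following structure: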

                                               Column1                                            Column2
0    (0.00030271668219938874, 0.0002655923890415579...  (0.0016430083196610212, 0.0014970217598602176,...
1    (0.00015607803652528673, 0.0001314736582571640...  (0.0022136708721518517, 0.0014974646037444472,...
2    (0.011317798867821693, 0.011339936405420303, 0...  (0.004868391435593367, 0.004406007472425699, 0...
3    (3.94578673876822e-05, 3.075833956245333e-05, ...  (0.0075020878575742245, 0.0096737677231431, 0....
4    (0.0004926157998852432, 0.0003811710048466921,...  (0.010351942852139473, 0.008231297135353088, 0...
..                                                 ...                                                ...
130  (0.011190211400389671, 0.011337820440530777, 0...  (0.010182800702750683, 0.011351295746862888, 0...
131  (0.006286659277975559, 0.007315031252801418, 0...  (0.02104150503873825, 0.02531484328210354, 0.0...
132  (0.0022791570518165827, 0.0025983047671616077,...  (0.008847278542816639, 0.009222050197422504, 0...
133  (0.0007059817435219884, 0.0009831463685259223,...  (0.0028264704160392284, 0.0029402063228189945,...
134  (0.0018992726691067219, 0.002058899961411953, ...  (0.0019639385864138603, 0.002009353833273053, ...

[135 rows x 2 columns]

Each cell contains a list/tuple of float values:

type(psd_res.data_frame['Column1'][0])
<class 'tuple'>
type(psd_res.data_frame['Column1'][0][0])
<class 'numpy.float64'>

(Every cell's tuple contains the same number of entries.)

When I try to save the DataFrame as Parquet, I get an error:

Can't infer object conversion type: 0    (0.00030271668219938874, 0.0002655923890415579...
1    (0.00015607803652528673, 0.0001314736582571640...
...

Name: Column1, dtype: object

Full stack trace: https://pastebin.com/8Myu8hNV

I also tried the other engine, pyarrow:

pyarrow.lib.ArrowInvalid: ('Could not convert (0.00030271668219938874, ..., 0.0002464042045176029)
  with type tuple: did not recognize Python value type when inferring an Arrow data type', 
  'Conversion failed for column UO-Pumpe with type object')

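Then I found this thread: https://github.com/dask/fastparquet/issues/458. It looks like a bug in fastparquet, but it is supposed to work with pyarrow, which also fails for me.

I then tried a few things I found, such as infer_objects() and astype(float), but nothing has worked so far.

Does anyone know how I can save my DataFrame to Parquet?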

parquet

pyarrow

python

pandas

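1 Answer

Stack Overflow user

Accepted answer

Answered 2021-03-26 02:43:08

The cells of your DataFrame contain tuples of floats. That is an unusual data type.

So you need to give Arrow a little help in figuring out the type of your data. To do that, provide the table's schema explicitly: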

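import pandas as pd
import pyarrow as pa

df = pd.DataFrame(
    {
        "column1": [(1.0, 2.0), (3.0, 4.0, 5.0)]
    }
)
# Declare the column as a list of float64 so Arrow does not have to infer it.
schema = pa.schema([pa.field('column1', pa.list_(pa.float64()))])
# The schema keyword is forwarded to pyarrow, so the pyarrow engine is required.
df.to_parquet('/tmp/hello.pq', engine='pyarrow', schema=schema)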

Note that if you use lists of floats (instead of tuples), it works without a schema:

import pandas as pd

df = pd.DataFrame(
    {
        "column1": [[1.0, 2.0], [3.0, 4.0, 5.0]]
    }
)
# Lists are recognized by Arrow's type inference, so no schema is needed.
df.to_parquet('/tmp/hello.pq')

3 votes

Original page content provided by Stack Overflow.

Source: https://stackoverflow.com/questions/66801151
