
Q: PySpark cannot compute the column standard deviation in a Koalas DataFrame

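Stack Overflow user

Asked 2019-11-08 06:42:32

Answers 1 · Views 93 · Followers 0 · Votes 1

I have a Koalas DataFrame in PySpark. I want to calculate the standard deviation across columns, one value per row. I tried this: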

df2['x_std'] = df2[['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6',
                    'x_7', 'x_8', 'x_9', 'x_10', 'x_11', 'x_12']].std(axis=1)

I get the following error:

TypeError: 'DataFrame' object does not support item assignment

I also tried something similar:

d1 = df2[['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'x_6',
          'x_7', 'x_8', 'x_9', 'x_10', 'x_11', 'x_12']].std(axis=1)

df2['x_std'] = d1  # d1 is a Koalas Series that should be assigned to the new column.

When I do this, I get the following error:

Cannot combine column argument because it comes from a different dataframe

I'm completely new to Koalas. Can anyone offer some advice? Thanks.

python

pandas

pyspark

spark-koalas

1 Answer

Stack Overflow user

Answered 2020-02-15 04:10:25

You can set the option "compute.ops_on_diff_frames" to True and then perform the operation.

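import databricks.koalas as ks

# Allow operations that combine columns from different DataFrames (disabled by default).
ks.set_option("compute.ops_on_diff_frames", True)

kdf = ks.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': [2, 1, 7, 4, 2, 3],
    'c': [3, 7, 1, 4, 6, 5],
    'd': [4, 2, 3, 4, 3, 8],
})

# Row-wise standard deviation across the four columns, assigned as a new column.
kdf['dev'] = kdf[['a', 'b', 'c', 'd']].std(axis=1)
print(kdf)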

   a  b  c  d       dev
0  1  2  3  4  1.290994
5  6  3  5  8  2.081666
1  2  1  7  2  2.708013
3  4  4  4  4  0.000000
2  3  7  1  3  2.516611
4  5  2  6  3  1.825742

I'm not sure whether this is good practice, since it is not allowed by default.
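
As a sanity check on what std(axis=1) should return for the sample data in this answer, the per-row values can be recomputed with Python's standard library alone. This is only a minimal sketch: it assumes Koalas follows the pandas default of the sample standard deviation (n - 1 in the denominator), and row_std is a hypothetical helper written for illustration, not part of Koalas.

```python
from statistics import stdev

# Sample data copied from the answer: one list per column.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 1, 7, 4, 2, 3],
    "c": [3, 7, 1, 4, 6, 5],
    "d": [4, 2, 3, 4, 3, 8],
}


def row_std(cols, names):
    """Sample standard deviation (n - 1 denominator) across `names`, one value per row."""
    # zip(*...) turns the column lists into row tuples: (a_i, b_i, c_i, d_i).
    rows = zip(*(cols[name] for name in names))
    return [stdev(row) for row in rows]


# Hypothetical helper call: one standard deviation per row index 0..5.
dev = row_std(columns, ["a", "b", "c", "d"])
for index, value in enumerate(dev):
    print(index, round(value, 6))
```

A row whose four values are identical, such as index 3 (4, 4, 4, 4), yields 0.0, which makes it a convenient row to check first.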

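Votes 0

The original content of this page is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: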

https://stackoverflow.com/questions/58757923
