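I have a DataFrame like the one below. I know that df.groupby("degree").mean() gives me the mean of each column per degree. I want to take those means and find the distance between each data point and each of them. In this case, for each data point I want 3 distances, one to each of the means (4,40), (2,80) and (4,94) (the output of df.groupby("degree").mean()), stored in 3 new columns. The distances should be computed as BCA_mean=(name-4)^3+(score-40)^3, M.Tech_mean=(name-2)^3+(score-80)^3, MBA_mean=(name-4)^3+(score-94)^3.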
import pandas as pd
# dictionary of lists (named `data` so the built-in `dict` is not shadowed)
data = {'name': [5, 4, 2, 3],
        'degree': ["MBA", "BCA", "M.Tech", "MBA"],
        'score': [90, 40, 80, 98]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data)
print(df)
   name  degree  score
0     5     MBA     90
1     4     BCA     40
2     2  M.Tech     80
3     3     MBA     98
df.groupby("degree").mean()
        name  score
degree
BCA        4     40
M.Tech     2     80
MBA        4     94

Update 1
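My real dataset has more than 100 columns, so I would prefer a solution that scales to that. The logic stays the same: for each group mean, subtract the mean from the columns, cube each cell, and then sum across the columns.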
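The logic described above can be sketched as follows. This is only one possible approach, not necessarily the most efficient one. It assumes that every column other than degree is numeric, and the names means and value_cols are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": [5, 4, 2, 3],
    "degree": ["MBA", "BCA", "M.Tech", "MBA"],
    "score": [90, 40, 80, 98],
})

# Per-degree mean of every remaining column
# (assumption: all columns except "degree" are numeric).
means = df.groupby("degree").mean()
value_cols = list(means.columns)  # here: ["name", "score"]

# For each group mean: subtract it from every row, cube each cell,
# and sum across the columns to get one new column per degree.
for degree, mean_row in means.iterrows():
    df[f"{degree}_mean"] = (df[value_cols] - mean_row).pow(3).sum(axis=1)

print(df)
```

Because value_cols is taken from the group means, the same loop should work unchanged for 100+ columns. Note that a sum of cubes can be negative, so the result follows the formula in the question rather than being a distance in the usual sense.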
I found something like the following, but I am not sure whether there is a more efficient way.
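import numpy as np
y = df.groupby("degree").mean()
print(y)
# distance from every row to the first group mean (BCA);
# note: np.square squares the differences, whereas the formula above cubes them
(np.square(df[['name', 'score']].subtract(y.iloc[0, :], axis=1))).sum(axis=1)
df["mean0"] = (np.square(df[['name', 'score']].subtract(y.iloc[0, :], axis=1))).sum(axis=1)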
df

Posted on 2020-02-20 12:24:14
import pandas as pd
# dictionary of lists (named `data` so the built-in `dict` is not shadowed)
data = {'degree': ["MBA", "BCA", "M.Tech", "MBA", "BCA"],
        'name': [5, 4, 2, 3, 2],
        'score': [90, 40, 80, 98, 60],
        'game': [100, 200, 300, 100, 400],
        'money': [100, 200, 300, 100, 400],
        'loan': [100, 200, 300, 100, 400],
        'rent': [100, 200, 300, 100, 400],
        'location': [100, 200, 300, 100, 400]}
# creating a dataframe from a dictionary
df = pd.DataFrame(data)
print(df)
# mean of every numeric column per degree
dfx = df.groupby("degree").mean()
print(dfx)
def fun(x):
    # subtract the mean row of this row's own degree from its numeric values;
    # label-based lookup replaces the positional x[0] / dfx.iloc[i] chain
    return x.drop("degree") - dfx.loc[x["degree"]]
df_added = df.apply(fun, axis=1)
df_added

Result
   degree  name  score  game  money  loan  rent  location
0     MBA     5     90   100    100   100   100       100
1     BCA     4     40   200    200   200   200       200
2  M.Tech     2     80   300    300   300   300       300
3     MBA     3     98   100    100   100   100       100
4     BCA     2     60   400    400   400   400       400

The means are in dfx:
        name  score  game  money  loan  rent  location
degree
BCA        3     50   300    300   300   300       300
M.Tech     2     80   300    300   300   300       300
MBA        4     94   100    100   100   100       100

df_added
The difference between each element and its column mean (within its own degree group):
   name  score  game  money  loan  rent  location
0     1     -4     0      0     0     0         0
1     1    -10  -100   -100  -100  -100      -100
2     0      0     0      0     0     0         0
3    -1      4     0      0     0     0         0
4    -1     10   100    100   100   100       100
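The same per-group differences can probably be obtained without a row-wise apply, which may matter with 100+ columns. The sketch below uses groupby(...).transform("mean") as an alternative technique, not the method from the answer above. It uses a shortened version of the example data, and value_cols and group_means are names made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "degree": ["MBA", "BCA", "M.Tech", "MBA", "BCA"],
    "name": [5, 4, 2, 3, 2],
    "score": [90, 40, 80, 98, 60],
    "game": [100, 200, 300, 100, 400],
})

# Every column except the grouping key (assumed to be numeric).
value_cols = [c for c in df.columns if c != "degree"]

# transform("mean") returns, for each row, the mean of that row's own group,
# aligned to df's index, so the subtraction is a plain element-wise operation.
group_means = df.groupby("degree")[value_cols].transform("mean")
df_added = df[value_cols] - group_means

print(df_added)
```

Like the apply version, this only compares each row with the mean of its own degree. Comparing every row with every group mean, as the question asks, would still need one pass per group.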
https://stackoverflow.com/questions/60312500