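A plain groupby mean is simple:
df.groupby(['col_a','col_b']).mean()[col_i_want]
However, if I want to apply a winsorized mean (default limits of 0.05 and 0.95), which is equivalent to clipping the dataset and then taking the mean, there suddenly seems to be no easy way to do it. I would have to: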
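winsorized_mean = []
col_i_want = 'col_c'
for entry in df['col_a'].unique():
    for entry2 in df['col_b'].unique():
        sub_df = df[(df['col_a'] == entry) & (df['col_b'] == entry2)]
        s = sub_df[col_i_want]
        # clip() takes absolute bounds, so compute the 5%/95% quantiles first
        m = s.clip(lower=s.quantile(0.05), upper=s.quantile(0.95)).mean()
        winsorized_mean.append([entry, entry2, m])
Is there a function I don't know about that does this automatically?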
Posted on 2019-12-09 11:37:33
import pandas as pd
from scipy import stats
# label 'a' will exhibit different means depending on trimming
label = ['a'] * 20 + ['b'] * 80 + ['c'] * 400 + ['a'] * 100
data = list(range(100)) + list(range(500, 1000))
df = pd.DataFrame({'label': label, 'data': data})
# select the numeric column so trim_mean is not applied to the labels
grouped = df.groupby('label')['data']
# trim 5% off both ends
print(grouped.apply(stats.trim_mean, .05))
# trim 10% off both ends
print(grouped.apply(stats.trim_mean, .1))
https://stackoverflow.com/questions/59241970
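Note that trim_mean discards the extreme values rather than clipping them, so it gives a trimmed mean, which is not quite the winsorized mean the question describes. Below is a minimal sketch of the clip-then-average approach done in one groupby call. It assumes the question's column names (col_a, col_b, col_c); the helper winsorized_mean and the sample data are hypothetical and only there for illustration.

```python
import pandas as pd


def winsorized_mean(s: pd.Series, lower: float = 0.05, upper: float = 0.95) -> float:
    """Clip a Series to its own lower/upper quantiles, then average it."""
    lo, hi = s.quantile(lower), s.quantile(upper)
    return s.clip(lower=lo, upper=hi).mean()


# Hypothetical data: group 'x' contains one large outlier (1000)
df = pd.DataFrame({
    'col_a': ['x'] * 10 + ['y'] * 10,
    'col_b': ['p'] * 20,
    'col_c': list(range(9)) + [1000] + list(range(10)),
})

# agg() calls the helper once per (col_a, col_b) group
result = df.groupby(['col_a', 'col_b'])['col_c'].agg(winsorized_mean)
print(result)
```

Because the quantiles are computed inside the helper, each group is clipped against its own distribution rather than against bounds taken from the whole frame. Unlike the nested loop, only combinations that actually occur in the data appear in the result.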