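In the output below, I have a DataFrame named df_out with the column names shown, but for some reason I cannot use the groupby function on the column headers, because it always gives me KeyError: 'year'. I have researched this and tried stripping whitespace, resetting the index, allowing for a leading space in the names passed to groupby, and so on, but I cannot get past this KeyError. df_out looks like this: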
df_out.columns
Out[185]:
Index(['year', 'month', 'BARTON CHAPEL', 'BARTON I', 'BIG HORN I',
'BLUE CREEK', 'BUFFALO RIDGE I', 'CAYUGA RIDGE', 'COLORADO GREEN',
'DESERT WIND', 'DRY LAKE I', 'EL CABO', 'GROTON', 'NEW HARVEST',
'PENASCAL I', 'RUGBY', 'TULE'],
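dtype='object', name='plant_name')
However, when I use df_out.head(), the output shows a leading 'plant_name' label that does not appear in the column list above, so this may be where the error comes from, or at least be related to it. Here is that output: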
df_out.head()
Out[187]:
plant_name year month BARTON CHAPEL BARTON I BIG HORN I BLUE CREEK \
0 1991 1 6.432285 7.324126 5.170067 6.736384
1 1991 2 7.121324 6.973586 4.922693 7.473527
2 1991 3 8.125793 8.681317 5.796599 8.401855
3 1991 4 7.454972 8.037764 7.272292 7.961625
4 1991 5 7.012809 6.530013 6.626949 6.009825
plant_name BUFFALO RIDGE I CAYUGA RIDGE COLORADO GREEN DESERT WIND \
0 7.163790 7.145323 5.783629 5.682003
1 7.595744 7.724717 6.245952 6.269524
2 8.111411 9.626075 7.918871 6.657648
3 8.807458 8.618806 7.011444 5.848736
4 7.734852 6.267097 7.410013 5.099610
plant_name DRY LAKE I EL CABO GROTON NEW HARVEST PENASCAL I \
0 4.721089 10.747285 7.456640 6.921801 6.296425
1 5.095923 8.891057 7.239762 7.449122 6.484241
2 8.409637 12.238508 8.274046 8.824758 8.444960
3 7.893694 10.837139 6.381736 8.840431 7.282444
4 8.496976 8.636882 6.856747 7.469825 7.999530
plant_name RUGBY TULE
0 7.028360 4.110605
1 6.394687 5.257128
2 6.859462 10.789516
3 7.590153 7.425153
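4 7.556546 8.085255
My groupby statement that raises the KeyError is shown below. I am trying to compute the mean by year and month over a subset of df_out's columns, given in the list 'west':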
west=['BIG HORN I','DRY LAKE I', 'TULE']
westavg = df_out[df_out.columns[df_out.columns.isin(west)]].groupby(['year','month']).mean()
Thank you very much,
Answered 2021-09-21 15:24:04
Your code can be broken down as:
westavg = (df_out[df_out.columns[df_out.columns.isin(west)]]
.groupby(['year','month']).mean()
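)
This does not work because 'year' and 'month' are not columns of df_out[df_out.columns[df_out.columns.isin(west)]]. That selection keeps only the columns named in west, so the grouping keys are gone by the time groupby runs.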
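The failure can be reproduced on a small frame. The sketch below is only an illustration: the data values are made up, and only a few of the original plant columns are included. It also shows that 'plant_name' is the name of the columns axis (as in the `name='plant_name'` part of the `df_out.columns` output), not a column itself, so it is unlikely to be the cause of the error.

```python
import pandas as pd

# Hypothetical stand-in for df_out; the numbers are invented for illustration.
df_out = pd.DataFrame({
    "year": [1991, 1991, 1992],
    "month": [1, 1, 1],
    "BIG HORN I": [5.0, 7.0, 6.0],
    "RUGBY": [7.0, 6.0, 8.0],
    "TULE": [4.0, 6.0, 10.0],
})
df_out.columns.name = "plant_name"  # label of the columns axis, not a column

west = ["BIG HORN I", "DRY LAKE I", "TULE"]

# Same selection as in the question: only columns listed in `west` survive.
subset = df_out[df_out.columns[df_out.columns.isin(west)]]
subset_columns = list(subset.columns)  # 'year' and 'month' are not in here

# Grouping the subset by keys it no longer contains raises KeyError.
try:
    subset.groupby(["year", "month"]).mean()
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

Printing `subset_columns` before grouping is a quick way to confirm which columns are actually available as grouping keys.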
Try:
west_cols = [c for c in df_out if c in west]
westavg = df_out.groupby(['year','month'])[west_cols].mean()
Answered 2021-09-21 15:52:18
OK, with Quang Hoang's help I understood the problem and arrived at this answer. Using .intersection helped me understand it better:
westavg = df_out[df_out.columns.intersection(west)].mean(axis=1)  # gives the mean of each row across the subset of columns defined by the list 'west'
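Note that the two answers compute different things: `mean(axis=1)` averages across the west columns within each row, while `groupby(['year','month'])[...].mean()` averages each west column over the rows of each (year, month) group. A minimal sketch of the difference, again on made-up data standing in for df_out:

```python
import pandas as pd

# Hypothetical stand-in for df_out; the numbers are invented for illustration.
df_out = pd.DataFrame({
    "year": [1991, 1991, 1992],
    "month": [1, 1, 1],
    "BIG HORN I": [5.0, 7.0, 6.0],
    "RUGBY": [7.0, 6.0, 8.0],
    "TULE": [4.0, 6.0, 10.0],
})
west = ["BIG HORN I", "DRY LAKE I", "TULE"]

# Keep only the names in `west` that really exist as columns
# ('DRY LAKE I' is absent from this toy frame and is silently dropped).
west_cols = list(df_out.columns.intersection(west))

# Row-wise: one value per original row, averaged across the west columns.
row_mean = df_out[west_cols].mean(axis=1)

# Group-wise: one row per (year, month), each west column averaged
# over the rows that fall into that group.
group_mean = df_out.groupby(["year", "month"])[west_cols].mean()

print(row_mean)
print(group_mean)
```

If a single west-wide figure per (year, month) is wanted, one option (an assumption about the goal, not something stated in the thread) is to take `group_mean.mean(axis=1)`.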
https://stackoverflow.com/questions/69271543