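I am trying to create a new moving-average (window=3) column, named 'MA3_WHIP', from the 'cum_year_WHIP' column of my DataFrame. I tried the following code to do it:
read_and_optimized['MA3_WHIP'] = read_and_optimized['cum_year_WHIP'].rolling(3).mean()
But for some reason this does not give me the rolling average I want.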
Before creating the 'cum_year_WHIP' column, I sorted the df by 'YEAR_ID' and 'Game_Date':
read_and_optimized.sort_values(['YEAR_ID','Game_Date'], ascending=True, inplace=True)
Then I created the 'cum_year_WHIP' column, which is the column the new rolling-average column 'MA3_WHIP' is based on. It is calculated using cumsum() across three other columns, as (cum_walks_a + cum_hits_a) / cum_innings_pitched:
read_and_optimized['cum_year_WHIP'] = (read_and_optimized['cum_year_walks_a'] + read_and_optimized['cum_year_hits_a']) / read_and_optimized['cum_year_innings_pitched']
In particular, I would like 'MA3_WHIP' to be sorted by the 'YEAR_ID' and 'Game_Date' columns, just like 'cum_year_WHIP', and grouped by the 'resp_starting_pitcher' and 'YEAR_ID' columns.
To print what the table looks like, I use the following code:
df = read_and_optimized[['YEAR_ID','Game_Date','resp_starting_pitcher','cum_year_WHIP','MA3_WHIP']].sort_values(['YEAR_ID','Game_Date'], ascending=True).groupby(['resp_starting_pitcher','YEAR_ID']).apply(print)
It gives me output I do not want:
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
30677 2012 2012-08-25 abadf001 2.000000 1.438035
19247 2012 2012-08-31 abadf001 2.280009 1.547771
35725 2012 2012-09-05 abadf001 2.270277 1.622140
19257 2012 2012-09-12 abadf001 2.234052 1.736054
42448 2012 2012-09-18 abadf001 1.983877 1.646596
19273 2012 2012-09-24 abadf001 1.880600 1.444433
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
6930 2011 2011-05-21 aceva001 1.000000 1.257886
17000 2011 2011-05-26 aceva001 1.090909 1.228938
6936 2011 2011-05-31 aceva001 1.437500 1.554379
6954 2011 2011-06-21 aceva001 1.571429 1.710058
Instead, what I would like is a rolling average of 'cum_year_WHIP' that restarts with each new 'resp_starting_pitcher' and each new 'YEAR_ID'. It should look like this:
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
30677 2012 2012-08-25 abadf001 2.000000 NaN
19247 2012 2012-08-31 abadf001 2.280009 NaN
35725 2012 2012-09-05 abadf001 2.270277 2.183428
19257 2012 2012-09-12 abadf001 2.234052 2.261446
42448 2012 2012-09-18 abadf001 1.983877 2.162735
19273 2012 2012-09-24 abadf001 1.880600 2.032843
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
6930 2011 2011-05-21 aceva001 1.000000 NaN
17000 2011 2011-05-26 aceva001 1.090909 NaN
6936 2011 2011-05-31 aceva001 1.437500 1.176136
6954 2011 2011-06-21 aceva001 1.571429 1.366612
YEAR_ID Game_Date resp_starting_pitcher cum_year_WHIP MA3_WHIP
7210 2013 2013-04-11 aceva001 1.800000 NaN
13938 2013 2013-04-17 aceva001 1.900000 NaN
7226 2013 2013-04-23 aceva001 2.250006 1.983333
7260 2013 2013-05-27 aceva001 2.068969 2.072991
44210 2013 2013-06-12 aceva001 1.894739 2.071238
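7276 2013 2013-06-18 aceva001 1.780222 1.914643
When I use the code below, I can produce a view of what the result should look like:
read_and_optimized.groupby(['resp_starting_pitcher','YEAR_ID'])['cum_year_WHIP'].rolling(3).mean()
However, when I try to create a new column from the code above, as suggested in other posts on similar questions, it gives me an error: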
read_and_optimized['MA3_WHIP']= read_and_optimized.groupby(['resp_starting_pitcher','YEAR_ID'])['cum_year_WHIP'].rolling(window=3).mean()
The error is:
TypeError: incompatible index of inserted column with frame index
Is there a way to create this new column in the DataFrame?
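The error appears to stem from the index of the grouped rolling result, which does not have the same shape as the frame's index. A minimal sketch on a made-up frame (the column names follow the question; the values, pitcher IDs and index labels are assumptions for illustration only) makes the mismatch visible:

```python
import pandas as pd

# Toy frame: column names follow the question, everything else is made up.
df = pd.DataFrame(
    {
        "resp_starting_pitcher": ["p1", "p2", "p1", "p2", "p1", "p2"],
        "YEAR_ID": [2012] * 6,
        "cum_year_WHIP": [2.0, 1.0, 2.2, 1.2, 2.4, 1.4],
    },
    index=[30, 10, 20, 50, 40, 60],
)

grouped_ma = (
    df.groupby(["resp_starting_pitcher", "YEAR_ID"])["cum_year_WHIP"]
    .rolling(3)
    .mean()
)

# The grouped result is indexed by the two group keys plus the original
# row label, while the frame itself has a single-level index.
print(grouped_ma.index.names)
print(grouped_ma.index.nlevels, df.index.nlevels)
```

Because the two indexes have a different number of levels, pandas cannot align the result with the frame when it is assigned as a column.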
I have looked at the answers to similar questions:
Why is groupby and rolling not working together?
Pandas - moving averages - use values of previous X entries for current row
But I could not get it to work.
Any help getting this to work would be greatly appreciated.
Posted on 2021-07-08 05:28:00
OK, I finally found a post that fits my situation and helped me. As pointed out in the answer to this question: https://stackoverflow.com/questions/52801540/pandas-groupby-then-rolling-mean
What I had to do was reset the index for the groupby columns, in this case 'resp_starting_pitcher' and 'YEAR_ID', and drop them in the code that creates the new rolling-average column:
read_and_optimized['MA3_WHIP'] = read_and_optimized.groupby(['resp_starting_pitcher','YEAR_ID'])['cum_year_WHIP'].rolling(3).mean().reset_index(level=('resp_starting_pitcher','YEAR_ID'), drop=True)
https://stackoverflow.com/questions/68275648
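The reset-and-drop approach can be tried on a small frame. The sketch below uses a few rows copied from the question's sample output (a subset, not the real data set) and, as an alternative that is not part of the original answer, also shows groupby(...).transform, which returns a result already aligned to the frame's index. The names `keys`, `ma3_reset` and `MA3_WHIP_alt` are introduced here for illustration only.

```python
import pandas as pd

# A few rows from the question's sample output; the index holds the original row labels.
df = pd.DataFrame(
    {
        "YEAR_ID": [2011, 2011, 2011, 2011, 2012, 2012, 2012, 2012],
        "resp_starting_pitcher": ["aceva001"] * 4 + ["abadf001"] * 4,
        "cum_year_WHIP": [1.000000, 1.090909, 1.437500, 1.571429,
                          2.000000, 2.280009, 2.270277, 2.234052],
    },
    index=[6930, 17000, 6936, 6954, 30677, 19247, 35725, 19257],
)

keys = ["resp_starting_pitcher", "YEAR_ID"]

# Approach from the answer: drop the group-key index levels so that only
# the original row labels remain and the assignment can align on them.
ma3_reset = (
    df.groupby(keys)["cum_year_WHIP"]
    .rolling(3)
    .mean()
    .reset_index(level=keys, drop=True)
)
df["MA3_WHIP"] = ma3_reset

# Alternative (an assumption, not from the answer): transform keeps the
# frame's own index, so no index reset is needed.
df["MA3_WHIP_alt"] = df.groupby(keys)["cum_year_WHIP"].transform(
    lambda s: s.rolling(3).mean()
)

print(df)
```

In both columns the first two rows of each pitcher and year are NaN, because a window of 3 has no complete history yet, and the average restarts at every new group.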