我希望为每个组插入一些条目到一个名为'df_recorded‘的数据帧中,然后从另一个名为"df_missed“的数据帧中搜索条目。
import pandas as pd
df_recorded = pd.DataFrame({
'id': ['2008 11', '2008 11', '2008 11', '2008 07', '2008 07', '2008 12', '2008 12', '2008 12'],
'info': ['recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', ],
'score': [98, 68, 79, 75, 66, 62, 60, 60],
'date' : ['2010-12-10', '2010-10-01', '2010-09-12', '2010-12-10', '2010-11-01', '2010-12-07', '2010-11-10', '2010-09-12']
})
df_missed = pd.DataFrame({
'id': ['2008 11', '2008 07', '2008 12'],
'missed_score': [62, 72, 80],
'missed_date': ['2010-08-01', '2010-10-20', '2010-07-23']
})
id info score date
0 2008 11 recorded 98 2010-12-10
1 2008 11 recorded 68 2010-10-01
2 2008 11 recorded 79 2010-09-12
3 2008 07 recorded 75 2010-12-10
4 2008 07 recorded 66 2010-11-01
5 2008 12 recorded 62 2010-12-07
6 2008 12 recorded 60 2010-11-10
7 2008 12 recorded 60 2010-09-12
df_missed
id missed_score missed_date
0 2008 11 62 2010-08-01
1 2008 07 72 2010-10-20
2 2008 12 80 2010-07-23我想在'df_recorded‘中的每个组的末尾添加一行,例如,在'info’列中添加相同的'id=2008 11‘和一个名为'missed’的新条目,然后通过搜索df_missed表添加分数和日期,因此结果应该如下所示:
Target result:
id info score date
0 2008 11 recorded 98 2010-12-10
1 2008 11 recorded 68 2010-10-01
2 2008 11 recorded 79 2010-09-12
3 2008 11 missed 62 2010-08-01 # new record
4 2008 07 recorded 75 2010-12-10
5 2008 07 recorded 66 2010-11-01
6 2008 07 missed 72 2010-10-20 # new record
7 2008 12 recorded 62 2010-12-07
8 2008 12 recorded 60 2010-11-10
9 2008 12 recorded 60 2010-09-12
10 2008 12 missed 80 2010-07-23 # new record我试着用循环编写代码,但速度很慢,效率也很低。因此,如果你有任何想法可以让它变得更好,请帮助。非常感谢。
发布于 2020-06-29 12:22:58
IIUC您可以简单地重命名缺少的df和concat中的列
df_missed.columns = ["id", "score", "date"]
df = pd.concat([df_recorded,df_missed], ignore_index=True, sort=False).sort_values("id", ascending=False)
df.loc[df["info"].isnull(),"info"] = "missing"
print (df)
id info score date
5 2008 12 recorded 62 2010-12-07
6 2008 12 recorded 60 2010-11-10
7 2008 12 recorded 60 2010-09-12
10 2008 12 missing 80 2010-07-23
0 2008 11 recorded 98 2010-12-10
1 2008 11 recorded 68 2010-10-01
2 2008 11 recorded 79 2010-09-12
8 2008 11 missing 62 2010-08-01
3 2008 07 recorded 75 2010-12-10
4 2008 07 recorded 66 2010-11-01
9 2008 07 missing 72 2010-10-20发布于 2020-06-29 12:30:58
尝尝这个
import pandas as pd
df_recorded = pd.DataFrame({
'id': ['2008 11', '2008 11', '2008 11', '2008 07', '2008 07', '2008 12', '2008 12', '2008 12'],
'info': ['recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', ],
'score': [98, 68, 79, 75, 66, 62, 60, 60],
'date' : ['2010-12-10', '2010-10-01', '2010-09-12', '2010-12-10', '2010-11-01', '2010-12-07', '2010-11-10', '2010-09-12']
})
df_missed = pd.DataFrame({
'id': ['2008 11', '2008 07', '2008 12'],
'missed_score': [62, 72, 80],
'missed_date': ['2010-08-01', '2010-10-20', '2010-07-23']
})
df_missed.rename(columns={'missed_score':'score','missed_date':'date'},inplace=True)
df_recorded=df_recorded.append(df_missed,ignore_index=True)
df_recorded=df_recorded.fillna('missed')
df_recorded.sort_values(by='id')输出
id info score date
3 2008 07 recorded 75 2010-12-10
4 2008 07 recorded 66 2010-11-01
9 2008 07 missed 72 2010-10-20
0 2008 11 recorded 98 2010-12-10
1 2008 11 recorded 68 2010-10-01
2 2008 11 recorded 79 2010-09-12
8 2008 11 missed 62 2010-08-01
5 2008 12 recorded 62 2010-12-07
6 2008 12 recorded 60 2010-11-10
7 2008 12 recorded 60 2010-09-12
10 2008 12 missed 80 2010-07-23https://stackoverflow.com/questions/62631051
复制相似问题