首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >按组将行插入到数据帧中,并且条目来自另一个dataframe_complex匹配

按组将行插入到数据帧中,并且条目来自另一个dataframe_complex匹配
EN

Stack Overflow用户
提问于 2020-06-29 12:06:39
回答 2查看 48关注 0票数 1

我希望为每个组插入一些条目到一个名为'df_recorded‘的数据帧中,然后从另一个名为"df_missed“的数据帧中搜索条目。

代码语言:javascript
复制
import pandas as pd

df_recorded = pd.DataFrame({
    'id': ['2008 11', '2008 11', '2008 11', '2008 07', '2008 07', '2008 12', '2008 12', '2008 12'],
    'info': ['recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', ],
    'score': [98, 68, 79, 75, 66, 62, 60, 60],
    'date' : ['2010-12-10', '2010-10-01', '2010-09-12', '2010-12-10', '2010-11-01', '2010-12-07', '2010-11-10', '2010-09-12']
})

df_missed = pd.DataFrame({
    'id': ['2008 11', '2008 07', '2008 12'],
    'missed_score': [62, 72, 80],
    'missed_date': ['2010-08-01', '2010-10-20', '2010-07-23']
})


        id      info  score        date
0  2008 11  recorded     98  2010-12-10
1  2008 11  recorded     68  2010-10-01
2  2008 11  recorded     79  2010-09-12
3  2008 07  recorded     75  2010-12-10
4  2008 07  recorded     66  2010-11-01
5  2008 12  recorded     62  2010-12-07
6  2008 12  recorded     60  2010-11-10
7  2008 12  recorded     60  2010-09-12

df_missed
        id  missed_score missed_date
0  2008 11            62  2010-08-01
1  2008 07            72  2010-10-20
2  2008 12            80  2010-07-23

我想在'df_recorded‘中的每个组的末尾添加一行,例如,在'info’列中添加相同的'id=2008 11‘和一个名为'missed’的新条目,然后通过搜索df_missed表添加分数和日期,因此结果应该如下所示:

代码语言:javascript
复制
Target result:
         id      info  score        date
0   2008 11  recorded     98  2010-12-10
1   2008 11  recorded     68  2010-10-01
2   2008 11  recorded     79  2010-09-12
3   2008 11    missed     62  2010-08-01 # new record
4   2008 07  recorded     75  2010-12-10
5   2008 07  recorded     66  2010-11-01
6   2008 07    missed     72  2010-10-20 # new record
7   2008 12  recorded     62  2010-12-07
8   2008 12  recorded     60  2010-11-10
9   2008 12  recorded     60  2010-09-12
10  2008 12    missed     80  2010-07-23 # new record

我试着用循环编写代码,但速度很慢,效率也很低。因此,如果你有任何想法可以让它变得更好,请帮助。非常感谢。

EN

回答 2

Stack Overflow用户

回答已采纳

发布于 2020-06-29 12:22:58

IIUC您可以简单地重命名缺少的df和concat中的列

代码语言:javascript
复制
df_missed.columns = ["id", "score", "date"]

df = pd.concat([df_recorded,df_missed], ignore_index=True, sort=False).sort_values("id", ascending=False)
df.loc[df["info"].isnull(),"info"] = "missing"
print (df)

         id      info  score        date
5   2008 12  recorded     62  2010-12-07
6   2008 12  recorded     60  2010-11-10
7   2008 12  recorded     60  2010-09-12
10  2008 12   missing     80  2010-07-23
0   2008 11  recorded     98  2010-12-10
1   2008 11  recorded     68  2010-10-01
2   2008 11  recorded     79  2010-09-12
8   2008 11   missing     62  2010-08-01
3   2008 07  recorded     75  2010-12-10
4   2008 07  recorded     66  2010-11-01
9   2008 07   missing     72  2010-10-20
票数 4
EN

Stack Overflow用户

发布于 2020-06-29 12:30:58

尝尝这个

代码语言:javascript
复制
import pandas as pd

df_recorded = pd.DataFrame({
    'id': ['2008 11', '2008 11', '2008 11', '2008 07', '2008 07', '2008 12', '2008 12', '2008 12'],
    'info': ['recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', 'recorded', ],
    'score': [98, 68, 79, 75, 66, 62, 60, 60],
    'date' : ['2010-12-10', '2010-10-01', '2010-09-12', '2010-12-10', '2010-11-01', '2010-12-07', '2010-11-10', '2010-09-12']
})

df_missed = pd.DataFrame({
    'id': ['2008 11', '2008 07', '2008 12'],
    'missed_score': [62, 72, 80],
    'missed_date': ['2010-08-01', '2010-10-20', '2010-07-23']
})
df_missed.rename(columns={'missed_score':'score','missed_date':'date'},inplace=True)

df_recorded=df_recorded.append(df_missed,ignore_index=True)
df_recorded=df_recorded.fillna('missed')
df_recorded.sort_values(by='id')

输出

代码语言:javascript
复制
id  info    score   date
3   2008 07     recorded    75  2010-12-10
4   2008 07     recorded    66  2010-11-01
9   2008 07     missed  72  2010-10-20
0   2008 11     recorded    98  2010-12-10
1   2008 11     recorded    68  2010-10-01
2   2008 11     recorded    79  2010-09-12
8   2008 11     missed  62  2010-08-01
5   2008 12     recorded    62  2010-12-07
6   2008 12     recorded    60  2010-11-10
7   2008 12     recorded    60  2010-09-12
10  2008 12     missed  80  2010-07-23
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/62631051

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档