What is the best way to solve a query involving two Pandas DataFrames?

Stack Overflow user

Asked 2021-03-18 07:33:33

Answers: 2 · Views: 32 · Followers: 0 · Votes: 0

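I am looking for the best way to solve a query that involves two Pandas DataFrames, described below.

The first one (gradebook_df) is the main gradebook, which contains each student's ID and all of that student's scores. The second one (assignment_2_df) contains each student's ID and their score for assignment 2.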

gradebook_df

----------------------------------------------------------------------------
| student ID  | assignment_1_score | assignment_2_score | final_exam_score |
----------------------------------------------------------------------------
| 1234        |  23                | Nan                | Nan              |
----------------------------------------------------------------------------
| 0000        |  97                | Nan                | Nan              |
----------------------------------------------------------------------------
| 0234        |  56                | Nan                | Nan              |
----------------------------------------------------------------------------

assignment_2_df

------------------------------------
| student ID  | assignment_2_score | 
------------------------------------
| 1234        |  90                | 
------------------------------------
| 0000        |  87                | 
------------------------------------
| 0234        |  100               | 
------------------------------------

My goal is to fill in assignment_2_score in gradebook_df for every student, using the values from assignment_2_df.

The final gradebook_df should therefore look like this:

----------------------------------------------------------------------------
| student ID  | assignment_1_score | assignment_2_score | final_exam_score |
----------------------------------------------------------------------------
| 1234        |  23                | 90                 | Nan              |
----------------------------------------------------------------------------
| 0000        |  97                | 87                 | Nan              |
----------------------------------------------------------------------------
| 0234        |  56                | 100                | Nan              |
----------------------------------------------------------------------------

Can anyone suggest the most efficient way to achieve this?

At the moment I am doing it as follows:

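def getScore(studentID):
    # Filter assignment_2_df down to the rows for this student and collect the scores
    score_as_list = list(assignment_2_df[assignment_2_df["student ID"] == studentID]["assignment_2_score"])
    # Take the first (expected to be the only) match
    score = score_as_list[0]
    return score

# getScore has to be defined before apply() runs, so the assignment comes last
gradebook_df["assignment_2_score"] = gradebook_df["student ID"].apply(lambda x : getScore(x))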

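This gives the correct result, but I don't know whether it is the most efficient way to do the job. Any help would be appreciated. I tried searching online but could not find an approach for this kind of problem.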

python

pandas

dataframe

join

2 Answers

Stack Overflow user

Accepted answer

Answered 2021-03-18 07:41:33

You should use the pandas.DataFrame.combine_first method:

import numpy as np
import pandas as pd


gradebook_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_1_score": [23, 97, 56],
    "assignment_2_score": [np.nan, np.nan, np.nan],
    "final_exam_score": [np.nan, np.nan, np.nan],
})

assignment_2_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_2_score": [90, 87, 100]
})

result = (
    gradebook_df.set_index("student ID")
    .combine_first(
        assignment_2_df.set_index("student ID")
    )
)
print(result)

student ID  assignment_1_score  assignment_2_score  final_exam_score
1234        23                  90.0                nan 
0000        97                  87.0                nan 
0234        56                  100.0               nan
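Note that the result above is indexed by student ID rather than keeping it as a regular column. As an alternative, here is a minimal sketch that keeps gradebook_df in its original shape by looking up each ID with Series.map. The name score_by_id is introduced here purely for illustration, and the sketch assumes that every student ID appears at most once in assignment_2_df, since a lookup Series with duplicate index labels would make map raise an error.

```python
import numpy as np
import pandas as pd

gradebook_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_1_score": [23, 97, 56],
    "assignment_2_score": [np.nan, np.nan, np.nan],
    "final_exam_score": [np.nan, np.nan, np.nan],
})

assignment_2_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_2_score": [90, 87, 100],
})

# Hypothetical helper name: a lookup Series whose index is the student ID
# and whose values are the assignment 2 scores (assumes unique IDs).
score_by_id = assignment_2_df.set_index("student ID")["assignment_2_score"]

# map() looks up every ID in one vectorized pass; an ID that is missing
# from score_by_id would simply get NaN.
gradebook_df["assignment_2_score"] = gradebook_df["student ID"].map(score_by_id)

print(gradebook_df)
```

Unlike the apply-based version in the question, this does not rescan assignment_2_df once per student, so it should scale better as the gradebook grows.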

Votes: 1

Stack Overflow user

Answered 2021-03-18 07:47:32

Try this:

 gradebook_df["assignment_2_score"] = assignment_2_df["assignment_2_score"]
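One caveat worth illustrating: direct column assignment aligns the two frames on their row index, not on the student ID column, so it is only correct when both frames list the students in the same order with the same index labels. The sketch below assumes a reordered assignment_2_df (an assumption made for demonstration, not data from the question) and compares the direct assignment with a left merge on "student ID". The names naive and merged are illustrative.

```python
import pandas as pd

gradebook_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_1_score": [23, 97, 56],
})

# Assumed for illustration: same students, listed in a different row order.
assignment_2_df = pd.DataFrame({
    "student ID": ["0234", "1234", "0000"],
    "assignment_2_score": [100, 90, 87],
})

# Direct assignment matches on the row labels 0, 1, 2, so each score lands
# on whichever student happens to sit in the same row position.
naive = gradebook_df.copy()
naive["assignment_2_score"] = assignment_2_df["assignment_2_score"]

# A left merge matches rows by the "student ID" key and keeps the
# gradebook's row order.
merged = gradebook_df.merge(assignment_2_df, on="student ID", how="left")

print(naive)   # scores attached to the wrong students
print(merged)  # scores matched by student ID
```

With the data exactly as posted in the question, both frames share the same order, so the one-line assignment happens to give the right result there.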

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/66686579
