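I'm looking for the best way to solve a lookup problem involving two pandas DataFrames. The two DataFrames are as follows:
The first (gradebook_df) is the main gradebook, which contains each student's ID and all of their scores. The second (assignment_2_df) contains each student's score for assignment 2 along with their student ID.

gradebook_df: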
---------------------------------------------------------------------------
| student ID | assignment_1_score | assignment_2_score | final_exam_score |
---------------------------------------------------------------------------
| 1234       | 23                 | NaN                | NaN              |
---------------------------------------------------------------------------
| 0000       | 97                 | NaN                | NaN              |
---------------------------------------------------------------------------
| 0234       | 56                 | NaN                | NaN              |
---------------------------------------------------------------------------

assignment_2_df:
-----------------------------------
| student ID | assignment_2_score |
-----------------------------------
| 1234       | 90                 |
-----------------------------------
| 0000       | 87                 |
-----------------------------------
| 0234       | 100                |
-----------------------------------

My goal is to fill in assignment_2_score in gradebook_df from assignment_2_df for each student.
The final gradebook_df should therefore look like this:
---------------------------------------------------------------------------
| student ID | assignment_1_score | assignment_2_score | final_exam_score |
---------------------------------------------------------------------------
| 1234       | 23                 | 90                 | NaN              |
---------------------------------------------------------------------------
| 0000       | 97                 | 87                 | NaN              |
---------------------------------------------------------------------------
| 0234       | 56                 | 100                | NaN              |
---------------------------------------------------------------------------

Can someone suggest the most efficient way to achieve this?
Currently, I am implementing it as follows:
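def getScore(studentID):
    # Filter assignment_2_df down to this student and take the first score found
    score_as_list = list(assignment_2_df[assignment_2_df["student ID"] == studentID]["assignment_2_score"])
    score = score_as_list[0]
    return score

gradebook_df["assignment_2_score"] = gradebook_df["student ID"].apply(lambda x : getScore(x))

This gives the correct answer, but I don't know whether it is the most efficient way to accomplish this task. Any help would be appreciated. I tried searching the internet, but I couldn't work out how to frame this kind of problem.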
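For comparison, the per-row lookup described above can be written as a single vectorized lookup. The following is only a minimal sketch, assuming student IDs are unique in assignment_2_df and are stored as strings so that leading zeros survive. The name score_by_id is illustrative and does not come from the question.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the tables in the question.
gradebook_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_1_score": [23, 97, 56],
    "assignment_2_score": [np.nan, np.nan, np.nan],
    "final_exam_score": [np.nan, np.nan, np.nan],
})
assignment_2_df = pd.DataFrame({
    "student ID": ["0234", "1234", "0000"],  # deliberately in a different row order
    "assignment_2_score": [100, 90, 87],
})

# Lookup Series (hypothetical name): index = student ID, values = score.
# Assumes each student ID appears at most once in assignment_2_df.
score_by_id = assignment_2_df.set_index("student ID")["assignment_2_score"]

# One vectorized lookup instead of filtering assignment_2_df once per row.
gradebook_df["assignment_2_score"] = gradebook_df["student ID"].map(score_by_id)

print(gradebook_df)
```

With this approach, a student ID that is missing from assignment_2_df ends up as NaN, whereas indexing score_as_list[0] on an empty list raises an IndexError.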
Answered 2021-03-18 07:41:33
You should use the pandas.DataFrame.combine_first method:
import numpy as np
import pandas as pd

gradebook_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_1_score": [23, 97, 56],
    "assignment_2_score": [np.nan, np.nan, np.nan],
    "final_exam_score": [np.nan, np.nan, np.nan],
})
assignment_2_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_2_score": [90, 87, 100]
})

# Index both frames by student ID so that values are matched per student
result = (
    gradebook_df.set_index("student ID")
    .combine_first(
        assignment_2_df.set_index("student ID")
    )
)
print(result)
student ID  assignment_1_score  assignment_2_score  final_exam_score
1234                        23                90.0               nan
0000                        97                87.0               nan
0234                        56               100.0               nan

Answered 2021-03-18 07:47:32
Try this:
gradebook_df["assignment_2_score"] = assignment_2_df["assignment_2_score"]

https://stackoverflow.com/questions/66686579
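One caveat about direct column assignment: pandas aligns the assigned Series on the row index, not on the student ID column. The sketch below illustrates this with sample data in which the two frames list the same students in a different order (an assumption made purely for illustration), and contrasts it with a left merge on "student ID". The gradebook here omits the empty assignment_2_score column, because merging with it present would produce suffixed duplicate columns. The names positional and merged are illustrative.

```python
import pandas as pd

gradebook_df = pd.DataFrame({
    "student ID": ["1234", "0000", "0234"],
    "assignment_1_score": [23, 97, 56],
})
# Same students, but the rows arrive in a different order (assumed for illustration).
assignment_2_df = pd.DataFrame({
    "student ID": ["0000", "0234", "1234"],
    "assignment_2_score": [87, 100, 90],
})

# Direct assignment pairs rows by index label (0, 1, 2) and ignores student ID.
positional = gradebook_df.copy()
positional["assignment_2_score"] = assignment_2_df["assignment_2_score"]
# Student "1234" now holds 87, which is actually the score of student "0000".

# A left merge pairs rows by student ID and keeps the gradebook's row order.
merged = gradebook_df.merge(assignment_2_df, on="student ID", how="left")

print(positional)
print(merged)
```

When both frames are guaranteed to share the same row order and index, as in the question's sample data, the direct assignment and the merge give the same result.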