I have several CSV sheets exported from Google Sheets as attendance reports. The data looks like this:
df1= pd.read_csv("12-9-2020.csv")
df1
Name StudentID
Robert C 102
Jessica Myla 103
Nana D 105
df2= pd.read_csv("12-10-2020.csv")
df2
Name StudentID
J Myla 103
Harris Kurt 104
Nana Duncan 105
I have many such sheets and want to build a compiled attendance report. The basic compiled report looks like this:
df_Basic
Name StudentID 12/9/2020 12/10/2020
Robert Case 102 0 0
Jessica Myla 103 0 0
Harris Kurt 104 0 0
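Nana Duncan 105 0 0
I want to feed the data from df1 and df2 into the compiled attendance report. If a student attended class, the value must be 1, and the spelling of each student's name should match the one used in the compiled report.
The expected result looks like this: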
df_Result
Name StudentID 12/9/2020 12/10/2020
Robert Case 102 1 0
Jessica Myla 103 1 1
Harris Kurt 104 0 1
Nana Duncan 105 1 1
Thank you for your help.
Posted on 2020-12-26 04:49:07
Here is a complete solution:
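df - dynamically read all files from a specific folder on your desktop into one DataFrame (using assign to create a Date column whose value is derived from the file name). You have to specify the folder in the files variable.
df_attendance - use .groupby with StudentID and Date to return 1 or 0 per student and date.
df_names - select the longest name per StudentID as the Name.
df_Result - join the two DataFrames df_attendance and df_names.
import glob
import os
import pandas as pd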
files = glob.glob(r'Desktop\Students\*.csv')
df = pd.concat([pd.read_csv(f).assign(Date=pd.to_datetime(os.path.basename(f)[:-4]))
                for f in files])
df_attendance = (df.groupby(['StudentID', df['Date'].dt.date]).size()
                   .unstack(1).fillna(0).astype(int))
# per student, keep the longest spelling of the name (the first one wins on ties)
df_names = (df.groupby('StudentID')['Name']
              .agg(lambda names: max(names, key=len))
              .to_frame())
df_Result = df_names.join(df_attendance).reset_index()
df_Result
Out[1]:
StudentID Name 2020-12-09 2020-12-10
0 102 Robert C 1 0
1 103 Jessica Myla 1 1
2 104 Harris Kurt 0 1
3 105 Nana Duncan 1 1
If you want to change the format of the dates in the column headers, you can run the following code, which is very similar:
files = glob.glob(r'Desktop\Students\*.csv')
df = pd.concat([pd.read_csv(f).assign(Date=pd.to_datetime(os.path.basename(f)[:-4]))
                for f in files]).sort_values(['StudentID', 'Date'])
df_attendance = (df.groupby(['StudentID', df['Date'].dt.strftime('%m/%d/%y')], sort=False).size()
                   .unstack(1).fillna(0).astype(int))
# per student, keep the longest spelling of the name (the first one wins on ties)
df_names = (df.groupby('StudentID')['Name']
              .agg(lambda names: max(names, key=len))
              .to_frame())
df_Result = df_names.join(df_attendance).reset_index()
df_Result
Out[2]:
StudentID Name 12/09/20 12/10/20
0 102 Robert C 1 0
1 103 Jessica Myla 1 1
2 104 Harris Kurt 0 1
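3 105 Nana Duncan 1 1
Per your comment, you can do the following. Note that matching on StudentID is more reliable than matching directly on names, so if your file has that column, just replace "Name" with "StudentID" in the code.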
df_list = pd.read_csv(file) # file is the path to your CSV; it must have a "Name" column
df_list = df_list[~(df_list['Name'].isin(df_Result['Name']))] # keep only the students who are not already in df_Result
df_Result = pd.concat([df_Result, df_list[['Name']]]).fillna(0)
df_Result
Posted on 2020-12-26 04:06:39
You could perhaps use:
df1= pd.read_csv("12-9-2020.csv")
df1
Name StudentID
Robert C 102
Jessica Myla 103
Nana D 105
df2= pd.read_csv("12-10-2020.csv")
# get ids from second table
ids_of_presenties = df2['StudentID'].values.tolist()
# in first dataframe create a date column
current_date = '12/9/2020'
df1[current_date] = 0
df1.loc[df1['StudentID'].isin(ids_of_presenties), current_date] = 1 # mark the students whose ID appears in df2; .loc is needed because chained indexing would not modify df1
I don't have your data, so I hope this works.
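The idea above of matching on StudentID with isin can be extended to any number of daily sheets by starting from the compiled report (df_Basic in the question) and adding one date column per sheet. What follows is only a minimal sketch under stated assumptions: the sample rows are copied from the question and built in memory rather than read with pd.read_csv, and the names mark_attendance and sheets are hypothetical helpers introduced for illustration.

```python
import pandas as pd

# Compiled report with the canonical spellings (values copied from the question).
df_basic = pd.DataFrame({
    "Name": ["Robert Case", "Jessica Myla", "Harris Kurt", "Nana Duncan"],
    "StudentID": [102, 103, 104, 105],
})

# Daily sheets keyed by the column label to use (hypothetical structure);
# in practice each frame would come from pd.read_csv on that day's file.
sheets = {
    "12/9/2020": pd.DataFrame({"Name": ["Robert C", "Jessica Myla", "Nana D"],
                               "StudentID": [102, 103, 105]}),
    "12/10/2020": pd.DataFrame({"Name": ["J Myla", "Harris Kurt", "Nana Duncan"],
                                "StudentID": [103, 104, 105]}),
}


def mark_attendance(master, daily_sheets):
    """Return a copy of master with one 0/1 attendance column per daily sheet."""
    result = master.copy()
    for label, sheet in daily_sheets.items():
        # True where the student's ID appears in that day's sheet
        present = result["StudentID"].isin(sheet["StudentID"])
        result[label] = present.astype(int)
    return result


df_result = mark_attendance(df_basic, sheets)
print(df_result)
```

Because the match is on StudentID, the differently spelled names in the daily sheets (for example "J Myla") do not matter, and the names in the result are the ones from the compiled report.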
https://stackoverflow.com/questions/65453509