我有两个日期框架,如下所示:
import pandas as pd
df1 = pd.DataFrame({'serialNo':['aaaa','bbbb','cccc','ffff','aaaa','bbbb','aaaa'],
'Name':['Sayonti','Ruchi','Tony','Gowtam','Toffee','Tom','Sayonti'],
'testName': [4402, 3747 ,5555,8754,1234,9876,3602],
'moduleName': ['singing', 'dance','booze', 'vocals','drama','paint','singing'],
'endResult': ['WARNING', 'FAILED', 'WARNING', 'FAILED','WARNING','FAILED','WARNING'],
'Date':['2018-10-5','2018-10-6','2018-10-7','2018-10-8','2018-10-9','2018-10-10','2018-10-8'],
'Time_df1':['23:26:39','22:50:31','22:15:28','21:40:19','21:04:15','20:29:11','19:54:03']})
df2 = pd.DataFrame({'serialNo':['aaaa','bbbb','aaaa','ffff','xyzy','aaaa'],
'Food':['Strawberry','Coke','Pepsi','Nuts','Apple','Candy'],
'Work': ['AP', 'TC','OD', 'PU','NO','PM'],
'Date':['2018-10-4','2018-10-6','2018-10-5','2018-10-7','2018-10-5','2018-10-10'],
'Time_df2':['09:00:00','10:00:00','11:00:00','12:00:00','13:00:00','14:00:00']
})现在我已经合并了两个帧,如下所示:
df1['Date'] = pd.to_datetime(df1['Date'])
df2['Date'] = pd.to_datetime(df2['Date'])
result = pd.merge(df1,df2,on=['serialNo'],how='inner')我想按以下方式分组
result = result[result.Date_x.sub(result.Date_y).dt.days.between(0,3)]
result.drop(['Date_x','Date_y','Time_df1','Time_df2'],axis=1,inplace=True)
result = result.groupby(['serialNo'])['Food'].apply(','.join).reset_index()但我希望输出结果是这样的:
output = pd.DataFrame({'serialNo':['aaaa','bbbb','ffff'],
'Name':['Sayonti,Sayonti,Sayonti','Ruchi','Gowtam'],
'testName': ['4402,4402,3602','3747','8754'],
'moduleName': ['singing,singing,singing', 'dance','vocals'],
'endResult': ['WARNING,WARNING,WARNING','FAILED','FAILED'],
'Food':['Strawberry,Pepsi,Pepsi','Coke','Nuts'],
'Work':['AP,OD,OD','TC','PU']})我该如何实现这一点?我基本上需要弄清楚如何将多列的.apply(','.join)放在一起?
发布于 2018-11-09 06:26:45
您可以使用以下任一选项:
result.groupby('serialNo').agg(list) #To get a list of values输出:
Name testName \
serialNo
aaaa [Sayonti, Sayonti, Sayonti] [4402, 4402, 3602]
bbbb [Ruchi] [3747]
ffff [Gowtam] [8754]
moduleName endResult \
serialNo
aaaa [singing, singing, singing] [WARNING, WARNING, WARNING]
bbbb [dance] [FAILED]
ffff [vocals] [FAILED]
Food Work
serialNo
aaaa [Strawberry, Pepsi, Pepsi] [AP, OD, OD]
bbbb [Coke] [TC]
ffff [Nuts] [PU] 或
result.groupby('serialNo').agg(lambda x: ', '.join(x.astype(str))) #to get comma separated strings输出:
Name testName \
serialNo
aaaa Sayonti, Sayonti, Sayonti 4402, 4402, 3602
bbbb Ruchi 3747
ffff Gowtam 8754
moduleName endResult \
serialNo
aaaa singing, singing, singing WARNING, WARNING, WARNING
bbbb dance FAILED
ffff vocals FAILED
Food Work
serialNo
aaaa Strawberry, Pepsi, Pepsi AP, OD, OD
bbbb Coke TC
ffff Nuts PU https://stackoverflow.com/questions/53215650
复制相似问题