So, I have two dataframes, as shown below:
import pandas as pd
data = {'BugCatcher': ['Fred', 'Fred', 'George', 'George'],
'Date': ['1/13/2020', '1/14/2020', '1/20/2020', '1/26/2020'],
'BugsCaught': ['Spider','Spider', 'Butterfly', 'Butterfly'],
'BugsFound': [1, 4, 5, 8],
'BugsFoundRunningTotal': [1, 5, 5, 13]
}
bug_df = pd.DataFrame(data, columns = ['BugCatcher', 'Date', 'BugsCaught', 'BugsFound', 'BugsFoundRunningTotal'])
bug_df
   BugCatcher       Date  BugsCaught  BugsFound  BugsFoundRunningTotal
0        Fred  1/13/2020      Spider          1                      1
1        Fred  1/14/2020      Spider          4                      5
2      George  1/20/2020   Butterfly          5                      5
3      George  1/26/2020   Butterfly          8                     13
data2 = {'Name': ['Fred', 'Fred', 'George', 'George'],
'Date': ['1/5/2020', '1/6/2020', '1/17/2020', '1/30/2020'],
'NumberOfBooksReadOnCatchingBugs': [2, 3, 1, 3],
}
book_df = pd.DataFrame(data2, columns = ['Name', 'Date', 'NumberOfBooksReadOnCatchingBugs'])
book_df
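     Name       Date  NumberOfBooksReadOnCatchingBugs
0    Fred   1/5/2020                                2
1    Fred   1/6/2020                                3
2  George  1/17/2020                                1
3  George  1/30/2020                                3

I am looking for a way to conditionally fill or join these so that we get an output like the one below. The idea is to join the two frames, but only for the parts before a certain date. Eventually I want to plot a combined line and column chart that shows the running total of bugs as a line and the books read as columns. I tried a join, but it did not give the right result.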
desired = {'BugCatcher': ['Fred', 'Fred', 'Fred', 'Fred', 'George', 'George', 'George', 'George'],
'Date': ['1/5/2020', '1/6/2020','1/13/2020', '1/14/2020', '1/17/2020', '1/20/2020', '1/26/2020', '1/30/2020'],
'NumberOfBooksReadOnCatchingBugs': [2, 3, 3, 3, 1, 1, 1, 3],
'BugsType': ['Spider','Spider', 'Spider', 'Spider', 'Butterfly', 'Butterfly', 'Butterfly', 'Butterfly'],
'QuantityFound': [0, 0, 1, 4, 0, 5, 8, 0],
'BugsFoundRunningTotal': [0, 0, 1, 5, 0, 5, 13, 13]
}
output = pd.DataFrame(desired, columns = ['BugCatcher', 'Date', 'NumberOfBooksReadOnCatchingBugs', 'BugsType', 'QuantityFound', 'BugsFoundRunningTotal'])
output
   BugCatcher       Date  NumberOfBooksReadOnCatchingBugs   BugsType  QuantityFound  BugsFoundRunningTotal
0        Fred   1/5/2020                                2     Spider              0                      0
1        Fred   1/6/2020                                3     Spider              0                      0
2        Fred  1/13/2020                                3     Spider              1                      1
3        Fred  1/14/2020                                3     Spider              4                      5
4      George  1/17/2020                                1  Butterfly              0                      0
5      George  1/20/2020                                1  Butterfly              5                      5
6      George  1/26/2020                                1  Butterfly              8                     13
7      George  1/30/2020                                3  Butterfly              0                     13

Any help would be appreciated!
Thanks
Posted on 2020-12-04 14:46:35
Try this:
# Parse the date strings so rows sort chronologically rather than as text
bug_df['Date'] = pd.to_datetime(bug_df['Date'])
book_df['Date'] = pd.to_datetime(book_df['Date'])
# DataFrame.append was removed in pandas 2.0, so stack the frames with pd.concat;
# 'Name' is renamed so both frames share the same (BugCatcher, Date) index
books = book_df.rename(columns={'Name': 'BugCatcher'}).set_index(['BugCatcher', 'Date'])
final_df = pd.concat([bug_df.set_index(['BugCatcher', 'Date']), books]).sort_index()
# Fill the gaps per catcher (index level 0); ffill()/bfill() replace the deprecated fillna(method=...)
final_df['BugsCaught'] = final_df.groupby(level=0)['BugsCaught'].bfill().ffill()
final_df['BugsFound'] = final_df['BugsFound'].fillna(0).astype(int)
final_df['BugsFoundRunningTotal'] = final_df.groupby(level=0)['BugsFoundRunningTotal'].ffill().fillna(0).astype(int)
final_df['NumberOfBooksReadOnCatchingBugs'] = final_df.groupby(level=0)['NumberOfBooksReadOnCatchingBugs'].ffill().bfill().astype(int)
final_df = final_df.reset_index()

Output:
   BugCatcher        Date  BugsCaught  BugsFound  BugsFoundRunningTotal  NumberOfBooksReadOnCatchingBugs
0        Fred  2020-01-05      Spider          0                      0                                2
1        Fred  2020-01-06      Spider          0                      0                                3
2        Fred  2020-01-13      Spider          1                      1                                3
3        Fred  2020-01-14      Spider          4                      5                                3
4      George  2020-01-17   Butterfly          0                      0                                1
5      George  2020-01-20   Butterfly          5                      5                                1
6      George  2020-01-26   Butterfly          8                     13                                1
7      George  2020-01-30   Butterfly          0                     13                                3

https://stackoverflow.com/questions/65137800
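
For comparison, here is a rough sketch of another way to arrive at the same table: an outer merge on catcher and date, followed by per-catcher fills. This is not the approach from the answer above. The function name `combine_timelines` and the explicit `%m/%d/%Y` date format are my own assumptions, and the sketch assumes every catcher has at least one row in each frame (otherwise the integer casts would fail on leftover NaN values).

```python
import pandas as pd

# Sample frames copied from the question
bug_df = pd.DataFrame({
    'BugCatcher': ['Fred', 'Fred', 'George', 'George'],
    'Date': ['1/13/2020', '1/14/2020', '1/20/2020', '1/26/2020'],
    'BugsCaught': ['Spider', 'Spider', 'Butterfly', 'Butterfly'],
    'BugsFound': [1, 4, 5, 8],
    'BugsFoundRunningTotal': [1, 5, 5, 13],
})
book_df = pd.DataFrame({
    'Name': ['Fred', 'Fred', 'George', 'George'],
    'Date': ['1/5/2020', '1/6/2020', '1/17/2020', '1/30/2020'],
    'NumberOfBooksReadOnCatchingBugs': [2, 3, 1, 3],
})


def combine_timelines(bugs, books):
    """Hypothetical helper: outer-merge both frames, then fill gaps per catcher."""
    # Work on copies with parsed dates so sorting is chronological
    bugs = bugs.assign(Date=pd.to_datetime(bugs['Date'], format='%m/%d/%Y'))
    books = books.rename(columns={'Name': 'BugCatcher'})
    books = books.assign(Date=pd.to_datetime(books['Date'], format='%m/%d/%Y'))

    # An outer merge keeps every (catcher, date) pair from either frame
    merged = (bugs.merge(books, on=['BugCatcher', 'Date'], how='outer')
                  .sort_values(['BugCatcher', 'Date'])
                  .reset_index(drop=True))

    def fill_both_ways(column):
        # Forward fill, then backward fill, always inside one catcher's rows
        return merged.groupby('BugCatcher')[column].transform(lambda s: s.ffill().bfill())

    merged['BugsCaught'] = fill_both_ways('BugsCaught')
    merged['NumberOfBooksReadOnCatchingBugs'] = (
        fill_both_ways('NumberOfBooksReadOnCatchingBugs').astype(int))
    # The running total carries forward; dates before the first catch count as 0
    merged['BugsFoundRunningTotal'] = (
        merged.groupby('BugCatcher')['BugsFoundRunningTotal'].ffill().fillna(0).astype(int))
    # Dates that only appear in the book frame have no catches
    merged['BugsFound'] = merged['BugsFound'].fillna(0).astype(int)
    return merged


combined = combine_timelines(bug_df, book_df)
print(combined)
```

Unlike the answer, both fill directions here stay inside each catcher's group, so a value can never leak from one catcher into the next. Whether that matters depends on the real data.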