I have a slightly unusual problem to solve with pandas DataFrames. I have the following two datasets:
df1
time, Date, Stock, StartTime, EndTime
2016-10-11 12:00:00 2016-10-11 ABC 12:00:00.243 13:06:34.232
2016-10-11 12:01:00 2016-10-11 ABC 12:02:00.243 13:04:34.232
2016-10-11 12:03:00 2016-10-11 XYZ 08:02:00.243 11:24:23.533
df2
time, Date, Stock, Price, Volume
2016-10-11 12:00:00 2016-10-11 ABC 10.0 100
2016-10-11 12:01:00 2016-10-11 ABC 10.1 300
...
2016-10-11 16:01:00 2016-10-11 ABC 10.4 600
2016-10-11 12:01:00 2016-10-11 XYZ 5.1 1500
...
2016-10-11 17:01:00 2016-10-11 XYZ 10.1 200
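...
Now, for each row in df1, I want to join it to df2 on the Date and Stock columns, so that I can compute the weighted price over all rows in df2 that fall between that df1 row's StartTime and EndTime.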
Answered 2016-10-12 23:44:20
One approach is to merge, group by, and apply a weighted-average function.
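As a quick check of what "weighted average" means here, this is a minimal sketch that assumes Volume is the weight, i.e. sum(Price * Volume) / sum(Volume). It uses the three ABC rows from the question's df2; the variable names are only for illustration.

```python
import numpy as np

prices = [10.0, 10.1, 10.4]   # ABC prices from df2
volumes = [100, 300, 600]     # matching volumes, used as weights

# np.average with weights computes sum(p * v) / sum(v)
vwap = np.average(prices, weights=volumes)

# The same value written out by hand: 10270 / 1000
manual = sum(p * v for p, v in zip(prices, volumes)) / sum(volumes)

print(f"{vwap:.2f}")  # 10.27
```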
I have moved your data into code so that it is easier for people to load.
import numpy as np
import pandas as pd
df1 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11'}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'XYZ'}, 'EndTime': {0: '13:06:34.232', 1: '13:04:34.232', 2: '11:24:23.533'}, 'StartTime': {0: '12:00:00.243', 1: '12:02:00.243', 2: '08:02:00.243'}, 'time': {0: '12:00:00', 1: '12:01:00', 2: '12:03:00'}})
df2 = pd.DataFrame({'Date': {0: '2016-10-11', 1: '2016-10-11', 2: '2016-10-11', 3: '2016-10-11', 4: '2016-10-11'}, 'Volume': {0: 100, 1: 300, 2: 600, 3: 1500, 4: 200}, 'Price': {0: 10.0, 1: 10.1, 2: 10.4, 3: 5.1, 4: 10.1}, 'Stock': {0: 'ABC', 1: 'ABC', 2: 'ABC', 3: 'XYZ', 4: 'XYZ'}, 'time': {0: '12:00:00', 1: '12:01:00', 2: '16:01:00', 3: '12:01:00', 4: '17:01:00'}})
print(df1)
print(df2)
I am assuming your data looks like the following. The question is a little unclear, so let me know and we can adjust the example so that the answer fits what you are asking. I have left out the redundant date in the time column:
Date EndTime StartTime Stock time
0 2016-10-11 13:06:34.232 12:00:00.243 ABC 12:00:00
1 2016-10-11 13:04:34.232 12:02:00.243 ABC 12:01:00
2 2016-10-11 11:24:23.533 08:02:00.243 XYZ 12:03:00
Date Price Stock Volume time
0 2016-10-11 10.0 ABC 100 12:00:00
1 2016-10-11 10.1 ABC 300 12:01:00
2 2016-10-11 10.4 ABC 600 16:01:00
3 2016-10-11 5.1 XYZ 1500 12:01:00
4 2016-10-11 10.1 XYZ 200 17:01:00
df_merged = df1.merge(df2, on=['Date', 'Stock'])  # merge on Date and Stock
df_merged = df_merged[['StartTime', 'EndTime', 'Price', 'Volume', 'Stock']]  # keep only the columns needed
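To make the merge step concrete, here is a small sketch with the same sample values (rebuilt from lists only for brevity). Since Date and Stock are not unique in either frame, the merge is many-to-many: every df1 row is paired with every df2 row for the same date and stock, whatever its StartTime and EndTime. Both frames also carry a `time` column, which pandas renames to `time_x` and `time_y` by default.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": ["2016-10-11"] * 3,
    "Stock": ["ABC", "ABC", "XYZ"],
    "StartTime": ["12:00:00.243", "12:02:00.243", "08:02:00.243"],
    "EndTime": ["13:06:34.232", "13:04:34.232", "11:24:23.533"],
    "time": ["12:00:00", "12:01:00", "12:03:00"],
})
df2 = pd.DataFrame({
    "Date": ["2016-10-11"] * 5,
    "Stock": ["ABC", "ABC", "ABC", "XYZ", "XYZ"],
    "time": ["12:00:00", "12:01:00", "16:01:00", "12:01:00", "17:01:00"],
    "Price": [10.0, 10.1, 10.4, 5.1, 10.1],
    "Volume": [100, 300, 600, 1500, 200],
})

merged = df1.merge(df2, on=["Date", "Stock"])

# 2 ABC windows x 3 ABC rows + 1 XYZ window x 2 XYZ rows = 8 rows
rows_per_stock = merged.groupby("Stock").size()
print(merged.shape)            # (8, 8)
print(sorted(merged.columns))  # includes 'time_x' and 'time_y'
```

This is also why both ABC windows end up with the same average in the grouped result: each window's group contains all three ABC rows.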
Without Stock Partition:
print(df_merged.groupby(['StartTime', 'EndTime']).apply(lambda x: np.average(x['Price'], weights=x['Volume'])))
With Stock Partition:
print(df_merged.groupby(['StartTime', 'EndTime', 'Stock']).apply(lambda x: np.average(x['Price'], weights=x['Volume'])))
Which gives:
StartTime EndTime
08:02:00.243 11:24:23.533 5.688235
12:00:00.243 13:06:34.232 10.270000
12:02:00.243 13:04:34.232 10.270000
dtype: float64
StartTime EndTime Stock
08:02:00.243 11:24:23.533 XYZ 5.688235
12:00:00.243 13:06:34.232 ABC 10.270000
12:02:00.243 13:04:34.232 ABC 10.270000
https://stackoverflow.com/questions/40009591
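The grouped averages above use every df2 row for the stock, not only those inside each window. If the goal is to restrict each df1 row to the df2 rows between its StartTime and EndTime, one possible sketch follows. It rests on assumptions not stated in the question: the bounds are inclusive, Volume is the weight, and df1's own `time` column is not needed (it is dropped here to avoid the `time_x`/`time_y` suffixes).

```python
import pandas as pd

# df1 without its own `time` column (assumption: not needed for the window)
df1 = pd.DataFrame({
    "Date": ["2016-10-11"] * 3,
    "Stock": ["ABC", "ABC", "XYZ"],
    "StartTime": ["12:00:00.243", "12:02:00.243", "08:02:00.243"],
    "EndTime": ["13:06:34.232", "13:04:34.232", "11:24:23.533"],
})
df2 = pd.DataFrame({
    "Date": ["2016-10-11"] * 5,
    "Stock": ["ABC", "ABC", "ABC", "XYZ", "XYZ"],
    "time": ["12:00:00", "12:01:00", "16:01:00", "12:01:00", "17:01:00"],
    "Price": [10.0, 10.1, 10.4, 5.1, 10.1],
    "Volume": [100, 300, 600, 1500, 200],
})

merged = df1.merge(df2, on=["Date", "Stock"])

# Convert time-of-day strings to Timedelta so they compare as durations
t = pd.to_timedelta(merged["time"])
start = pd.to_timedelta(merged["StartTime"])
end = pd.to_timedelta(merged["EndTime"])

# Keep only rows inside the window (inclusive bounds are an assumption)
in_window = merged[(t >= start) & (t <= end)]

# Weighted price per window: sum(Price * Volume) / sum(Volume)
in_window = in_window.assign(pv=in_window["Price"] * in_window["Volume"])
sums = in_window.groupby(["Stock", "StartTime", "EndTime"])[["pv", "Volume"]].sum()
windowed_vwap = sums["pv"] / sums["Volume"]
print(windowed_vwap)
```

With these sample rows, only the 12:01:00 ABC row falls inside a window (12:00:00.243 to 13:06:34.232), so the result has a single entry of about 10.1. Windows with no matching df2 rows drop out of the result.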