I have data like this:
id trade_id TradeDate SettleDate amt
3136 6828 20200616 20200620 15000000.0
3136 6828 20200616 20200620 15000000.0
3136 6828 20200620 20200620 15000000.0
3136 6934 20200623 20200630 15000000.0
3136 7007 20200623 20200630 30000000.0
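3136 7050 20200625 20200630 25000000.0
For each id, I need to check when its trades settle. A given id can have multiple settlements. I need to add a new column, settlement_bal. Once trades have settled, settlement_bal is the sum of all amounts settled up to and including that day; while nothing has settled yet, settlement_bal is zero.
For id=3136, the first settlement happens on 20 June, so the settlement balance is zero through 19 June. The next settlement is on 30 June. The output should also fill in the missing dates.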
id SettleDate settlement_bal
3136 20200616 0
3136 20200617 0
3136 20200618 0
... ........ 0
3136 20200620 45000000.0
3136 20200621 45000000.0
3136 20200622 45000000.0
.... ........ ..........
3136 20200629 45000000.0
3136 20200630 115000000.0
A trade can settle on the same day as the trade date, later, or even before it. How can I achieve this?
Posted on 2021-08-03 07:15:26
# 0.0 imports
import pandas as pd
import numpy as np
# 0. data
d=[['3136' ,'6828' ,'20200616', '20200620', 15000000],
['3136' ,'6828' ,'20200616', '20200620', 15000000],
['3136' ,'6828' ,'20200620', '20200620', 15000000],
['3136' ,'6934' ,'20200623', '20200630', 15000000],
['3136' ,'7007' ,'20200623', '20200630', 30000000],
['3136' ,'7050' ,'20200625', '20200630', 25000000]]
data=pd.DataFrame(d,columns='id trade_id TradeDate SettleDate amt'.split())
data['TradeDate']=pd.to_datetime(data['TradeDate'] ,format='%Y%m%d')
data['SettleDate']=pd.to_datetime(data['SettleDate'] ,format='%Y%m%d')
print(data)
# 1.1 build a frame t with one row per (calendar day, id) over the full date span
start = data[['TradeDate', 'SettleDate']].min().min()  # earliest date (avoids shadowing built-in min)
end = data[['TradeDate', 'SettleDate']].max().max()    # latest date (avoids shadowing built-in max)
t = pd.DataFrame({'SettleDate': pd.date_range(start=start, end=end)})
t = t.merge(data['id'].drop_duplicates(), how='cross')  # how='cross' needs pandas >= 1.2
# 1.2 merge
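# sum amt per (id, SettleDate), then take the running total within each id
settled = data.groupby(['id', 'SettleDate'])['amt'].sum().groupby(level='id').cumsum().reset_index()
result = settled.merge(t, on=['id', 'SettleDate'], how='outer')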
result=result.sort_values(['id','SettleDate'])
print(result)  # intermediate step, not the final result
# 1.3 fill nan
result['amt'] = result.groupby('id')['amt'].ffill()  # carry the last balance forward within each id
result['amt'] = result['amt'].fillna(0)  # days before the first settlement have a zero balance
result=result.rename(columns={'amt':'settlement_bal'})
print(result)  # final result
https://stackoverflow.com/questions/68631008
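For comparison, the same idea can be sketched more compactly with `reindex` in place of the cross merge. This is only an illustrative variant, not the method from the answer above. The helper name `settlement_balance` is hypothetical, and it assumes each id should get its own calendar running from its earliest to its latest trade or settle date, rather than one global date range.

```python
import pandas as pd


def settlement_balance(trades: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: daily cumulative settled amount per id."""
    df = trades.copy()
    df["TradeDate"] = pd.to_datetime(df["TradeDate"], format="%Y%m%d")
    df["SettleDate"] = pd.to_datetime(df["SettleDate"], format="%Y%m%d")

    frames = []
    for id_, grp in df.groupby("id"):
        # Assumption: the calendar for an id spans all of its trade and settle dates.
        days = pd.date_range(
            min(grp["TradeDate"].min(), grp["SettleDate"].min()),
            max(grp["TradeDate"].max(), grp["SettleDate"].max()),
            freq="D",
        )
        # Total amount settling on each settle date.
        settled = grp.groupby("SettleDate")["amt"].sum()
        # Days without a settlement contribute 0; the running sum is the balance.
        balance = settled.reindex(days, fill_value=0).cumsum()
        frames.append(
            pd.DataFrame(
                {"id": id_, "SettleDate": days, "settlement_bal": balance.to_numpy()}
            )
        )
    return pd.concat(frames, ignore_index=True)


# Usage with the sample rows from the question.
trades = pd.DataFrame(
    [
        ["3136", "6828", "20200616", "20200620", 15000000],
        ["3136", "6828", "20200616", "20200620", 15000000],
        ["3136", "6828", "20200620", "20200620", 15000000],
        ["3136", "6934", "20200623", "20200630", 15000000],
        ["3136", "7007", "20200623", "20200630", 30000000],
        ["3136", "7050", "20200625", "20200630", 25000000],
    ],
    columns="id trade_id TradeDate SettleDate amt".split(),
)
out = settlement_balance(trades)
print(out)
```

Because missing days are filled with 0 before the cumulative sum, no forward fill or NaN handling is needed afterwards. A settle date that falls before the trade date is still covered, since the calendar starts at the earlier of the two.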