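I downloaded my transactions from my bank, e.g.:
Date, Amount
Unfortunately, the CSV download doesn't include the opening balance, so I added the initial value at the top of the DataFrame. The data now looks like this: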
Date, Amount, Balance
2018-01-01, 0, 10
2018-01-01, 10, 20
2018-01-02, 20, 40
2018-01-02, -10, 30
2018-01-03, 20, 50
2018-01-31, 0, 50
Each balance is calculated by adding the current amount to the previous balance.
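The rule above (previous balance plus current amount) is simply the opening balance plus a cumulative sum of the amounts. A minimal sketch using the sample rows above, assuming the opening balance of 10 is kept in a plain variable (`opening_balance` is a name chosen here for illustration) instead of a placeholder row:

```python
import pandas as pd

# Sample transactions from the table above, without the placeholder opening row
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-02",
                            "2018-01-03", "2018-01-31"]),
    "Amount": [10, 20, -10, 20, 0],
})

opening_balance = 10  # hypothetical variable: the balance missing from the CSV

# Balance[i] = Balance[i-1] + Amount[i] with Balance[-1] = opening_balance
# is the same as opening_balance + running total of Amount
df["Balance"] = opening_balance + df["Amount"].cumsum()
print(df)
```

No loop and no extra row is needed; the resulting balances match the table (20, 40, 30, 50, 50).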
Here's what I managed to put together, but it smells bad:
import pandas as pd

df = pd.read_csv("~/Downloads/Chequing.CSV", parse_dates=[0], na_values="n/a")
df['Date'] = pd.to_datetime(df['Date'])
df['Balance'] = 0
# One-row frame holding the opening balance, which the CSV doesn't include
df1 = pd.DataFrame(data={'Date': ['2018-01-01'], 'Transaction': ['CREDIT'],
                         'Name': ['Open'], 'Memo': ['Open'], 'Amount': [0],
                         'Balance': [10.00]})
df1['Date'] = pd.to_datetime(df1['Date'])
df2 = pd.concat([df1, df], sort=False, ignore_index=True)
for i in range(1, len(df2)):
    prev_balance = df2['Balance'].iloc[i-1]
    amount = df2['Amount'].iloc[i]
    new_balance = round(amount + prev_balance, 2)
    df2.loc[i, 'Balance'] = new_balance
# Writing the last line as df2['Balance'].iloc[i] = new_balance generated a warning:
# SettingWithCopyWarning:
# A value is trying to be set on a copy of a slice from a DataFrame
# (df2.loc[i, 'Balance'] avoids that chained assignment; label i is valid thanks to ignore_index=True)
# While writing this, I was able to get it working by replacing the for loop above with:
df2['Balance'] = round((df2["Amount"] + df2["Balance"]).cumsum(), 2)
pd.set_option('display.max_columns', None)
print(df2.groupby(df2['Date'].dt.strftime('%m %B'))[['Date', 'Amount', 'Transaction', 'Name', 'Balance']].max())
My question now becomes: is the rounding necessary? Can this be optimized or written in a better way?
Thanks!
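On the rounding question: amounts with cents are stored as binary floats, so a cumulative sum can pick up tiny representation errors (in IEEE-754 doubles, 0.1 + 0.2 is 0.30000000000000004). A minimal sketch of two options, assuming amounts never have more than two decimal places; the amounts below are illustrative, not taken from the bank CSV:

```python
import pandas as pd

amounts = pd.Series([0.10, 0.20, 0.30])  # illustrative amounts only
opening_balance = 10.00

# Plain float running total: may carry tiny representation errors
float_balance = opening_balance + amounts.cumsum()

# Option 1: round once, after the cumulative sum (not inside a loop)
rounded_balance = float_balance.round(2)

# Option 2: accumulate in integer cents, then convert back for display
# (assumes at most two decimal places per amount)
cents = (amounts * 100).round().astype("int64")
cents_balance = (round(opening_balance * 100) + cents.cumsum()) / 100

print(amounts.cumsum().tolist())
print(rounded_balance.tolist())
print(cents_balance.tolist())
```

Rounding a single time at the end is enough for display; working in integer cents avoids the drift in the first place, at the cost of a conversion step.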
Posted on 2019-01-01 22:20:28
Here is how I would do it:
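import numpy as np

%%time
# Assumes Balance is 0 except for the opening row: shift it down one row,
# add each amount, take the running total; row 0 keeps the opening balance
df.Balance = np.concatenate((df.Balance[:1], (df.Balance.shift().fillna(0)+df.Amount).cumsum()[1:]))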
# Wall time: 2 ms
Compare with the for-loop approach:
%%time
for i in range(1, len(df.Balance)):
    df.Balance[i] = df.Balance[i-1] + df.Amount[i]
# Wall time: 173 ms
Maximum balance per month:
df
Date Amount Balance
0 2018-01-01 0 10
1 2018-01-01 10 20
2 2018-01-02 20 40
3 2018-02-02 -10 30
4 2018-03-03 20 50
5 2018-03-31 10 60
df.groupby(df.Date.dt.month).apply(lambda x: x[x.Balance == x.Balance.max()]).reset_index(drop=True)
Date Amount Balance
0 2018-01-02 20 40
1 2018-02-02 -10 30
2 2018-03-31 10 60
I hope this helps. Comments are welcome ;)
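A possible alternative to the `apply` filter above is sketched here, using `GroupBy.idxmax` and grouping by year-month periods so that the same month in different years is not merged, as `dt.month` would merge it. Unlike the filter, `idxmax` keeps only the first row when several rows tie for the maximum. The frame is the example data from this answer:

```python
import pandas as pd

# The example frame from this answer
df = pd.DataFrame({
    "Date": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02",
                            "2018-02-02", "2018-03-03", "2018-03-31"]),
    "Amount": [0, 10, 20, -10, 20, 10],
    "Balance": [10, 20, 40, 30, 50, 60],
})

# Group by calendar month *and* year
month = df["Date"].dt.to_period("M")

# Row label of the highest balance in each month (first one on ties)
max_rows = df.loc[df.groupby(month)["Balance"].idxmax()].reset_index(drop=True)
print(max_rows)
```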
https://stackoverflow.com/questions/53994331