I'm new to pandas, and I'm trying to merge the following two DataFrames into one:
                      nopat
0  2021-12-31  3.580000e+09
1  2020-12-31  6.250000e+08
2  2019-12-31 -1.367000e+09
3  2018-12-31  2.028000e+09

               capital_employed
0  2021-12-31      5.924000e+10
1  2020-12-31      6.062400e+10
2  2019-12-31      5.203500e+10
3  2018-12-31      5.441200e+10

When I try to apply a function to my new DataFrame, all the columns disappear. Here is my code:
roce_by_year = pd.merge(nopat, capital_employed) \
.rename(columns={"": "date"}) \
.sort_values(by='date') \
.apply(lambda row: compute_roce(row['nopat'], row['capital_employed']), axis=1) \
.reset_index(name='roce')

The result is as follows:
index roce
0 3 3.727119
1 2 -2.627078
2 1 1.030945
3 0 6.043214

I would like to get the following result:
date roce
0 2018 3.727119
1 2019 -2.627078
2 2020 1.030945
3 2021 6.043214

Do you have any explanation?
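One possible reading of the behaviour described above, sketched on the sample data: a row-wise apply whose function returns a scalar produces a Series, not a DataFrame, so only the row labels survive and reset_index(name='roce') then exposes them as an "index" column. The body of compute_roce is not shown in the question; the version below (NOPAT divided by capital employed, times 100) is an assumption chosen because it appears to match the posted figures, and the date column is assumed to have already been renamed to "date".

```python
import pandas as pd


def compute_roce(nopat, capital_employed):
    # Assumed definition (not shown in the question): ROCE as a percentage.
    return nopat / capital_employed * 100


# Stand-in for the merged frame from the question, with the date column
# already named "date" (an assumption made to keep the sketch short).
merged = pd.DataFrame(
    {
        "date": ["2021-12-31", "2020-12-31", "2019-12-31", "2018-12-31"],
        "nopat": [3.58e9, 6.25e8, -1.367e9, 2.028e9],
        "capital_employed": [5.924e10, 6.0624e10, 5.2035e10, 5.4412e10],
    }
).sort_values(by="date")

# A scalar-returning function applied row by row yields a Series indexed
# by the original row labels; the "date" column is not carried along.
roce = merged.apply(
    lambda row: compute_roce(row["nopat"], row["capital_employed"]), axis=1
)

# reset_index turns those row labels into a column called "index".
result = roce.reset_index(name="roce")

print(type(roce).__name__)
print(list(result.columns))
```

Under these assumptions the date values are never part of the Series returned by apply, which would explain why they are missing from the final frame.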
Answered on 2022-11-29 09:43:24
If you want a method-chaining solution, you could use something like this:
import pandas as pd

roce_by_year = (
    pd.merge(nopat, capital_employed)
    .rename(columns={"": "date"})
    # Keep only the year of each date.
    .assign(
        date=lambda xdf: pd.to_datetime(
            xdf["date"], errors="coerce"
        ).dt.year
    )
    # Add roce as a new column so the existing columns are kept.
    .assign(
        roce=lambda xdf: xdf.apply(
            lambda row: compute_roce(
                row["nopat"], row["capital_employed"]
            ), axis=1
        )
    )
    .sort_values("date", ascending=True)
)[["date", "roce"]]

Answered on 2022-11-29 09:48:26
df1['date'] = pd.to_datetime(df1['date'])
df1
###
date nopat
0 2021-12-31 3580000000
1 2020-12-31 625000000
2 2019-12-31 -1367000000
3 2018-12-31 2028000000

df2['date'] = pd.to_datetime(df2['date'])
df2
###
date capital_employed
0 2021-12-31 59240000000
1 2020-12-31 60624000000
2 2019-12-31 52035000000
3 2018-12-31 54412000000

df3 = pd.merge(df1, df2, how='outer', left_on='date', right_on='date')\
    .pipe(lambda x: x.assign(roe = x['nopat']/x['capital_employed']))\
    .sort_values(by='date', ascending=True)\
    .pipe(lambda x: x[['date', 'roe']])\
    .pipe(lambda x: x.assign(date = x['date'].dt.strftime('%Y')))\
    .reset_index(drop=True)
df3
###
date roe
0 2018 0.037271
1 2019 -0.026271
2 2020 0.010309
3 2021 0.060432

Answered on 2022-11-29 09:35:30
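apply only creates the new column; it does not keep the existing ones. You can instead add the new column to the existing DataFrame, like this: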
nopat.rename(columns={"": "date"}, inplace=True)
nopat.sort_values(by='date', inplace=True)
nopat.set_index('date', inplace=True)
capital_employed.rename(columns={"": "date"}, inplace=True)
capital_employed.set_index('date', inplace=True)
capital_employed.sort_values(by='date', inplace=True)
df = nopat.join(capital_employed, on='date')
df['roce'] = df.apply(lambda row: compute_roce(row['nopat'],
                                               row['capital_employed']), axis=1)

https://stackoverflow.com/questions/74611681
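Pulling the answers in this thread together, a minimal end-to-end sketch that aims for the layout requested in the question (a year column plus roce) might look like the following. It replaces the row-wise apply with plain column arithmetic, which is a swapped-in technique rather than any answerer's exact code. The ROCE formula (a percentage) and the pre-renamed "date" column are assumptions, since neither compute_roce nor the original column labels are fully shown.

```python
import pandas as pd

# Sample data copied from the question; the "date" column name is assumed.
nopat = pd.DataFrame(
    {
        "date": ["2021-12-31", "2020-12-31", "2019-12-31", "2018-12-31"],
        "nopat": [3.58e9, 6.25e8, -1.367e9, 2.028e9],
    }
)
capital_employed = pd.DataFrame(
    {
        "date": ["2021-12-31", "2020-12-31", "2019-12-31", "2018-12-31"],
        "capital_employed": [5.924e10, 6.0624e10, 5.2035e10, 5.4412e10],
    }
)

roce_by_year = (
    nopat.merge(capital_employed, on="date")
    .assign(
        # Assumed formula: ROCE expressed as a percentage.
        roce=lambda df: df["nopat"] / df["capital_employed"] * 100,
        # Replace the full date with its year.
        date=lambda df: pd.to_datetime(df["date"]).dt.year,
    )
    .sort_values("date")
    .reset_index(drop=True)
    .loc[:, ["date", "roce"]]
)

print(roce_by_year)
```

Because roce is added with assign rather than produced by replacing the frame with the output of apply, the date column stays available for the final selection.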