I have the following DataFrame:
Date Time Open High Low Close
0 2010-01-03 17:00:00 1.4301 1.4304 1.4301 1.4304
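1 2010-01-03 17:01:00 1.4303 1.4303 1.4303 1.4303
I need to normalize the prices within each day. Every price in a day should be divided by that day's first value in the same column, so that each day starts at 1.0. I wrote the following code, but it is very slow. How can I speed it up? It also feels overly complicated. Is there a more elegant way?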
for year in range(2010, 2021):
    for month in range(1, 13):
        for day in range(1, 32):  # the end is exclusive, so 32 is needed to include day 31
            mask = (dfc['Date'].dt.year == year) & (dfc['Date'].dt.month == month) & (dfc['Date'].dt.day == day)
            # explicit copy avoids SettingWithCopyWarning when modifying the slice
            mutdf = dfc.loc[mask, ['Open', 'High', 'Low', 'Close']].copy()
            if mutdf.empty:
                continue
            # divide each column by its first value of the day
            mutdf['Open'] = mutdf['Open'].divide(mutdf.iloc[0, 0])
            mutdf['High'] = mutdf['High'].divide(mutdf.iloc[0, 1])
            mutdf['Low'] = mutdf['Low'].divide(mutdf.iloc[0, 2])
            mutdf['Close'] = mutdf['Close'].divide(mutdf.iloc[0, 3])
            dfc.loc[mask, ['Open', 'High', 'Low', 'Close']] = mutdf
Expected output:
Date Time Open High Low Close
0 2010-01-03 17:00:00 1.00000 1.00000 1.00000 1.000000
1 2010-01-03 17:01:00 1.00014 0.99993 1.00014 0.999930
2 2010-01-03 17:02:00 1.00007 0.99993 1.00000 0.999930
3 2010-01-03 17:03:00 1.00007 0.99986 1.00007 0.999860
4 2010-01-03 17:04:00 1.00000 0.99986 0.99979 0.999720
5 2010-01-03 17:06:00 1.00000 0.99979 0.99993 0.999790
6 2010-01-03 17:08:00 0.99993 0.99986 0.99993 0.999790
7 2010-01-03 17:09:00 0.99993 0.99979 0.99979 0.999581
8 2010-01-03 17:10:00 0.99986 0.99979 0.99986 0.999790
9 2010-01-03 17:12:00 1.00007 0.99993 1.00007 0.999930
Posted on 2020-08-02 05:49:09
Use groupby on Date and divide each group by its first value:
df["Open"] = df.groupby("Date")["Open"].transform(lambda d: d/d.iat[0])
print (df)
Date Time Open High Low Close
0 2010-01-03 17:00:00 1.00000 1.4304 1.4301 1.4304
1 2010-01-03 17:01:00 1.00014 1.4303 1.4303 1.4303
To normalize all the columns at once:
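col = ['Open', 'High', 'Low', 'Close']
# divide each day's rows by that day's first row (d, not the whole df);
# group_keys=False stops the group key from being added as an extra index level
print (df.set_index(["Date","Time"])
        .groupby("Date", group_keys=False).apply(lambda d: d[col]/d[col].iloc[0])
        .reset_index())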
Date Time Open High Low Close
0 2010-01-03 17:00:00 1.00000 1.00000 1.00000 1.00000
1 2010-01-03 17:01:00 1.00014 0.99993 1.00014 0.99993
Posted on 2020-08-02 07:09:10
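It is better to have a single datetime column than separate Date and Time columns. The data may originally have come with a single datetime column; if so, don't split it.
Use df.info() to check that the datetime column has a datetime dtype. If the data really does come with separate columns, it is better to combine them into a single datetime column.
This also uses pandas.DataFrame.iat and is similar to the solution from Henry Yik, except that the single Datetime column makes the grouping more direct.
import pandas as pd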
data = {'Date': ['2010-01-03', '2010-01-03'], 'Time': ['17:00:00', '17:01:00'], 'Open': [1.4301, 1.4303], 'High': [1.4304, 1.4303], 'Low': [1.4301, 1.4303], 'Close': [1.4304, 1.4303]}
df = pd.DataFrame(data)
# convert Date to a datetime
df.Date = pd.to_datetime(df.Date)
# convert Time to a timedelta
df.Time = pd.to_timedelta(df.Time)
# create a single Datetime column
df['Datetime'] = df.Date + df.Time
# drop Date and Time
df = df.drop(columns=['Date', 'Time'])
# set Datetime as the index
df = df.set_index('Datetime')
# display(df)
Open High Low Close
Datetime
2010-01-03 17:00:00 1.4301 1.4304 1.4301 1.4304
2010-01-03 17:01:00 1.4303 1.4303 1.4303 1.4303
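As an alternative to converting Time to a Timedelta and adding it to Date, the two string columns can be joined and parsed with a single pd.to_datetime call. This is a minimal sketch that assumes the strings use the 'YYYY-MM-DD' and 'HH:MM:SS' layout shown in the question; the explicit format string is an assumption to adjust if the real data differs. The final df.info() call is the dtype check recommended above.

```python
import pandas as pd

# Sample data with separate Date and Time string columns, as in the question
data = {'Date': ['2010-01-03', '2010-01-03'], 'Time': ['17:00:00', '17:01:00'],
        'Open': [1.4301, 1.4303], 'High': [1.4304, 1.4303],
        'Low': [1.4301, 1.4303], 'Close': [1.4304, 1.4303]}
df = pd.DataFrame(data)

# Join the strings and parse them once into a datetime column
# (the format string assumes 'YYYY-MM-DD HH:MM:SS' input)
df['Datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%Y-%m-%d %H:%M:%S')

# Drop the original columns and use the combined column as the index
df = df.drop(columns=['Date', 'Time']).set_index('Datetime')

# Confirm the index is a DatetimeIndex and the price columns are floats
df.info()
```

The result has the same shape as the Timedelta-based version, so the groupby step that follows applies unchanged.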
# group by calendar date and divide each column by its first value of the day
dfg = df.groupby(df.index.date).transform(lambda s: s/s.iat[0])
# display(dfg)
Open High Low Close
Datetime
2010-01-03 17:00:00 1.00000 1.00000 1.00000 1.00000
2010-01-03 17:01:00 1.00014 0.99993 1.00014 0.99993
https://stackoverflow.com/questions/63212594
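A further option, sketched here and not taken from either answer: instead of running a Python lambda on every group, divide the price columns by groupby(...).transform('first'). That call returns a frame aligned with the original rows in which each row holds its day's first value, so the normalization becomes one vectorized division. This assumes a DatetimeIndex as in the answer above (with separate columns, groupby('Date') works the same way). The second day's prices below are hypothetical sample values added only so that more than one group exists.

```python
import pandas as pd

# DatetimeIndex spanning two days; the 2010-01-04 rows are hypothetical sample values
idx = pd.to_datetime(['2010-01-03 17:00:00', '2010-01-03 17:01:00',
                      '2010-01-04 09:00:00', '2010-01-04 09:01:00'])
df = pd.DataFrame({'Open':  [1.4301, 1.4303, 1.5000, 1.5300],
                   'High':  [1.4304, 1.4303, 1.6000, 1.5200],
                   'Low':   [1.4301, 1.4303, 1.2000, 1.2600],
                   'Close': [1.4304, 1.4303, 2.0000, 2.1000]},
                  index=idx.rename('Datetime'))

cols = ['Open', 'High', 'Low', 'Close']

# Every row receives its day's first (non-null) value for each column
first = df.groupby(df.index.date)[cols].transform('first')

# One element-wise division normalizes all days at once
normalized = df[cols] / first

print(normalized)
```

One difference from the .iat[0] approach: transform('first') picks the first non-null value in each group, so the two only disagree if a day starts with NaN.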