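I'm looking for a more efficient way to do the following.
I'm fetching stock data, looking at the highs and lows of the previous 5 days, and putting them into a new DataFrame: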
Date(index) High Low
datetime obj1 1 1
datetime obj2 2 2
datetime obj3 3 3
datetime obj4 4 4
datetime obj5 5 5
datetime obj6 6 6
would become
Date(index) High Low
datetime obj1 [] []
datetime obj2 [] []
datetime obj3 [] []
datetime obj4 [] []
datetime obj5 [1,2,3,4,5] [1,2,3,4,5]
datetime obj6 [2,3,4,5,6] [2,3,4,5,6]
Here is my code. It works, but it is a brute-force nested for loop. Is there a way to vectorize it, or at least pull the data faster?
df = getdata("SWBI", today, days_back)  # just makes the df for the stock data
date_list = df.index.to_list()  # makes a list of dates to iterate over
counter = 0
df_predictions = pd.DataFrame({
    "date": [],
    "hi_his": [],
    "lo_his": []
})
for i in date_list:
    dates = date_list[counter-5:counter]  # makes a list of the previous 5 dates
    counter += 1
    hi = []
    lo = []
    for date in dates:  # makes a list of the values for those 5 days
        lo.append(df.loc[date]["Low"])
        hi.append(df.loc[date]["High"])
    # make a temporary df to append
    df_temp = pd.DataFrame({
        "date": i,
        "hi_his": [hi],
        "lo_his": [lo]
    })
    df_predictions = df_predictions.append(df_temp)  # df ready to do linear regression predictions
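Posted on 2021-04-06 05:31:00
You can build a sliding-window matrix by concatenating shifted copies of the DataFrame with pd.concat and DataFrame.shift: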
windows = pd.concat([df.shift(n) for n in range(5)], axis=1)
# High Low High Low High Low High Low High Low
# Date
# 2021-01-01 1 1 NaN NaN NaN NaN NaN NaN NaN NaN
# 2021-01-02 2 2 1.0 1.0 NaN NaN NaN NaN NaN NaN
# 2021-01-03 3 3 2.0 2.0 1.0 1.0 NaN NaN NaN NaN
# 2021-01-04 4 4 3.0 3.0 2.0 2.0 1.0 1.0 NaN NaN
# 2021-01-05 5 5 4.0 4.0 3.0 3.0 2.0 2.0 1.0 1.0
# 2021-01-06 6 6 5.0 5.0 4.0 4.0 3.0 3.0 2.0 2.0
Then collapse the High and Low columns into their own lists:
df.High = pd.Series(windows.filter(like='High').values.tolist(), index=df.index)
df.Low = pd.Series(windows.filter(like='Low').values.tolist(), index=df.index)
# High Low
# Date
# 2021-01-01 [1.0, nan, nan, nan, nan] [1.0, nan, nan, nan, nan]
# 2021-01-02 [2.0, 1.0, nan, nan, nan] [2.0, 1.0, nan, nan, nan]
# 2021-01-03 [3.0, 2.0, 1.0, nan, nan] [3.0, 2.0, 1.0, nan, nan]
# 2021-01-04 [4.0, 3.0, 2.0, 1.0, nan] [4.0, 3.0, 2.0, 1.0, nan]
# 2021-01-05 [5.0, 4.0, 3.0, 2.0, 1.0] [5.0, 4.0, 3.0, 2.0, 1.0]
# 2021-01-06 [6.0, 5.0, 4.0, 3.0, 2.0] [6.0, 5.0, 4.0, 3.0, 2.0]
If you want to empty the lists that contain nan, check for them with applymap and np.isnan:
df = df.applymap(lambda x: [] if np.isnan(x).any() else x)
# High Low
# Date
# 2021-01-01 [] []
# 2021-01-02 [] []
# 2021-01-03 [] []
# 2021-01-04 [] []
# 2021-01-05 [5.0, 4.0, 3.0, 2.0, 1.0] [5.0, 4.0, 3.0, 2.0, 1.0]
# 2021-01-06 [6.0, 5.0, 4.0, 3.0, 2.0] [6.0, 5.0, 4.0, 3.0, 2.0]
Posted on 2021-04-06 04:36:54
You can use DataFrame.tail, in this case df.tail(5), to pull the data from the last 5 rows!
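Note that df.tail(5) returns a single window (the final 5 rows), while the question asks for one window per row. As a rough sketch of the difference, the snippet below compares the two on toy data. The helper rolling_lists is a hypothetical name, not part of pandas, and it assumes NumPy 1.20 or later for sliding_window_view. It follows the question's expected-output table: lists are ordered oldest first, the current row is included in its own window, and rows without a full window get an empty list.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

# Toy data standing in for the stock frame from the question
df = pd.DataFrame(
    {"High": [1, 2, 3, 4, 5, 6], "Low": [1, 2, 3, 4, 5, 6]},
    index=pd.date_range("2021-01-01", periods=6, name="Date"),
)

# df.tail(5) gives only the last 5 rows, i.e. one window
last_window = df.tail(5)


def rolling_lists(series, window=5):
    """Hypothetical helper: one list per row with the last `window` values.

    Values are ordered oldest first; rows that do not yet have a full
    window get an empty list.
    """
    values = series.to_numpy()
    out = [[] for _ in range(len(values))]
    if len(values) >= window:
        # Each row of the view is one complete window over `values`
        out[window - 1:] = sliding_window_view(values, window).tolist()
    return pd.Series(out, index=series.index)


history = pd.DataFrame({
    "hi_his": rolling_lists(df["High"]),
    "lo_his": rolling_lists(df["Low"]),
})
```

Here last_window holds just the rows for 2021-01-02 through 2021-01-06, whereas history has one entry per date, which is the shape needed to feed each row's 5-day history into a later regression step.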
https://stackoverflow.com/questions/66959405