Best way to extract data from the previous 5 rows in Pandas

Stack Overflow user
Asked 2021-04-06 04:30:50
Answers: 2 · Views: 76 · Followers: 0 · Votes: 2

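I'm looking for a more efficient way to do the following.

I'm pulling stock data, looking at the highs and lows of the previous 5 days, and putting them into a new DataFrame: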

Date(index)     High    Low
datetime obj1   1        1 
datetime obj2   2        2 
datetime obj3   3        3
datetime obj4   4        4 
datetime obj5   5        5 
datetime obj6   6        6

would become

Date(index)     High              Low
datetime obj1   []                [] 
datetime obj2   []                []
datetime obj3   []                []
datetime obj4   []                []
datetime obj5   [1,2,3,4,5]       [1,2,3,4,5]
datetime obj6   [2,3,4,5,6]       [2,3,4,5,6]

Here is my code. It works, but it is a brute-force nested for loop. Is there a way to vectorize this, or at least pull the data faster?

df = getdata("SWBI", today, days_back)  # just builds the df of stock data
date_list = df.index.to_list()  # list of dates to iterate over
counter = 0
df_predictions = pd.DataFrame({
    "date": [],
    "hi_his": [],
    "lo_his": []
})

for i in date_list:
  dates = date_list[counter-5:counter]  # list of the previous 5 dates
  counter += 1
  hi = []
  lo = []
  for date in dates:  # collect the values for those 5 days
    lo.append(df.loc[date]["Low"])
    hi.append(df.loc[date]["High"])
  # Make a temporary df to append
  df_temp = pd.DataFrame({
      "date": i,
      "hi_his": [hi],
      "lo_his": [lo]
  })
  df_predictions = df_predictions.append(df_temp)  # df ready to do linear regression predictions



2 Answers

Stack Overflow user

Answered 2021-04-06 05:31:00

You can build a matrix of sliding windows with df.shift:

windows = pd.concat([df.shift(n) for n in range(5)], axis=1)

#             High  Low  High  Low  High  Low  High  Low  High  Low
# Date                                                             
# 2021-01-01     1    1   NaN  NaN   NaN  NaN   NaN  NaN   NaN  NaN
# 2021-01-02     2    2   1.0  1.0   NaN  NaN   NaN  NaN   NaN  NaN
# 2021-01-03     3    3   2.0  2.0   1.0  1.0   NaN  NaN   NaN  NaN
# 2021-01-04     4    4   3.0  3.0   2.0  2.0   1.0  1.0   NaN  NaN
# 2021-01-05     5    5   4.0  4.0   3.0  3.0   2.0  2.0   1.0  1.0
# 2021-01-06     6    6   5.0  5.0   4.0  4.0   3.0  3.0   2.0  2.0
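
To try this step in isolation, here is a minimal self-contained sketch. The six-row sample frame and its dates are assumptions that mirror the example in the question; they stand in for the asker's real stock data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's stock frame.
dates = pd.date_range("2021-01-01", periods=6, name="Date")
df = pd.DataFrame(
    {"High": [1, 2, 3, 4, 5, 6], "Low": [1, 2, 3, 4, 5, 6]},
    index=dates,
)

# shift(n) moves every row down by n positions, so column block n holds
# the values from n rows earlier (NaN where no earlier row exists).
windows = pd.concat([df.shift(n) for n in range(5)], axis=1)

print(windows.shape)  # 6 rows, 5 shifts x 2 columns
```

Each shifted copy keeps the original index, so the concatenation lines up row by row without any explicit join.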

Then collapse the High and Low columns into their respective lists:

df.High = pd.Series(windows.filter(like='High').values.tolist(), index=df.index)
df.Low = pd.Series(windows.filter(like='Low').values.tolist(), index=df.index)

#                                  High                        Low
# Date                                                            
# 2021-01-01  [1.0, nan, nan, nan, nan]  [1.0, nan, nan, nan, nan]
# 2021-01-02  [2.0, 1.0, nan, nan, nan]  [2.0, 1.0, nan, nan, nan]
# 2021-01-03  [3.0, 2.0, 1.0, nan, nan]  [3.0, 2.0, 1.0, nan, nan]
# 2021-01-04  [4.0, 3.0, 2.0, 1.0, nan]  [4.0, 3.0, 2.0, 1.0, nan]
# 2021-01-05  [5.0, 4.0, 3.0, 2.0, 1.0]  [5.0, 4.0, 3.0, 2.0, 1.0]
# 2021-01-06  [6.0, 5.0, 4.0, 3.0, 2.0]  [6.0, 5.0, 4.0, 3.0, 2.0]
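
Note that the lists above run newest to oldest, while the expected output in the question runs oldest to newest. If chronological order matters, one possible variation is to apply the largest shift first. This is a sketch on assumed sample data, and the name history is hypothetical:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's example.
dates = pd.date_range("2021-01-01", periods=6, name="Date")
df = pd.DataFrame(
    {"High": [1, 2, 3, 4, 5, 6], "Low": [1, 2, 3, 4, 5, 6]},
    index=dates,
)

# Largest shift first, so each row reads oldest -> newest.
windows = pd.concat([df.shift(n) for n in range(4, -1, -1)], axis=1)

# Collapse the five High columns (and the five Low columns) into one list per row.
history = pd.DataFrame(
    {
        "High": pd.Series(windows.filter(like="High").values.tolist(), index=df.index),
        "Low": pd.Series(windows.filter(like="Low").values.tolist(), index=df.index),
    }
)

print(history)
```

Rows with fewer than five prior values still contain nan entries at this point, now at the front of each list rather than the back.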

If you want to empty the lists that contain nan, check each one with np.isnan via applymap:

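import numpy as np

# Replace any window that still contains nan (an incomplete window) with an empty list.
# Note: in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.
df = df.applymap(lambda x: [] if np.isnan(x).any() else x)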

#                                  High                        Low
# Date                                                            
# 2021-01-01                         []                         []
# 2021-01-02                         []                         []
# 2021-01-03                         []                         []
# 2021-01-04                         []                         []
# 2021-01-05  [5.0, 4.0, 3.0, 2.0, 1.0]  [5.0, 4.0, 3.0, 2.0, 1.0]
# 2021-01-06  [6.0, 5.0, 4.0, 3.0, 2.0]  [6.0, 5.0, 4.0, 3.0, 2.0]
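
As an alternative to the shift-based approach (a different technique, not the one this answer uses), the windows can also be taken directly from the underlying array with NumPy's sliding_window_view, available in NumPy 1.20 and later. The sketch below assumes the sample data from the question; the helper name trailing_windows is hypothetical.

```python
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def trailing_windows(series, size=5):
    """Return a Series of lists: the last `size` values up to and including
    each row, oldest first, with [] where the window is incomplete."""
    values = series.to_numpy()
    if len(values) < size:
        # Not enough rows for even one full window.
        lists = [[] for _ in range(len(values))]
    else:
        # Shape (len - size + 1, size); row k covers values[k : k + size].
        full = sliding_window_view(values, size).tolist()
        lists = [[] for _ in range(size - 1)] + full
    return pd.Series(lists, index=series.index)


# Hypothetical sample data mirroring the question's example.
dates = pd.date_range("2021-01-01", periods=6, name="Date")
df = pd.DataFrame(
    {"High": [1, 2, 3, 4, 5, 6], "Low": [1, 2, 3, 4, 5, 6]},
    index=dates,
)

result = pd.DataFrame({col: trailing_windows(df[col]) for col in ["High", "Low"]})
print(result)
```

Because no NaN padding is introduced, integer columns stay integers, and the incomplete rows come out as empty lists without a separate cleanup pass.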
Votes: 2

Stack Overflow user

Answered 2021-04-06 04:36:54

You can use DataFrame.tail, in this case df.tail(5), to extract data from the last 5 rows!
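
For comparison, a small sketch of what df.tail(5) returns on the question's sample data (the frame below is an assumed stand-in). It yields a single slice containing the final five rows of the whole frame, rather than a separate five-row window for each date:

```python
import pandas as pd

# Hypothetical sample data mirroring the question's example.
dates = pd.date_range("2021-01-01", periods=6, name="Date")
df = pd.DataFrame(
    {"High": [1, 2, 3, 4, 5, 6], "Low": [1, 2, 3, 4, 5, 6]},
    index=dates,
)

# tail(5) keeps only the last five rows of the frame.
last_five = df.tail(5)
print(last_five)
```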

Votes: -1
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/66959405
