Q: Creating stock analysis data from a DatetimeIndex time-series data source

Stack Overflow user

Asked on 2020-06-11 01:15:54

1 answer · 71 views · 0 followers · 0 votes

I have a data source that gives me the following dataframe, pricehistory:

+---------------------+------------+------------+------------+------------+----------+------+
|         time        |   close    |    high    |    low     |    open    |  volume  | red  |
+---------------------+------------+------------+------------+------------+----------+------+
|                     |            |            |            |            |          |      |
| 2020-01-02 10:14:00 | 321.336177 | 321.505186 | 321.286468 | 321.505186 | 311601.0 | True |
| 2020-01-02 11:16:00 | 321.430623 | 321.465419 | 321.395827 | 321.465419 | 42678.0  | True |
| 2020-01-02 11:17:00 | 321.425652 | 321.445536 | 321.375944 | 321.440565 | 39827.0  | True |
| 2020-01-02 11:33:00 | 321.137343 | 321.261614 | 321.137343 | 321.261614 | 102805.0 | True |
| 2020-01-02 12:11:00 | 321.256643 | 321.266585 | 321.241731 | 321.266585 | 25629.0  | True |
| 2020-01-02 12:12:00 | 321.246701 | 321.266585 | 321.231789 | 321.266585 | 40869.0  | True |
| 2020-01-02 13:26:00 | 321.226818 | 321.266585 | 321.226818 | 321.261614 | 44011.0  | True |
| 2020-01-03 10:18:00 | 320.839091 | 320.958392 | 320.828155 | 320.958392 | 103351.0 | True |
| 2020-01-03 10:49:00 | 320.988217 | 321.077692 | 320.988217 | 321.057809 | 84492.0  | True |
| etc...              | etc...     | etc...     | etc...     | etc...     | etc...   | etc. |
+---------------------+------------+------------+------------+------------+----------+------+

Output of pricehistory.dtypes:

close     float64
high      float64
low       float64
open      float64
volume    float64
red          bool
dtype: object

Output of pricehistory.index.dtype: dtype('<M8[ns]')

Note: this dataframe is large. Each row is one minute of data and the data spans months, so there are many time frames to iterate over.

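Question:

I have some specific criteria I want to use, which will become columns in a new dataframe:

(1) The high price and its time (to the minute) for each day across the entire dataframe
(2) The first occurrence of a 4-minute downtrend in each day, with its respective time

So far, I am not sure how to extract both the time (the DatetimeIndex value) and the high price from pricehistory.

For (1) above, I am using pd.DataFrame(pricehistory.high.groupby(pd.Grouper(freq='D')).max()), which gives me: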

+------------+------------+
|    time    |    high    |
+------------+------------+
|            |            |
| 2020-01-02 | 322.956677 |
| 2020-01-03 | 321.753729 |
| 2020-01-04 | NaN        |
| 2020-01-05 | NaN        |
| 2020-01-06 | 321.843204 |
| etc...     | etc...     |
+------------+------------+

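But this doesn't work because it only gives me the day, not the time down to the minute, and using min as the Grouper freq doesn't work either because it then just returns the maximum of each minute, which is simply high.

Expected result (note: includes the minutes):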

+---------------------+------------+
|    time             |    high    |
+---------------------+------------+
|                     |            |
| 2020-01-02 9:31:00  | 322.956677 |
| 2020-01-03 10:13:11 | 321.753729 |
| 2020-01-04 15:33:12 | 320.991231 |
| 2020-01-06 12:01:23 | 321.843204 |
| etc...              | etc...     |
+---------------------+------------+

For (2) above, I am using the following:

pricehistory['red'] = pricehistory['close'].lt(pricehistory['open'])

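to create a new column in pricehistory that I can use to check for 4 red minutes in a row.

Then, using new_pricehistory = pricehistory.loc[pricehistory[::-1].rolling(4)['red'].sum().eq(4)], I get a new dataframe containing only the rows where 4 red minutes occur in a row. Preferably I want only the first occurrence per day, not all of them.

Current output: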

+---------------------+------------+------------+------------+------------+--------+------+
|        time         |   close    |    high    |    low     |    open    | volume | red  |
+---------------------+------------+------------+------------+------------+--------+------+
|                     |            |            |            |            |        |      |
| 2020-01-02 10:14:00 | 321.336177 | 321.505186 | 321.286468 | 321.505186 | 311601 | TRUE |
| 2020-01-03 10:18:00 | 320.839091 | 320.958392 | 320.828155 | 320.958392 | 103351 | TRUE |
| 2020-01-06 10:49:00 | 320.520956 | 320.570665 | 320.501073 | 320.550781 |  71901 | TRUE |
+---------------------+------------+------------+------------+------------+--------+------+

Tags: stock, python, pandas, finance

1 Answer

Stack Overflow user

Answered on 2020-06-11 01:47:55

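Since you didn't provide data, I'll generate dummy data. Per SO policy, you should ask a single question per post. For now, I'm answering the first one.

Generate data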

import pandas as pd
import numpy as np

# "1min" is the minute alias; the older "1T" spelling is deprecated in recent pandas
times = pd.date_range(start="2020-06-01", end="2020-06-10", freq="1min")
df = pd.DataFrame({"time": times,
                   "high": np.random.randn(len(times))})

Question 1

Here I just look up the index of each day's maximum and filter df accordingly.

idx = df.groupby(df["time"].dt.date)["high"].idxmax().values

df[df.index.isin(idx)]

Update: if time is the index of df, the solution becomes

df = df.set_index("time")

idx = df.groupby(pd.Grouper(freq='D'))["high"].idxmax().values
df[df.index.isin(idx)]
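
The question's own output shows that daily grouping with pd.Grouper(freq='D') yields a row for every calendar day, including days without data (NaN on 2020-01-04 and 2020-01-05). How idxmax treats such empty groups may differ between pandas versions. The following is a minimal sketch of one way to sidestep that, assuming time is the index: group on the normalized date of each row, so only days that actually occur become groups. The sample data and the name daily_high are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two trading days separated by a weekend,
# so calendar-day resampling would produce empty days in between.
rng = np.random.default_rng(0)
times = pd.date_range("2020-06-05 09:30", periods=5, freq="min").append(
    pd.date_range("2020-06-08 09:30", periods=5, freq="min")
)
df = pd.DataFrame({"high": rng.normal(size=len(times))}, index=times)
df.index.name = "time"

# Group on the calendar day of each row. Only days present in the
# index become groups, so there are no empty groups to worry about.
day = df.index.normalize()
idx = df.groupby(day)["high"].idxmax().to_numpy()

# One row per day: the timestamp of the daily high and the high itself.
daily_high = pd.DataFrame({
    "time": idx,
    "high": df.loc[idx, "high"].to_numpy(),
})
print(daily_high)
```

The result has the two-column shape of the expected output in the question: one timestamp, down to the minute, and one high per day that has data.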

Question 2

import pandas as pd
import numpy as np

# generate data ("1min" is the minute alias; "1T" is deprecated in recent pandas)
times = pd.date_range(start="2020-06-01", end="2020-06-10", freq="1min")
df = pd.DataFrame({"time": times,
                   "open": np.random.randn(len(times))})

df["open"] = np.where(df["open"]<0, -1 * df["open"], df["open"])
df["close"] = df["open"] + 0.01 *np.random.randn(len(times))
df = df.set_index("time")
df["red"] = df['close'].lt(df['open'])

# this function returns the time at which the first run of
# 4 consecutive red rows begins (pd.NaT if there is none)

def get_first(ts):
    idx = ts.loc[ts[::-1].rolling(4)['red'].sum().ge(4)].index
    if idx.empty:
        return pd.NaT
    else:
        return idx[0]

# get the first such time within each day and drop days without one
grp = df.groupby(pd.Grouper(freq='D'))\
        .apply(get_first).dropna()

df[df.index.isin(grp.values)]
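
Note that rolling(4) counts rows, not minutes. If minutes can be missing from the data, four red rows in a row are not necessarily four adjacent minutes. Below is a minimal sketch of a gap-aware variant, under the assumption that a run should only continue when the previous row is exactly one minute earlier. The sample data and the names WINDOW, run_id and first_per_day are hypothetical.

```python
import pandas as pd

# Hypothetical hand-made sample. Day 1 has four red rows in a row, but
# 09:40 is not adjacent to 09:32. Day 2 has a clean 4-minute red run.
index = pd.to_datetime([
    "2020-06-01 09:30", "2020-06-01 09:31", "2020-06-01 09:32",
    "2020-06-01 09:40", "2020-06-01 09:41", "2020-06-01 09:42",
    "2020-06-01 09:43", "2020-06-01 09:44",
    "2020-06-02 09:30", "2020-06-02 09:31", "2020-06-02 09:32",
    "2020-06-02 09:33", "2020-06-02 09:34",
])
red = [True, True, True, True, False, True, True, True,
       False, True, True, True, True]
df = pd.DataFrame({"red": red}, index=index)

WINDOW = 4  # required number of consecutive red minutes

# A row continues the current run only if it is red, the previous row
# is red, and the previous row is exactly one minute earlier.
step_ok = df.index.to_series().diff().eq(pd.Timedelta(minutes=1))
breaks = ~(df["red"] & df["red"].shift(fill_value=False) & step_ok)

# Every break starts a new run id; run_len is the size of each row's run.
run_id = breaks.cumsum()
run_len = run_id.map(run_id.value_counts())

# Keep the first row of every red run that is long enough,
# then the earliest such row within each day.
starts = df[breaks & df["red"] & run_len.ge(WINDOW)]
first_per_day = starts.groupby(starts.index.normalize()).head(1)
print(first_per_day)
```

With this sample only 2020-06-02 09:31 is reported. The four red rows from 09:30 to 09:40 on the first day are skipped because of the gap between 09:32 and 09:40.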

0 votes

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/62315562
