Q: Downloading historical intraday stock data

Stack Overflow user

Asked 2020-11-27 16:45:57

Answers: 1 | Views: 731 | Followers: 0 | Votes: 0

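I want to download historical intraday stock data. I found that AlphaVantage offers two years of data, which is the longest history I have found for free.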

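I am writing a script to download the full two years of data for every ticker symbol and every timeframe they provide. The data is split into 30-day intervals counting back from today (or from the last trading day, I'm not sure). Within each interval the rows run from the newest datetime to the oldest. I want to reverse the row order and concatenate all the months, with the column headers appearing only once. That would give me one csv file per stock and timeframe containing two years of data, with rows running from the oldest datetime to the newest.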

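The problem is that I also want to use the script to update the data, and I don't know how to append only the rows that are not already in my file. The data I downloaded is at 15-minute intervals from 2020-09-28 07:15:00 to 2020-10-26 20:00:00 (where bars exist; some are missing). When I run the script again, I want it to update the data: somehow drop the rows that are already present and append only the rest. So if the last datetime in the file is 2020-10-26 20:00:00, it should continue appending from 2020-10-26 20:15:00, if that row exists. How can I update the data correctly?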

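Also, when updating, if the file already exists the column headers are written again, which I don't want. Edit: I have solved this with header=(not os.path.exists(file)), but checking whether the file exists on every iteration seems inefficient.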

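I also have to make the script respect the API limits of 5 calls per minute and 500 calls per day. Is there a way to make the script stop when it reaches the daily limit and pick up where it left off the next time it runs? Or should I just add a 173-second sleep between API calls?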

import os
import sys
from pathlib import Path
from typing import List

import pandas as pd

BASE_URL= 'https://www.alphavantage.co/'


def download_previous_data(
    file: str,
    ticker: str,
    timeframe: str,
    slices: List,
):
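    for _slice in slices:
        url = f'{BASE_URL}query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol={ticker}&interval={timeframe}&slice={_slice}&apikey=demo&datatype=csv'
        # Each slice arrives newest-first, so reverse it before appending.
        # Write the column header only when the file does not exist yet.
        pd.read_csv(url).iloc[::-1].to_csv(
            file, mode='a', index=False, header=not os.path.exists(file), encoding='utf-8-sig')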


def main():

    # Get a list of all ticker symbols
    print('Downloading ticker symbols:')
    #df = pd.read_csv('https://www.alphavantage.co/query?function=LISTING_STATUS&apikey=demo')
    #tickers = df['symbol'].tolist()
    tickers = ['IBM']

    timeframes = ['1min', '5min', '15min', '30min', '60min']

    # To download the data in a subdirectory where the script is located
    modpath = os.path.dirname(os.path.abspath(sys.argv[0]))

    # Make sure the download folders exist
    for timeframe in timeframes:
        download_path = f'{modpath}/{timeframe}'
        #download_path = f'/media/user/Portable Drive/Trading/data/{timeframe}'
        Path(download_path).mkdir(parents=True, exist_ok=True)

    # For each ticker symbol download all data available for each timeframe
    # except for the last month which would be incomplete.
    # Each download iteration has to be in a 'try except' in case the ticker symbol isn't available on alphavantage
    for ticker in tickers:
        print(f'Downloading data for {ticker}...')
        for timeframe in timeframes:
            download_path = f'{modpath}/{timeframe}'
            filepath = f'{download_path}/{ticker}.csv'

            # NOTE:
            # To ensure optimal API response speed, the trailing 2 years of intraday data is evenly divided into 24 "slices" - year1month1, year1month2,
            # year1month3, ..., year1month11, year1month12, year2month1, year2month2, year2month3, ..., year2month11, year2month12.
            # Each slice is a 30-day window, with year1month1 being the most recent and year2month12 being the farthest from today.
            # By default, slice=year1month1

            if Path(filepath).is_file():  # if the file already exists
                # download only the second-to-last month (the most recent complete slice)
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)
            else:  # if the file doesn't exist
                # download the two previous years
                #slices = ['year2month12', 'year2month11', 'year2month10', 'year2month9', 'year2month8', 'year2month7', 'year2month6', 'year2month5', 'year2month4', 'year2month3', 'year2month2', 'year2month1', 'year1month12', 'year1month11', 'year1month10', 'year1month9', 'year1month8', 'year1month7', 'year1month6', 'year1month5', 'year1month4', 'year1month3', 'year1month2']
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)


if __name__ == '__main__':
    main()

Tags: python, pandas, download

1 Answer

Stack Overflow user

Accepted answer

Answered 2020-11-27 17:13:37

You have a lot of questions here! These are suggestions you can try, but I have no way to test whether they work:

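- Read all the file names into a list once and check whether a file name is in that list, instead of checking on disk whether the file exists on every iteration.
- Read the data from the existing file, append everything in pandas, and write a new file. I can't tell whether you are appending to the csv files directly, but if you are struggling with that, just read the existing data and append the new data in memory until you work out how to append to the file correctly. Alternatively, save each new download to its own file and merge the files later.
- If you are concerned about duplicates, look into removing them once the data is combined.
- Look into time.sleep() from the time module in your for loops to reduce the call rate.
- If you have the 1-minute data, you can look into resample() to build the 5-minute and 15-minute bars rather than downloading every timeframe.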
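As a rough illustration of the "read, append in pandas, remove duplicates" suggestions, here is a minimal sketch. It assumes the csv has a timestamp column named `time` (an assumption about the file layout, so adjust `time_col` if yours differs), and the function name `merge_new_rows` is made up for this example. Using `drop_duplicates` is one possible way to remove repeats, not necessarily the one the answer had in mind.

```python
import pandas as pd


def merge_new_rows(existing: pd.DataFrame, fresh: pd.DataFrame,
                   time_col: str = "time") -> pd.DataFrame:
    """Return `existing` plus only the rows of `fresh` that are newer.

    Hypothetical helper: `time_col` is assumed to hold the bar timestamp.
    """
    fresh = fresh.copy()
    fresh[time_col] = pd.to_datetime(fresh[time_col])
    # Downloads arrive newest-first, so sort them oldest-first.
    fresh = fresh.sort_values(time_col)
    if existing.empty:
        return fresh.reset_index(drop=True)

    existing = existing.copy()
    existing[time_col] = pd.to_datetime(existing[time_col])
    # Keep only rows strictly after the last timestamp already stored.
    last_seen = existing[time_col].max()
    newer = fresh[fresh[time_col] > last_seen]

    combined = pd.concat([existing, newer], ignore_index=True)
    # Safety net in case the download itself contains repeated timestamps.
    return combined.drop_duplicates(subset=time_col).reset_index(drop=True)
```

In the script from the question, this could be used by reading the existing file with `pd.read_csv(filepath)`, passing it in together with the freshly downloaded frame, and writing the result back with `to_csv(filepath, index=False)`. Rewriting the whole file also means the header is written exactly once.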

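For the call limits mentioned in the question (5 per minute, 500 per day), one possible approach is to space the calls out and keep the daily count in a small file, so that a later run can tell how much of today's budget is left. The sketch below is only an illustration: the class name `CallBudget`, the JSON state file, and the fixed 12-second gap are all choices made for this example, not anything provided by the API.

```python
import json
import time
from datetime import date
from pathlib import Path


class CallBudget:
    """Hypothetical helper: spaces calls out and counts them per day."""

    def __init__(self, state_file, per_minute=5, per_day=500,
                 sleep=time.sleep, clock=time.monotonic):
        self.state_file = Path(state_file)
        self.min_gap = 60.0 / per_minute   # 12 s between calls for 5/minute
        self.per_day = per_day
        self.sleep = sleep                 # injectable so it can be tested
        self.clock = clock
        self.last_call = None
        self.state = {"day": date.today().isoformat(), "calls": 0}
        # Resume today's count if a previous run already used some calls.
        if self.state_file.exists():
            saved = json.loads(self.state_file.read_text())
            if saved.get("day") == self.state["day"]:
                self.state = saved

    def acquire(self) -> bool:
        """Wait if needed; return False once today's budget is used up."""
        today = date.today().isoformat()
        if self.state["day"] != today:     # a new day resets the counter
            self.state = {"day": today, "calls": 0}
        if self.state["calls"] >= self.per_day:
            return False
        if self.last_call is not None:
            wait = self.min_gap - (self.clock() - self.last_call)
            if wait > 0:
                self.sleep(wait)
        self.last_call = self.clock()
        self.state["calls"] += 1
        self.state_file.write_text(json.dumps(self.state))
        return True
```

In the download loop, the script would call `budget.acquire()` before each request and stop when it returns `False`. Because the existing csv files already show which tickers and timeframes are done, the next run can skip those and continue with the rest.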

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/65040925
