I want to download historical intraday stock data. I found that Alpha Vantage provides two years of data, which is the longest free history I've been able to find.
I'm writing a script to download the full two years of data for every ticker symbol and every timeframe they offer. The data is divided into 30-day slices counting back from today (or from the last trading day, I'm not sure). The rows run from the newest date to the oldest. I want to reverse the order of the rows and concatenate all the months so the column headers appear only once. That way I'll have one CSV file with two years of data per ticker and timeframe, with the rows running from the oldest datetime to the newest.
The problem I'm running into is that I also want to use the script to update the data, and I don't know how to append only the rows that are not already in my file. The data I downloaded runs at 15-minute intervals from 2020-09-28 07:15:00 to 2020-10-26 20:00:00 (some rows are missing where no data exists). When I run the script again, I want to update the data: somehow drop the rows that are already present and append only the rest. So if the last datetime present is 2020-10-26 20:00:00, it should continue appending from 2020-10-26 20:15:00, if that row exists. How do I update the data correctly?
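One approach I've been considering (a minimal sketch, not tested against the real API; the timestamp column name 'time' is an assumption about the CSV layout) is to read the last timestamp already in the file and keep only strictly newer rows before appending:

```python
import os
import pandas as pd

def append_new_rows(file: str, new_data: pd.DataFrame, time_col: str = 'time') -> None:
    """Append only rows whose timestamp is later than the last one already in the file.

    Assumes both the file and new_data are sorted oldest-to-newest and share the
    same columns. ISO-format timestamps compare correctly as plain strings.
    """
    if os.path.exists(file):
        existing = pd.read_csv(file)
        if not existing.empty:
            last_seen = existing[time_col].iloc[-1]
            new_data = new_data[new_data[time_col] > last_seen]
        # File already has a header, so append without one.
        new_data.to_csv(file, mode='a', index=False, header=False)
    else:
        new_data.to_csv(file, index=False)
```

Rows equal to or older than the last stored timestamp are filtered out, so overlapping slices can be fed in safely without duplicating data.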
Also, when updating, if the file already exists the column headers get duplicated, which I don't want. Edit: I've solved this with header=(not os.path.exists(file)), but checking whether the file exists on every iteration seems inefficient.
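A sketch of checking existence just once, outside the write loop (write_slices is a hypothetical helper, not part of my script):

```python
import os
import pandas as pd

def write_slices(file: str, frames: list) -> None:
    # Check once whether the file exists; only the very first write needs a header.
    write_header = not os.path.exists(file)
    for df in frames:
        df.to_csv(file, mode='a', index=False, header=write_header)
        write_header = False  # every subsequent append skips the header
```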
I also have to make the script respect the API limits of 5 calls per minute and 500 calls per day. Is there a way to make the script stop when it hits the daily limit and continue the next time it runs, or should I just add a 173-second sleep between API calls?
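For the rate limits, one option I'm considering (a sketch; 12 seconds between calls stays under 5 calls/minute, and a counter enforces the 500/day cap) is a small throttle that sleeps between calls and raises once the daily budget is spent:

```python
import time

class RateLimiter:
    """Throttle API calls: at most per_minute calls/min and per_day calls/day."""

    def __init__(self, per_minute: int = 5, per_day: int = 500):
        self.min_interval = 60.0 / per_minute  # e.g. 12 s between calls
        self.per_day = per_day
        self.calls_today = 0
        self.last_call = 0.0

    def wait(self) -> None:
        # Stop cleanly once the daily budget is spent.
        if self.calls_today >= self.per_day:
            raise RuntimeError('Daily API call limit reached; resume tomorrow.')
        # Space calls out to respect the per-minute limit.
        elapsed = time.monotonic() - self.last_call
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self.last_call = time.monotonic()
        self.calls_today += 1
```

To resume across runs, calls_today (plus the date it belongs to) would have to be persisted to a small state file and reloaded at startup; otherwise the counter resets every time the script restarts.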
import os
import sys
import pandas as pd
from typing import List
from pathlib import Path

BASE_URL = 'https://www.alphavantage.co/'

def download_previous_data(
    file: str,
    ticker: str,
    timeframe: str,
    slices: List,
):
    for _slice in slices:
        url = f'{BASE_URL}query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol={ticker}&interval={timeframe}&slice={_slice}&apikey=demo&datatype=csv'
        # Reverse the rows (newest-first -> oldest-first) and append;
        # only write the header if the file doesn't exist yet.
        pd.read_csv(url).iloc[::-1].to_csv(file, mode='a', index=False,
                                           header=not os.path.exists(file),
                                           encoding='utf-8-sig')

def main():
    # Get a list of all ticker symbols
    print('Downloading ticker symbols:')
    #df = pd.read_csv('https://www.alphavantage.co/query?function=LISTING_STATUS&apikey=demo')
    #tickers = df['symbol'].tolist()
    tickers = ['IBM']
    timeframes = ['1min', '5min', '15min', '30min', '60min']
    # To download the data in a subdirectory where the script is located
    modpath = os.path.dirname(os.path.abspath(sys.argv[0]))
    # Make sure the download folders exist
    for timeframe in timeframes:
        download_path = f'{modpath}/{timeframe}'
        #download_path = f'/media/user/Portable Drive/Trading/data/{timeframe}'
        Path(download_path).mkdir(parents=True, exist_ok=True)
    # For each ticker symbol download all data available for each timeframe
    # except for the last month which would be incomplete.
    # Each download iteration has to be in a 'try except' in case the ticker symbol isn't available on alphavantage
    for ticker in tickers:
        print(f'Downloading data for {ticker}...')
        for timeframe in timeframes:
            download_path = f'{modpath}/{timeframe}'
            filepath = f'{download_path}/{ticker}.csv'
            # NOTE:
            # To ensure optimal API response speed, the trailing 2 years of intraday data is evenly divided into 24 "slices" - year1month1, year1month2,
            # year1month3, ..., year1month11, year1month12, year2month1, year2month2, year2month3, ..., year2month11, year2month12.
            # Each slice is a 30-day window, with year1month1 being the most recent and year2month12 being the farthest from today.
            # By default, slice=year1month1
            if Path(filepath).is_file():  # if the file already exists
                # download the previous-to-last month
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)
            else:  # if the file doesn't exist
                # download the two previous years
                #slices = ['year2month12', 'year2month11', 'year2month10', 'year2month9', 'year2month8', 'year2month7', 'year2month6', 'year2month5', 'year2month4', 'year2month3', 'year2month2', 'year2month1', 'year1month12', 'year1month11', 'year1month10', 'year1month9', 'year1month8', 'year1month7', 'year1month6', 'year1month5', 'year1month4', 'year1month3', 'year1month2']
                slices = ['year1month2']
                download_previous_data(filepath, ticker, timeframe, slices)

if __name__ == '__main__':
    main()

Posted on 2020-11-27 17:13:37
You're asking a lot of questions at once! Here are suggestions you can try, though I haven't been able to verify their effectiveness:
https://stackoverflow.com/questions/65040925