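I'm trying to get all of the historical data for a particular stock from Yahoo Finance. I'm new to Python and web scraping.
I want to download all of the historical data to a CSV file. The problem is that my code only downloads the first 100 entries for any stock on the site. When you view a stock in the browser, you have to scroll to the bottom of the page to load more table entries.
I think the same thing happens when I download the page with a library: some kind of optimization seems to prevent the page from loading completely. Try it here (https://in.finance.yahoo.com/quote/TVSMOTOR.NS/history?period1=-19800&period2=1524236374&interval=1d&filter=history&frequency=1d). Is there any way to get around this?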
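The `period1` and `period2` query parameters in that URL are Unix timestamps in seconds. `period1=-19800` is midnight on 1 January 1970 in India Standard Time (UTC+5:30, i.e. 19,800 seconds before the epoch), and `period2=1524236374` is 2018-04-20 14:59:34 UTC. A minimal sketch for computing such values from calendar dates with only the standard library; the helper name `to_epoch` is hypothetical and not part of any Yahoo API:

```python
from datetime import datetime, timedelta, timezone

# India Standard Time is a fixed UTC+5:30 offset (no daylight saving).
IST = timezone(timedelta(hours=5, minutes=30))


def to_epoch(dt: datetime) -> int:
    """Convert a timezone-aware datetime to whole Unix seconds (hypothetical helper)."""
    if dt.tzinfo is None:
        # A naive datetime would be interpreted in the machine's local time zone.
        raise ValueError("pass a timezone-aware datetime")
    return int(dt.timestamp())


if __name__ == "__main__":
    start = to_epoch(datetime(1970, 1, 1, tzinfo=IST))
    end = to_epoch(datetime(2018, 4, 20, 14, 59, 34, tzinfo=timezone.utc))
    print(start, end)
```

Requiring an aware datetime is a deliberate choice: the same calendar date maps to different `period1` values depending on the time zone, which is exactly why the URL above starts at -19800 rather than 0.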
Here is my code:
```python
import csv
from urllib.request import urlopen as uReq

from bs4 import BeautifulSoup as soup

my_url = 'https://in.finance.yahoo.com/quote/TVSMOTOR.NS/history?period1=-19800&period2=1524236374&interval=1d&filter=history&frequency=1d'
page = uReq(my_url)
page_html = page.read()
page_data = soup(page_html, "html.parser")

# The price table is marked with data-test="historical-prices".
container = page_data.find_all("table", {"data-test": "historical-prices"})
container = container[0].tbody
rows = container.find_all("tr")

filename = "tvs.csv"
# newline="" is what the csv module expects; csv.writer quotes fields that contain commas.
with open(filename, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["date", "open", "high", "low", "close", "adjusted_close_price", "vol"])
    for row in rows:
        # Keep only rows with seven cells that have no colspan attribute
        # (date, open, high, low, close, adj close, volume); rows using colspan are skipped.
        if len(row.find_all("td", {"colspan": ""})) == 7:
            col = row.find_all("td")
            date = col[0].span.text.strip()
            opend = col[1].span.text.strip().replace(",", "")
            if opend != 'null':
                high = col[2].span.text.strip().replace(",", "")
                low = col[3].span.text.strip().replace(",", "")
                close = col[4].span.text.strip().replace(",", "")
                adjclose = col[5].span.text.strip().replace(",", "")
                vol = col[6].span.text.strip().replace(",", "")
                writer.writerow([date, opend, high, low, close, adjclose, vol])
```
Thanks in advance!
Edit:
OK, I found another piece of code that works well, but I don't understand how it works. Any help would be appreciated.
```python
#!/usr/bin/env python
"""
get-yahoo-quotes.py: Script to download Yahoo historical quotes using the new cookie authenticated site.
Usage: get-yahoo-quotes SYMBOL
History
06-03-2017 : Created script
"""
__author__ = "Brad Lucas"
__copyright__ = "Copyright 2017, Brad Lucas"
__license__ = "MIT"
__version__ = "1.0.0"
__maintainer__ = "Brad Lucas"
__email__ = "brad@beaconhill.com"
__status__ = "Production"

import re
import sys
import time

import requests


def split_crumb_store(v):
    # v looks like ,"CrumbStore":{"crumb":"9q.A4D1c.b9"
    # Splitting on ':' gives [',"CrumbStore"', '{"crumb"', '"9q.A4D1c.b9"']; field 2 is the crumb.
    return v.split(':')[2].strip('"')


def find_crumb_store(lines):
    # Looking for
    # ,"CrumbStore":{"crumb":"9q.A4D1c.b9
    for l in lines:
        if re.findall(r'CrumbStore', l):
            return l
    # Fail loudly instead of returning None, which would crash split_crumb_store().
    raise ValueError("Did not find CrumbStore")


def get_cookie_value(r):
    # The download endpoint must be called with the 'B' cookie set by the quote page.
    return {'B': r.cookies['B']}


def get_page_data(symbol):
    url = "https://finance.yahoo.com/quote/%s/?p=%s" % (symbol, symbol)
    r = requests.get(url)
    cookie = get_cookie_value(r)
    # Decode escape sequences so a crumb such as FWP\u002F5EFll3U becomes FWP/5EFll3U,
    # then put every '}'-terminated fragment of the page on its own line.
    lines = r.content.decode('unicode-escape').strip().replace('}', '\n')
    return cookie, lines.split('\n')


def get_cookie_crumb(symbol):
    cookie, lines = get_page_data(symbol)
    crumb = split_crumb_store(find_crumb_store(lines))
    return cookie, crumb


def get_data(symbol, start_date, end_date, cookie, crumb):
    filename = '%s.csv' % (symbol)
    url = "https://query1.finance.yahoo.com/v7/finance/download/%s?period1=%s&period2=%s&interval=1d&events=history&crumb=%s" % (symbol, start_date, end_date, crumb)
    response = requests.get(url, cookies=cookie)
    # Write the CSV body to disk in 1 KB chunks.
    with open(filename, 'wb') as handle:
        for block in response.iter_content(1024):
            handle.write(block)


def get_now_epoch():
    # @see https://www.linuxquestions.org/questions/programming-9/python-datetime-to-epoch-4175520007/#post5244109
    return int(time.time())


def download_quotes(symbol):
    start_date = 0  # 1970-01-01 00:00 UTC
    end_date = get_now_epoch()
    cookie, crumb = get_cookie_crumb(symbol)
    get_data(symbol, start_date, end_date, cookie, crumb)


if __name__ == '__main__':
    # If we have at least one parameter, loop over all the parameters assuming they are symbols
    if len(sys.argv) == 1:
        print("\nUsage: get-yahoo-quotes.py SYMBOL\n\n")
    else:
        for symbol in sys.argv[1:]:
            print("--------------------------------------------------")
            print("Downloading %s to %s.csv" % (symbol, symbol))
            download_quotes(symbol)
            print("--------------------------------------------------")
```
Posted on 2018-04-20 17:20:00
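Initially, only 100 results are sent to the browser. When you scroll to the bottom of the page, a JavaScript event fires and triggers an AJAX request that downloads the next 50/100 entries in the background and then displays them in the browser. Your Python code cannot trigger that event: urllib and BeautifulSoup only fetch and parse the HTML, they do not execute JavaScript, so the AJAX request is never made. It is better to use a data API such as https://intrinio.com/ or https://www.alphavantage.co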
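If you would rather keep scraping the HTML page, one possible workaround, untested here, is to request several short date ranges instead of one long one, each short enough that all of its rows fall inside the initial batch of about 100. The sketch below only builds the URLs. `date_windows` and `history_url` are hypothetical helpers; the 100-calendar-day window (roughly 70 trading days) is an assumption, and so is treating `period2` as midnight UTC at the start of the day after `last`:

```python
from datetime import date, datetime, time, timedelta, timezone
from urllib.parse import urlencode

# Same page and query parameters as the URL in the question.
BASE = "https://in.finance.yahoo.com/quote/{symbol}/history"


def date_windows(start: date, end: date, days: int = 100):
    """Yield (first_day, last_day) pairs covering start..end inclusive, each at most `days` long."""
    if days < 1:
        raise ValueError("days must be >= 1")
    current = start
    while current <= end:
        last = min(current + timedelta(days=days - 1), end)
        yield current, last
        current = last + timedelta(days=1)


def history_url(symbol: str, first: date, last: date) -> str:
    """Build a history-page URL for one window (hypothetical helper)."""
    # Assumption: midnight UTC boundaries; period2 is the start of the day after `last`.
    p1 = int(datetime.combine(first, time.min, tzinfo=timezone.utc).timestamp())
    p2 = int(datetime.combine(last + timedelta(days=1), time.min, tzinfo=timezone.utc).timestamp())
    query = urlencode({
        "period1": p1,
        "period2": p2,
        "interval": "1d",
        "filter": "history",
        "frequency": "1d",
    })
    return BASE.format(symbol=symbol) + "?" + query


if __name__ == "__main__":
    for first, last in date_windows(date(2018, 1, 1), date(2018, 4, 20)):
        print(history_url("TVSMOTOR.NS", first, last))
```

Each URL could then be fed to the parsing loop from the question, appending the rows of every window to the same CSV file. Whether each window really arrives complete in the initial HTML depends on how Yahoo renders the page, so check the row counts before relying on it.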
You can also try the yahoo-finance Python package: https://pypi.org/project/yahoo-finance/
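The script in the question's edit does not scrape the HTML table at all. It (1) requests the quote page and keeps the `B` cookie, (2) searches the page source for the `CrumbStore` fragment to read a per-session token called the crumb, and (3) calls the CSV download endpoint `query1.finance.yahoo.com/v7/finance/download/...` with that crumb and cookie, so the history arrives as one CSV file instead of a scrolling table. The sketch below isolates step 2 and the URL of step 3 as pure functions that can be tried without network access. `extract_crumb` and `download_url` are hypothetical names, and the regex assumes the page format shown in the script's comments (`"CrumbStore":{"crumb":"FWP\u002F5EFll3U"}`), which Yahoo may have changed since:

```python
import json
import re
from urllib.parse import quote, urlencode

# Matches "CrumbStore":{"crumb":"..."} and captures the raw (still escaped) crumb text.
CRUMB_RE = re.compile(r'"CrumbStore":\{"crumb":"((?:[^"\\]|\\.)*)"\}')


def extract_crumb(page_source: str) -> str:
    """Return the decoded crumb, or raise ValueError if the fragment is missing."""
    match = CRUMB_RE.search(page_source)
    if match is None:
        raise ValueError("CrumbStore not found in page source")
    # The crumb is a JSON string literal, so json.loads turns escapes such as \u002F into "/".
    return json.loads('"' + match.group(1) + '"')


def download_url(symbol: str, start: int, end: int, crumb: str) -> str:
    """Build the CSV download URL that get_data() requests; urlencode percent-escapes the crumb."""
    query = urlencode({
        "period1": start,
        "period2": end,
        "interval": "1d",
        "events": "history",
        "crumb": crumb,
    })
    return "https://query1.finance.yahoo.com/v7/finance/download/%s?%s" % (quote(symbol), query)
```

The original script reaches the same crumb with `find_crumb_store` and `split_crumb_store` after decoding the entire page with `unicode-escape`; decoding only the captured token, as here, leaves the rest of the page alone. Percent-encoding the crumb is a small extra safeguard, since a crumb may contain characters such as `/` or `+` that have special meaning in a query string.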
https://stackoverflow.com/questions/49946597