I'm scraping some financial data from the Morningstar site using a list of ticker symbols. On the 12th symbol, I get a missing-page response back. The page does exist, and even if I swap a different, working symbol into the 12th slot, the same page comes back missing. I figured Morningstar was doing this, so I tried adding a time delay to the requests, but that didn't work.
What is causing this?
for symbol in symbols:
    url_morningstar = 'https://www.morningstar.com/funds/xnas/{}/quote'
    response = requests.get(url_morningstar.format(symbol))
    mySoup = BeautifulSoup(response.text, 'html.parser')
    htmlData = mySoup.findAll('span', {'class': 'mdc-data-point mdc-data-point--number'})
    while len(htmlData) == 0:
        print(symbol, ' ---ERROR---')
        print(htmlData)
        #print(response.text)
        response = requests.get(url_morningstar.format(symbol))
        mySoup = BeautifulSoup(response.text, 'html.parser')
        htmlData = mySoup.findAll('span', {'class': 'mdc-data-point mdc-data-point--number'})
    duration = htmlData[-1].text.strip()
    nav = htmlData[0].text.strip()

I tried using a loop to keep retrying the symbol whose page comes back missing, but that didn't help.
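As a side note, the `while len(htmlData) == 0` retry loop above never exits if the page is permanently missing. A bounded retry gives up after a fixed number of attempts instead of spinning forever; this is only a sketch, and `fetch_with_retry` with its parameters is my own naming, not part of the original code:

```python
import time

def fetch_with_retry(fetch, max_attempts=3, delay=0.0):
    """Call fetch() up to max_attempts times and return the first
    non-empty result, or None if every attempt comes back empty."""
    for attempt in range(max_attempts):
        result = fetch()
        if result:          # non-empty list of spans -> success
            return result
        time.sleep(delay)   # brief pause before the next attempt
    return None             # page is genuinely missing; give up

# Simulated fetch that always returns no data, like the blocked page:
assert fetch_with_retry(lambda: [], max_attempts=3) is None
# Simulated fetch that succeeds on the first try:
assert fetch_with_retry(lambda: ['10.01']) == ['10.01']
```

With a cap like this, a symbol whose page never loads falls through as `None` instead of hanging the whole scrape.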
Edit:
Full code with symbols:
import csv
import requests
from bs4 import BeautifulSoup

symbols = []
with open('symbols.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        symbols.append(row[0])

full_data = []
for symbol in symbols:
    print(symbol)
    url_schwab = 'https://www.schwab.wallst.com/Prospect/Research/MutualFunds/Summary.asp?symbol={}'
    url_morningstar = 'https://www.morningstar.com/funds/xnas/{}/quote'
    response = requests.get(url_schwab.format(symbol))
    mySoup = BeautifulSoup(response.text, 'html.parser')
    table = mySoup.find('div', {'id': 'detailsWrapper'})
    rows = table.findAll('table', {'class': 'tableType1'})
    headers = []
    output = []
    schwab_dict = {}
    for row in rows:
        cols = row.find('tbody').find('tr').findAll('td')
        colNames = row.find('tbody').find('tr').findAll('th')
        colNames = [ele.text.strip() for ele in colNames]
        cols = [ele.text.strip() for ele in cols]
        output.append([ele for ele in cols if ele])
        headers.append([ele for ele in colNames if ele])
    #[['52 Week Range'], ['YTD Return'], ['Gross Expense Ratio'], ['Net Expense Ratio'], ['Tax-Equivalent Yield'], ['30-Day SEC Yield'], ['Distribution Yield'], ['Most Recent Distribution'], ['Availability'], ['Manager Tenure']]
    #[['$9.92 - $10.01'], ['0.91%as of 09/02/2021'], ['0.68%'], ['0.68%'], ['--'], ['1.42%'], ['1.65%'], ['$0.0118'], ['Open'], ['2011']]
    headers[1] = ['YTD Return']
    headers[6] = ['Distribution Yield']
    for i in range(len(headers)):
        schwab_dict[headers[i][0]] = output[i][0]
    response = requests.get(url_morningstar.format(symbol))
    mySoup = BeautifulSoup(response.text, 'html.parser')
    htmlData = mySoup.findAll('span', {'class': 'mdc-data-point mdc-data-point--number'})
    while len(htmlData) == 0:
        print(symbol, ' ---ERROR---')
        print(htmlData)
        #print(response.text)
        response = requests.get(url_morningstar.format(symbol))
        mySoup = BeautifulSoup(response.text, 'html.parser')
        htmlData = mySoup.findAll('span', {'class': 'mdc-data-point mdc-data-point--number'})
    duration = htmlData[-1].text.strip()
    nav = htmlData[0].text.strip()
    # extract Duration, EXP ratio, YTD 2021, SEC Yield, Price Last Updated
    results = [duration, schwab_dict['Net Expense Ratio'], schwab_dict['YTD Return'], schwab_dict['30-Day SEC Yield'], nav]
    full_data.append([results])

with open('scrappedData.csv', 'x') as csvfile:
    writer = csv.writer(csvfile)  # csv.writer, not csv.reader -- a reader has no writerow()
    writer.writerow(full_data)

Symbols.csv:
DLSNX
FFRHX
MWLDX
OSTIX
PRWBX
VBIRX
VSGBX
VFSTX
VFISX
FSTFX
PRFSX
VMLTX
VWSTX
DODIX
DLTNX
FAGIX
SPHIX
FTHRX
FBNDX
FADMX
FTBFX
LSBRX
MWTRX
RPSIX
VFIIX
VWEHX
VBILX
VFICX
VFITX
VBTLX
FLTMX
PRSMX
VCAIX
VWITX
PRPIX
PRULX
VIPSX
VBLAX
VWESX
VUSTX
FCTFX
FHIGX
FTFMX
FTABX
PRINX
PRFHX
PRTAX
VCITX
VWAHX
VWLTX
LSGLX
RPIBX
VTABX
FCVSX
VWINX

Posted on 2021-09-07 11:56:27
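One more thing about the output step in the full code above: each fund's row is appended as `[results]` (a list wrapped in another list), and `writer.writerow(full_data)` then serializes the entire dataset as a single CSV row. A sketch of the intended shape, using made-up sample values and an in-memory buffer in place of the real file:

```python
import csv
import io

# One flat list per fund (appended as `results`, not `[results]`):
full_data = [['2.1', '0.68%', '0.91%', '1.42%', '10.01'],
             ['3.4', '0.45%', '1.20%', '1.10%', '9.92']]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(full_data)   # one CSV line per fund, not one giant row

lines = buf.getvalue().strip().splitlines()
assert len(lines) == 2
assert lines[0] == '2.1,0.68%,0.91%,1.42%,10.01'
```

`writerows` takes an iterable of rows, so each symbol ends up on its own line of the CSV.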
Try putting this line right below where your code reads in the symbols:
symbols = []
with open('symbols.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        symbols.append(row[0])
symbols = [s.strip() for s in symbols]

I suspect one of them has extra whitespace.
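To make the failure mode concrete: any stray whitespace in a symbol ends up inside the URL path, so the server sees a nonexistent symbol and serves its missing page. The trailing space below is illustrative; the actual stray character could just as well be a tab or carriage return:

```python
url_morningstar = 'https://www.morningstar.com/funds/xnas/{}/quote'

clean = 'VWINX'
dirty = 'VWINX '   # hypothetical trailing space from a messy CSV cell

# The whitespace survives into the URL, which no longer matches the real page:
assert url_morningstar.format(dirty) != url_morningstar.format(clean)
# After strip(), the two URLs are identical:
assert url_morningstar.format(dirty.strip()) == url_morningstar.format(clean)
```

That is why the "12th slot" itself seemed cursed: whichever symbol landed in that row of the CSV picked up the same stray character.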
https://stackoverflow.com/questions/69087055