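My question is a follow-up to a question that was asked here.
Function:
periodic_figure_values()
It seems to work fine except when the name of the line item being searched for appears twice. The specific case I mean is trying to get the data for "Long Term Debt". The function from the link above returns the following error: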
    Traceback (most recent call last):
      File "test.py", line 31, in <module>
        LongTermDebt=(periodic_figure_values(soup, "Long Term Debt"))
      File "test.py", line 21, in periodic_figure_values
        value = int(str_value)
    ValueError: invalid literal for int() with base 10: 'Short/Current Long Term Debt'

It seems to get tripped up by "Short/Current Long Term Debt". You see, the page has both "Short/Current Long Term Debt" and "Long Term Debt". You can see an example of the source page, using Apple's balance sheet, here.
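I am trying to find a way for the function to return the data for "Long Term Debt" without getting tripped up by "Short/Current Long Term Debt".
Here is the function, with an example that fetches "Cash And Cash Equivalents", which works fine, and "Long Term Debt", which does not: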
    import requests, bs4, re, sys

    def periodic_figure_values(soup, yahoo_figure):
        values = []
        pattern = re.compile(yahoo_figure)

        title = soup.find("strong", text=pattern)    # works for the figures printed in bold
        if title:
            row = title.parent.parent
        else:
            title = soup.find("td", text=pattern)    # works for any other available figure
            if title:
                row = title.parent
            else:
                sys.exit("Invalid figure '" + yahoo_figure + "' passed.")

        cells = row.find_all("td")[1:]    # exclude the <td> with figure name
        for cell in cells:
            if cell.text.strip() != yahoo_figure:    # needed because some figures are indented
                str_value = cell.text.strip().replace(",", "").replace("(", "-").replace(")", "")
                if str_value == "-":
                    str_value = 0
                value = int(str_value)
                values.append(value)

        return values

    res = requests.get('https://ca.finance.yahoo.com/q/bs?s=AAPL')
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    Cash=(periodic_figure_values(soup, "Cash And Cash Equivalents"))
    print(Cash)
    LongTermDebt=(periodic_figure_values(soup, "Long Term Debt"))
    print(LongTermDebt)

Answered 2016-07-27 05:52:53
The simplest approach is a try/except combination that catches the ValueError being raised:
    import requests, bs4, re, sys

    def periodic_figure_values(soup, yahoo_figure):
        values = []
        pattern = re.compile(yahoo_figure)

        title = soup.find("strong", text=pattern)    # works for the figures printed in bold
        if title:
            row = title.parent.parent
        else:
            title = soup.find("td", text=pattern)    # works for any other available figure
            if title:
                row = title.parent
            else:
                sys.exit("Invalid figure '" + yahoo_figure + "' passed.")

        cells = row.find_all("td")[1:]    # exclude the <td> with figure name
        for cell in cells:
            if cell.text.strip() != yahoo_figure:    # needed because some figures are indented
                str_value = cell.text.strip().replace(",", "").replace("(", "-").replace(")", "")
                if str_value == "-":
                    str_value = 0
                ### from here
                try:
                    value = int(str_value)
                    values.append(value)
                except ValueError:
                    continue    # skip cells that are not numbers, such as a repeated label
                ### to here

        return values

    res = requests.get('https://ca.finance.yahoo.com/q/bs?s=AAPL')
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    Cash=(periodic_figure_values(soup, "Cash And Cash Equivalents"))
    print(Cash)
    LongTermDebt=(periodic_figure_values(soup, "Long Term Debt"))
    print(LongTermDebt)

This prints your numbers just fine.
Note that you don't actually need the re module in this case, since you are only checking a literal (no wildcards, no boundaries, and so on).
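As a rough sketch of the try/except idea on its own, the conversion step can be pulled out into a small helper and tried without fetching any page. The name `parse_figure` and the sample cell texts are made up for illustration and are not part of the original function. Note that skipping a non-numeric cell only avoids the exception; it does not by itself check that the selected row is the "Long Term Debt" row.

```python
def parse_figure(cell_text):
    """Convert a cell such as '1,234' or '(5,678)' to an int.

    Returns None when the text is not a number (for example a row label).
    """
    cleaned = cell_text.strip().replace(",", "").replace("(", "-").replace(")", "")
    if cleaned == "-":  # a lone dash stands for zero, as in the original function
        return 0
    try:
        return int(cleaned)
    except ValueError:
        return None


# Hypothetical cell texts, invented for this example (not real figures)
cells = ["Short/Current Long Term Debt", "1,234", "(5,678)", "-"]
values = [v for v in (parse_figure(c) for c in cells) if v is not None]
print(values)  # [1234, -5678, 0]
```

Returning None instead of raising keeps the loop body simple, and the `is not None` filter makes sure a legitimate 0 is not dropped along with the labels.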
Answered 2016-07-27 05:52:24
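You can change the function so that it accepts a regular expression instead of a plain string. Then you can search for ^Long Term Debt to make sure there is no text before it. All you need to do is change

    if cell.text.strip() != yahoo_figure:

to

    if not re.match(yahoo_figure, cell.text.strip()):

https://stackoverflow.com/questions/38604506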
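To see why anchoring matters for the two labels from the question, here is a minimal sketch that uses only the standard `re` module. It compares the labels as plain strings and makes no assumption about how the page or BeautifulSoup delivers them.

```python
import re

labels = ["Short/Current Long Term Debt", "Long Term Debt"]

# Unanchored search: the pattern may match anywhere in the text
unanchored = [s for s in labels if re.search("Long Term Debt", s)]

# Anchored with ^: the text has to start with the pattern
anchored = [s for s in labels if re.search("^Long Term Debt", s)]

# re.match only ever matches at the beginning of the string
matched = [s for s in labels if re.match("Long Term Debt", s)]

print(unanchored)  # ['Short/Current Long Term Debt', 'Long Term Debt']
print(anchored)    # ['Long Term Debt']
print(matched)     # ['Long Term Debt']
```

An anchor at the start still accepts labels that merely begin with the text. If an exact label is wanted, `re.fullmatch` or a plain `==` comparison on the stripped text is a stricter option.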