I am scraping a page, but I get an error when I try to extract WANTED-DATA.
<td class="class-1" data-reactid="41"><a class="class-2" data-reactid="42" data-symbol="MORE-DATA" href="/quote/HKxlkPH4-x" title="WANTED-DATA">text</a></td>
I can extract the text with the following:
getText.find('a', attrs={'class':'class-2'}).text
# output: 'text'
How can I scrape 'WANTED-DATA'?
Answered 2020-11-17 16:13:53
Try this:
# find_all returns a list of tags, so read the attribute from each tag inside the loop
links = soup.find_all('a', attrs={'class':'class-2'})
for link in links:
    title = link.get('title')
Answered 2020-11-17 16:05:27
From the docs: you can write tag[attr_name] to get a single attribute, and tag.attrs to get a dictionary of all attributes and their values.
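To make the two access styles concrete, here is a minimal sketch using the markup from the question. The variable names (link, title, all_attrs, missing) and the attribute name data-missing are only illustrative assumptions, and the built-in html.parser is chosen here so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Markup taken from the question (data-reactid attributes dropped for brevity).
html = (
    '<td class="class-1"><a class="class-2" data-symbol="MORE-DATA" '
    'href="/quote/HKxlkPH4-x" title="WANTED-DATA">text</a></td>'
)

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a", attrs={"class": "class-2"})

# tag[attr_name]: a single attribute value.
title = link["title"]

# tag.attrs: a dictionary of every attribute on the tag.
all_attrs = link.attrs

# tag.get(attr_name): like indexing, but gives None for an absent attribute.
# "data-missing" is a made-up attribute name used only for illustration.
missing = link.get("data-missing")

print(title)
print(all_attrs["href"])
print(missing)
```

Indexing with link["title"] raises KeyError when the attribute is absent, whereas link.get("title") returns None, so .get() is probably the safer choice when some links may lack a title.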
soup.find('a', attrs={'class':'class-2'})['title']
Answered 2020-11-17 22:31:39
You can also do this:
from bs4 import BeautifulSoup

html = """<td class="class-1" data-reactid="41"><a class="class-2" data-reactid="42" data-symbol="MORE-DATA" href="/quote/HKxlkPH4-x" title="WANTED-DATA">text</a></td>"""
# naming the parser explicitly avoids the "no parser was explicitly specified" warning
soup = BeautifulSoup(html, 'html.parser')
# title=True matches only links that have a title attribute, so links without one are skipped
titles = [x.get('title') for x in soup.find_all('a',title=True)]
print(titles)
Output:
['WANTED-DATA']
https://stackoverflow.com/questions/64871248