I can't figure out how to correctly extract the hrefs from the code block below, specifically the IDs (hillge01, masonfr01).
<div>
<strong>Inactive: </strong>
<span><strong>MIL</strong></span>
<a href="/players/h/hillge01.html">George Hill</a>,
<a href="/players/m/masonfr01.html">Frank Mason</a>,
<a href="/players/r/reynoca01.html">Cameron Reynolds</a>,
<a href="/players/w/wilsodj01.html">D.J. Wilson</a>
<span><strong>LAL</strong> </span>
<a href="/players/a/antetko01.html">Kostas Antetokounmpo</a>,
<a href="/players/c/cacokde01.html">Devontae Cacok</a>,
<a href="/players/h/hortota01.html">Talen Horton-Tucker</a>,
<a href="/players/w/waitedi01.html">Dion Waiters</a>
</div>
So far I have managed to extract the first href with the code below, but I haven't found a way to return the remaining hrefs.
soup = get_soup(date_team)
for strong_tag in soup.findAll('strong'):
    if 'Inactive' in strong_tag.text:
        str1 = strong_tag.next_sibling.next_sibling
        print(str1)
Any help with this would be greatly appreciated.
Posted on 2020-03-08 12:59:22
Try this. A solution using SimplifiedDoc.
from simplified_scrapy import SimplifiedDoc
html = '''
<div>
<strong>Inactive: </strong>
<span><strong>MIL</strong> </span>
<a href="/players/h/hillge01.html">George Hill</a>,
<a href="/players/m/masonfr01.html">Frank Mason</a>,
<a href="/players/r/reynoca01.html">Cameron Reynolds</a>,
<a href="/players/w/wilsodj01.html">D.J. Wilson</a>
<span><strong>LAL</strong> </span>
<a href="/players/a/antetko01.html">Kostas Antetokounmpo</a>,
<a href="/players/c/cacokde01.html">Devontae Cacok</a>,
<a href="/players/h/hortota01.html">Talen Horton-Tucker</a>,
<a href="/players/w/waitedi01.html">Dion Waiters</a>
</div>
'''
doc = SimplifiedDoc(html)
strong = doc.getElementByText('Inactive',tag='strong')
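# Take the first <a> after the "Inactive" label, then step to the one after it.
# The variable is named `link` so it does not shadow the built-in `next`.
link = strong.getNext('a')
print(link)
link = link.next
print(link)
Result: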
{'href': '/players/h/hillge01.html', 'tag': 'a', 'html': 'George Hill'}
{'href': '/players/m/masonfr01.html', 'tag': 'a', 'html': 'Frank Mason'}
https://stackoverflow.com/questions/60582757
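Since the question already uses BeautifulSoup, here is a minimal sketch of how all of the IDs (not just the first two) might be collected with it. It assumes the `<a>` and `<span>` tags are siblings of the "Inactive" `<strong>` tag, as in the snippet from the question. The function name `inactive_player_ids` and the grouping by team are illustrative choices, not part of the original post.

```python
import re
from bs4 import BeautifulSoup

# Sample markup copied from the question.
HTML = """
<div>
<strong>Inactive: </strong>
<span><strong>MIL</strong></span>
<a href="/players/h/hillge01.html">George Hill</a>,
<a href="/players/m/masonfr01.html">Frank Mason</a>,
<a href="/players/r/reynoca01.html">Cameron Reynolds</a>,
<a href="/players/w/wilsodj01.html">D.J. Wilson</a>
<span><strong>LAL</strong> </span>
<a href="/players/a/antetko01.html">Kostas Antetokounmpo</a>,
<a href="/players/c/cacokde01.html">Devontae Cacok</a>,
<a href="/players/h/hortota01.html">Talen Horton-Tucker</a>,
<a href="/players/w/waitedi01.html">Dion Waiters</a>
</div>
"""


def inactive_player_ids(html):
    """Return {team: [player_id, ...]} for the links after the 'Inactive' label.

    Hypothetical helper: assumes the layout shown in the question, where each
    team <span> is followed by that team's player links.
    """
    soup = BeautifulSoup(html, "html.parser")
    label = soup.find("strong", string=re.compile("Inactive"))
    if label is None:
        return {}

    ids_by_team = {}
    team = None
    # Walk every later sibling that is a team <span> or a player <a>.
    for tag in label.find_next_siblings(["span", "a"]):
        if tag.name == "span":
            team = tag.get_text(strip=True)
            ids_by_team[team] = []
        elif team is not None and tag.has_attr("href"):
            # "/players/h/hillge01.html" -> "hillge01"
            player_id = tag["href"].rsplit("/", 1)[-1].rsplit(".", 1)[0]
            ids_by_team[team].append(player_id)
    return ids_by_team


if __name__ == "__main__":
    for team, ids in inactive_player_ids(HTML).items():
        print(team, ids)
```

The key difference from the loop in the question is `find_next_siblings`, which returns every matching sibling after the label rather than a single `next_sibling`. If the real page nests the links differently, the sibling walk would need to be adjusted.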