I'm stuck. I want to parse data only up to a certain point in the document and then stop parsing.
<span itemprop="address">
Some address
</span>
<i class="fa fa-signal">
</i>
...
</p>
</div>
</div>
<div class="search_pagination" id="pagination">
<ul class="pagination">
</ul>
</div>
</div>
</div>
</div>
<div class="col-sm-3">
<div class="panel" itemscope="" itemtype="http://schema.org/WPSideBar">
<h2 class="heading_a" itemprop="name">
Top-10 today
</h2> #a lot of tags after that moment
I want to get all the values from <span itemprop="address"> (there are many of them earlier in the document) up to "Top-10 today".
Posted on 2016-10-14 13:05:50
You can use BeautifulSoup's SoupStrainer so that only the matching elements are parsed:
from bs4 import BeautifulSoup, SoupStrainer
only_addresses = SoupStrainer("span", itemprop="address")
soup = BeautifulSoup(html_doc, "html.parser", parse_only=only_addresses)
If there are also "addresses" after "Top-10 today" and you only want the ones that come before it, you can write a custom search function:
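def search_addresses(tag):
    # Match address spans that are still followed by the "Top-10 today" heading.
    return tag.name == "span" and tag.get("itemprop") == "address" and \
        tag.find_next("h2", string=lambda text: text and "Top-10 today" in text)

# Run this on a fully parsed soup (no parse_only), so the <h2> is in the tree.
addresses = soup.find_all(search_addresses)
This doesn't look that simple, but the idea is simple: for each "address" we use find_next() to check whether the "Top-10 today" heading still comes after it.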
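To make the difference between the two approaches concrete, here is a minimal, self-contained sketch. The sample html_doc and the helper is_top10_heading are made up for illustration and are not from the question; the real page is assumed to have the same shape (address spans, then the heading, then possibly more address spans).

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample document: two addresses before the heading, one after it.
html_doc = """
<div>
  <span itemprop="address">First address</span>
  <span itemprop="address">Second address</span>
  <h2 class="heading_a" itemprop="name">Top-10 today</h2>
  <span itemprop="address">Sidebar address</span>
</div>
"""


def is_top10_heading(text):
    # True for a string that contains the heading text (hypothetical helper).
    return bool(text) and "Top-10 today" in text


def search_addresses(tag):
    # Keep only address spans that still have the heading somewhere after them.
    return (
        tag.name == "span"
        and tag.get("itemprop") == "address"
        and tag.find_next("h2", string=is_top10_heading) is not None
    )


# SoupStrainer alone keeps every address span, including the one after the heading.
only_addresses = SoupStrainer("span", itemprop="address")
strained = BeautifulSoup(html_doc, "html.parser", parse_only=only_addresses)
all_addresses = [span.get_text(strip=True) for span in strained.find_all("span")]

# The custom filter needs the full tree, because find_next() must reach the <h2>.
soup = BeautifulSoup(html_doc, "html.parser")
addresses_before_heading = [
    span.get_text(strip=True) for span in soup.find_all(search_addresses)
]

print(all_addresses)
print(addresses_before_heading)
```

On this sample the strained soup yields all three addresses, while the filter function yields only the two that precede the heading. Note that find_next() is called once per address span, so on a very large page this can get slow.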
https://stackoverflow.com/questions/40043715