
Q: BeautifulSoup - how to access nested elements

Stack Overflow user

Asked 2021-05-26 09:22:59

Answers: 2, Views: 58, Followers: 0, Votes: 0

I just need a little help finding an element with Beautiful Soup in my Python script.

Here is the HTML:

<div class="lg-7 lg-offset-1 md-24 sm-24 cols">
<div class="row pr__prices">
<div class="lg-24 md-12 cols">
<input id="analytics_prodPrice_47187" type="hidden" value="2.91">
<div class="pr__pricepoint">
   <div id="product_price" class="pr__price">
      <span class="pound">
      £</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
      <span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
   </div>
</div>

What I'm trying to do is get the product price. Looking at the HTML above, the price (£3.49) appears to be in this section:

   <div id="product_price" class="pr__price">
      <span class="pound">
      £</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
      <span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
   </div>

My problem is that when I try to get the price with Beautiful Soup like this:

pound = soup.find('span',attrs={'class':'pound'})
pence = soup.find('span',attrs={'class':'pence'})
prices.append(pound.text + pence.text)

I get this exception:

 prices.append(pound.text + pence.text)
AttributeError: 'NoneType' object has no attribute 'text'

So it looks like find() is returning None, i.e. it matched nothing. Does anyone know how I can get this element?
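On the static HTML from the question, the two find() calls actually succeed, so a None result means the parsed document contains no span with class pound at all. There is also a second, quieter problem: even when both spans are found, pound.text + pence.text drops the whole pounds, because the 3 is a bare text node between the two spans, not inside either of them. A minimal sketch on a trimmed copy of the question's markup; the helper name naive_price is hypothetical:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the price markup from the question.
HTML = """
<div id="product_price" class="pr__price">
   <span class="pound">
   £</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
</div>
"""


def naive_price(html):
    """Hypothetical helper: the question's approach, plus a guard for missing tags."""
    soup = BeautifulSoup(html, "html.parser")
    pound = soup.find("span", attrs={"class": "pound"})
    pence = soup.find("span", attrs={"class": "pence"})
    # find() returns None when nothing matches; calling .text on None is what
    # raises "AttributeError: 'NoneType' object has no attribute 'text'".
    if pound is None or pence is None:
        return None
    return (pound.text + pence.text).strip()


print(naive_price(HTML))  # £.49INC VAT  (the "3" sits outside both spans)
print(naive_price("<div>no price here</div>"))  # None
```

The guard turns the AttributeError into an explicit None, which makes it easier to tell "tag missing from this HTML" apart from "tag found but the text is wrong".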

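Edit

Looking at the answers below, I tried to reproduce them, but instead of using static HTML I requested the website URL. I noticed that even though the code works on the static HTML, it still doesn't work when I request the URL of the page that contains that HTML.

Code: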

from bs4 import BeautifulSoup
import pandas as pd
import requests

data = requests.get('https://www.screwfix.com/p/no-nonsense-sanitary-silicone-white-310ml/47187').text

soup = BeautifulSoup(data, 'html.parser')
currency = soup.select_one('span.pound')
currency_str = next(currency.strings).strip()

pound_str = currency.nextSibling

pence = soup.select_one('span.pence')
pence_str = next(pence.strings).strip()

print(f"{currency_str}{pound_str}{pence_str}")  # £3.49

Error:

 currency_str = next(currency.strings).strip()
AttributeError: 'NoneType' object has no attribute 'strings'

Tags: python
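When the same selector works on the pasted HTML but select_one() returns None for the live page, the HTML that requests received is probably not the HTML seen in the browser. Possible reasons, not confirmed for this site, include an error or bot-check page served to non-browser clients, or a price filled in later by JavaScript, which requests does not run. A minimal diagnostic sketch: the browser-like User-Agent string and the helper names fetch_html and parse_price are assumptions for illustration, and sending that header may or may not be enough for a given site.

```python
from bs4 import BeautifulSoup, NavigableString

# Assumption: a browser-like User-Agent; some sites serve scripts a different page without one.
HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"}


def fetch_html(url):
    """Hypothetical helper: download a page and fail loudly on HTTP errors."""
    import requests  # imported here so parse_price() can be used without requests installed

    response = requests.get(url, headers=HEADERS, timeout=10)
    response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx status codes
    return response.text


def parse_price(html):
    """Hypothetical helper: return e.g. '£3.49', or None if the price markup is absent."""
    soup = BeautifulSoup(html, "html.parser")
    pound = soup.select_one("#product_price span.pound")
    pence = soup.select_one("#product_price span.pence")
    if pound is None or pence is None:
        return None  # the downloaded page does not contain the price markup
    whole = pound.next_sibling  # the bare "3" text node right after span.pound
    if not isinstance(whole, NavigableString):
        return None
    return pound.get_text(strip=True) + whole.strip() + next(pence.strings).strip()
```

Usage, assuming the product URL from the question: parse_price(fetch_html(url)). If it returns None, printing the first few hundred characters of the downloaded HTML can show whether the server sent an error page, a bot check, or a page whose price is inserted by JavaScript (in which case requests alone will not see it).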

2 answers

Stack Overflow user

Accepted answer

Answered 2021-05-26 12:49:31

Here is another approach.

from bs4 import BeautifulSoup

data = '''\
<div class="lg-7 lg-offset-1 md-24 sm-24 cols">
<div class="row pr__prices">
<div class="lg-24 md-12 cols">
<input id="analytics_prodPrice_47187" type="hidden" value="2.91">
<div class="pr__pricepoint">
   <div id="product_price" class="pr__price">
      <span class="pound">
      £</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
      <span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
   </div>
</div>
'''

soup = BeautifulSoup(data, 'html.parser')
currency = soup.select_one('span.pound')
currency_str = next(currency.strings).strip()

pound_str = currency.next_sibling  # bare "3" text node after span.pound (nextSibling is the deprecated alias)

pence = soup.select_one('span.pence')
pence_str = next(pence.strings).strip()

print(f"{currency_str}{pound_str}{pence_str}")  # £3.49
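The accepted answer works because the price is split across three sibling nodes inside div#product_price: span.pound (holding £), a bare text node (3), and span.pence (holding .49 plus the nested INC VAT span). Listing the div's stripped_strings makes that layout visible. Joining the first three pieces is a variant of the same idea that assumes the markup keeps exactly this order; the helper name price_from_strings is hypothetical:

```python
from bs4 import BeautifulSoup

# Price markup from the question.
HTML = """
<div id="product_price" class="pr__price">
   <span class="pound">
   £</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
   <span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
</div>
"""


def price_from_strings(html):
    """Hypothetical helper: join the first three text pieces of div#product_price."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.select_one("div#product_price")
    if div is None:
        return None
    parts = list(div.stripped_strings)
    # parts == ['£', '3', '.49', 'INC VAT', '(', '£11.26', '/', 'Ltr', ')']
    return "".join(parts[:3])


print(price_from_strings(HTML))  # £3.49
```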

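Votes: 0

Stack Overflow user

Answered 2021-05-26 09:46:45

I've taken your data as html. The approach is to get the text inside that div with get_text(strip=True), which removes the unnecessary whitespace. As you can see, main_div then still contains some letters, so use re to pull out the runs of digits, which gives the output you want.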

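from bs4 import BeautifulSoup
import re

# html: the price markup from the question
html = '''
<div id="product_price" class="pr__price">
   <span class="pound">
   £</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
   <span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
</div>
'''

soup = BeautifulSoup(html, "html.parser")
main_div = soup.find("div", attrs={"class": "pr__price"}).get_text(strip=True)

lst = re.findall(r"\d+", main_div)  # raw string avoids the invalid "\d" escape warning
print(".".join(lst[:2]))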

Output:

3.49
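The findall approach relies on the price being the first two runs of digits in the text; if the markup ever gains a number before the price, the result changes silently. A somewhat stricter alternative (not the answerer's code) searches the same get_text(strip=True) output for a £ followed by digits and exactly two decimals, and converts the match to Decimal so it can be used in arithmetic. The pattern and the helper name price_from_text are assumptions:

```python
import re
from decimal import Decimal

# Pattern assumption: a pound sign, whole pounds, a dot, and two pence digits.
PRICE_RE = re.compile(r"£(\d+\.\d{2})")


def price_from_text(text):
    """Hypothetical helper: first £-price in text as a Decimal, or None if absent."""
    match = PRICE_RE.search(text)
    return Decimal(match.group(1)) if match else None


# The string get_text(strip=True) produces for the question's price div.
main_div = "£3.49INC VAT(£11.26/Ltr)"
print(price_from_text(main_div))  # 3.49
```

search() returns the first match, so the unit price £11.26 later in the string is ignored.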

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/67702079
