I just need a little help finding an element with Beautiful Soup in my Python script.
Here is the HTML:
<div class="lg-7 lg-offset-1 md-24 sm-24 cols">
<div class="row pr__prices">
<div class="lg-24 md-12 cols">
<input id="analytics_prodPrice_47187" type="hidden" value="2.91">
<div class="pr__pricepoint">
<div id="product_price" class="pr__price">
<span class="pound">
£</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
<span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
</div>
</div>
What I want to do is get the product price. Looking at the HTML above, it appears to sit in this section (the price is GBP 3.49):
<div id="product_price" class="pr__price">
<span class="pound">
£</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
<span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
</div>
My problem is that when I try to get the price with Beautiful Soup like this:
pound = soup.find('span',attrs={'class':'pound'})
pence = soup.find('span',attrs={'class':'pence'})
prices.append(pound.text + pence.text)
I get this exception:
prices.append(pound.text + pence.text)
AttributeError: 'NoneType' object has no attribute 'text'
So it looks to me as if the lookup is returning None or nothing at all. Does anyone know how I can find this element?
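The traceback is consistent with `find()` returning `None`, which Beautiful Soup does when no element matches. A minimal sketch of that behaviour with a guard follows; the helper name `text_or_none` and the sample markup are made up for illustration and are not part of the original script.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, name, css_class):
    """Return the stripped text of the first match, or None if nothing matches.

    Hypothetical helper for illustration; not part of Beautiful Soup.
    """
    tag = soup.find(name, attrs={"class": css_class})
    return tag.get_text(strip=True) if tag is not None else None


# Markup without a <span class="pound">, standing in for a page that lacks the element.
soup = BeautifulSoup("<div><span class='other'>x</span></div>", "html.parser")

missing = soup.find("span", attrs={"class": "pound"})
print(missing)                               # None, so missing.text would raise AttributeError
print(text_or_none(soup, "span", "pound"))   # None
print(text_or_none(soup, "span", "other"))   # x
```

Checking for `None` before touching `.text` turns the crash into a value the caller can test for.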
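Edit
Looking at the answers below, I tried to reproduce them, but instead of using static HTML I requested the site URL. What I noticed is that although the code works on the static HTML, it still fails when I request the URL of the page that contains that HTML.
Code: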
from bs4 import BeautifulSoup
import pandas as pd
import requests
data = requests.get('https://www.screwfix.com/p/no-nonsense-sanitary-silicone-white-310ml/47187').text
soup = BeautifulSoup(data, 'html.parser')
currency = soup.select_one('span.pound')
currency_str = next(currency.strings).strip()
pound_str = currency.nextSibling
pence = soup.select_one('span.pence')
pence_str = next(pence.strings).strip()
print(f"{currency_str}{pound_str}{pence_str}") # £3.49
Error:
currency_str = next(currency.strings).strip()
AttributeError: 'NoneType' object has no attribute 'strings'
Answered 2021-05-26 12:49:31
Here is another approach.
from bs4 import BeautifulSoup
data = '''\
<div class="lg-7 lg-offset-1 md-24 sm-24 cols">
<div class="row pr__prices">
<div class="lg-24 md-12 cols">
<input id="analytics_prodPrice_47187" type="hidden" value="2.91">
<div class="pr__pricepoint">
<div id="product_price" class="pr__price">
<span class="pound">
£</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
<span class="price__extra">(<span id="unit_price">£11.26</span>/<span id="unit_price_measure">Ltr</span>)</span>
</div>
</div>
'''
soup = BeautifulSoup(data, 'html.parser')
currency = soup.select_one('span.pound')
currency_str = next(currency.strings).strip()
pound_str = currency.next_sibling  # text node "3" that follows the pound span
pence = soup.select_one('span.pence')
pence_str = next(pence.strings).strip()
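print(f"{currency_str}{pound_str}{pence_str}") # £3.49
Answered 2021-05-26 09:46:45
I have taken your data as html. One approach is to get the text inside that div, using strip to drop the unnecessary whitespace. main_div still contains some letters, so use re to pull out the digits, which gives the output you want.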
from bs4 import BeautifulSoup
import re
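# `html` is the markup string from the question
soup = BeautifulSoup(html, "html.parser")
main_div = soup.find("div", attrs={"class": "pr__price"}).get_text(strip=True)
lst = re.findall(r"\d+", main_div)  # raw string avoids an invalid-escape warning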
print(".".join(lst[:2]))
Output:
3.49
https://stackoverflow.com/questions/67702079
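On the edit in the question, where the same selectors work on static HTML but fail on the fetched page: the error suggests that the HTML returned to `requests` does not contain `span.pound`. Why that happens is not established in this thread; a server answering scripted clients with different markup, or a price inserted by JavaScript, are only guesses. A minimal sketch that at least makes the failure visible, assuming the markup shown in the question; the function name `extract_price` is hypothetical:

```python
from bs4 import BeautifulSoup


def extract_price(html):
    """Return the price text such as '£3.49', or None if the expected markup is absent."""
    soup = BeautifulSoup(html, "html.parser")
    price_div = soup.select_one("div#product_price")
    if price_div is None:
        return None
    pound = price_div.select_one("span.pound")
    pence = price_div.select_one("span.pence")
    if pound is None or pence is None:
        return None
    currency_str = pound.get_text(strip=True)
    # The whole-pound digits are a bare text node right after the pound span.
    whole = pound.next_sibling
    whole_str = whole.strip() if isinstance(whole, str) else ""
    # First string inside span.pence is ".49"; the nested "INC VAT" span comes later.
    pence_str = next(pence.strings).strip()
    return f"{currency_str}{whole_str}{pence_str}"


sample = '''<div id="product_price" class="pr__price">
<span class="pound">
£</span>3<span class="pence">.49<span class="incvat">INC VAT</span></span>
</div>'''

print(extract_price(sample))                        # £3.49
print(extract_price("<html><body></body></html>"))  # None

# With a live page (not executed here), one might inspect the response first:
# response = requests.get(url)
# print(response.status_code, "span class=\"pound\"" in response.text)
# print(extract_price(response.text))
```

If `extract_price` returns `None` for the live response, printing the status code and searching the raw response text for the expected markup shows whether the element was ever delivered to the script.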