I have some HTML like the following, and I am trying to parse/extract some content from it.
<div class="row d-3">
<div class="col-16 col-sm-8">
<strong>Category</strong> <br>
// *** extract this text ***
Clothing</div>
<div class="col-16 col-sm-8">
<strong>Sub-category</strong> <br>
// *** extract this text ***
this is Sub-category
</div>
<div class="col-16 col-sm-8">
<strong>product</strong> <br>
// *** extract this text ***
This is the actual product </div>
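</div>
I need the following result:
{Category: Clothing, Sub-category: this is Sub-category, product: This is the actual product}
I tried the following:
for b in soup.find_all("div", class_="row d-3"):
    print(b.strong.get_text())
But this only extracts Category, not Clothing.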
Posted on 2022-01-09 10:12:25
How to achieve this?
You can use contents, or use the stripped_strings generator as in the solution below:
list(b.stripped_strings)
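#Output --> ['Category', 'Clothing', 'Sub-category', 'this is Sub-category', 'product', 'This is the actual product']
To convert this list into a dict, pair the even-indexed items (labels) with the odd-indexed items (values). Passing zip directly to dict keeps the original order, which an intermediate set would not guarantee:
dict(zip(s[::2], s[1::2]))
Example:
from bs4 import BeautifulSoup

html = '''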
<div class="row d-3">
<div class="col-16 col-sm-8">
<strong>Category</strong> <br>
Clothing</div>
<div class="col-16 col-sm-8">
<strong>Sub-category</strong> <br>
this is Sub-category
</div>
<div class="col-16 col-sm-8">
<strong>product</strong> <br>
This is the actual product </div>
</div>'''
soup = BeautifulSoup(html, "lxml")
for b in soup.find_all("div", class_="row d-3"):
    s = list(b.stripped_strings)
    # Pair each label (even index) with the text that follows it (odd index)
    print(dict(zip(s[::2], s[1::2])))
Output:
{'Category': 'Clothing', 'Sub-category': 'this is Sub-category', 'product': 'This is the actual product'}
https://stackoverflow.com/questions/70637211
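The even/odd pairing above assumes every label is followed by exactly one text string. If a column could be empty or contain several text fragments, the pairs would shift. A minimal sketch of an alternative, assuming each inner div starts with its label, is to build the dict one column at a time. The helper name extract_pairs is hypothetical, and "html.parser" is used here only so the example does not depend on lxml.

```python
from bs4 import BeautifulSoup

html = """
<div class="row d-3">
  <div class="col-16 col-sm-8">
    <strong>Category</strong> <br>
    Clothing</div>
  <div class="col-16 col-sm-8">
    <strong>Sub-category</strong> <br>
    this is Sub-category
  </div>
  <div class="col-16 col-sm-8">
    <strong>product</strong> <br>
    This is the actual product </div>
</div>
"""


def extract_pairs(markup):
    """Hypothetical helper: map each column's label to the text after it."""
    soup = BeautifulSoup(markup, "html.parser")
    result = {}
    # Visit only the direct child divs of the row, one column at a time
    for col in soup.select("div.row.d-3 > div"):
        texts = list(col.stripped_strings)
        if not texts:
            continue  # skip empty columns instead of shifting the pairs
        # Assumption: the first string in a column is its <strong> label
        key, *rest = texts
        result[key] = " ".join(rest)
    return result


pairs = extract_pairs(html)
print(pairs)
```

Because each column is handled on its own, a missing value yields an empty string for that key rather than misaligning the keys and values that follow.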