HTML:
<div class="col-7">
<dl class="row box">
<h2>GENERAL</h2>
<dt class="col-6">transmission:</dt>
<dd class="col-6">sequential automatic</dd>
<dt class="col-6 grey">number of seats:</dt>
<dd class="col-6">5</dd>
<dt class="col-6">first year of production:</dt>
<dd class="col-6">2017</dd>
<dt class="col-6 grey">last year of production:</dt>
<dd class="col-6">available</dd>
</dl>
<dl class="row box">
<h2>DRIVE</h2>
<dt class="col-6">fuel:</dt>
<dd class="col-6">petrol</dd>
<dt class="col-6 grey">total maximum power:</dt>
<dd class="col-6">147 kW (200 hp)</dd>
<dt class="col-6">total maximum torque:</dt>
<dd class="col-6">330 Nm</dd>
</dl>
<dl class="row box">
<h2>TRANSMISSION</h2>
<dt class="col-6">1st gear:</dt>
<dd class="col-6">5,00:1</dd>
<dt class="col-6 grey">2nd gear:</dt>
<dd class="col-6">3,20:1</dd>
</dl>
</div>
My code:
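for item2 in soup2.find_all(attrs={'class':'col-7'}):
    jj = item2.text
jj contains all of the values from the website I am scraping, but I only need a few of them. For example, from GENERAL I only need the values for number of seats and last year of production, and from TRANSMISSION I only need the value for 1st gear.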
The result should be:
5, available, 5,00:1
Answered on 2018-07-30 10:10:51
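The information you need is simply the item that follows each of the titles "number of seats", "last year of production", and "1st gear", so you can use zip to iterate over each item together with the one after it: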
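# Every <dt> label and <dd> value carries the class col-6, in document order
all_items = soup.find_all(attrs={'class':'col-6'})
titles = [
    "number of seats",
    "last year of production",
    "1st gear"
]
d = {title: [] for title in titles}
# Pair each element with the element that immediately follows it
for item, next_item in zip(all_items, all_items[1:]):
    for title in titles:
        if title in item.text:
            # The element after a matching label holds its value
            d[title].append(next_item.text)
            break
After that, d will contain all the information you need.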
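To try the pairing idea without installing anything, here is a minimal sketch that swaps BeautifulSoup for Python's standard-library html.parser. The class name DtDdCollector and the trimmed HTML sample are assumptions made for illustration. The sketch collects by tag name (dt and dd) rather than by the col-6 class, which selects the same elements in the markup from the question.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the question (illustrative sample only)
HTML = """
<dl class="row box">
<h2>GENERAL</h2>
<dt class="col-6 grey">number of seats:</dt>
<dd class="col-6">5</dd>
<dt class="col-6 grey">last year of production:</dt>
<dd class="col-6">available</dd>
</dl>
<dl class="row box">
<h2>TRANSMISSION</h2>
<dt class="col-6">1st gear:</dt>
<dd class="col-6">5,00:1</dd>
<dt class="col-6 grey">2nd gear:</dt>
<dd class="col-6">3,20:1</dd>
</dl>
"""


class DtDdCollector(HTMLParser):
    """Hypothetical helper: collect the text of each <dt>/<dd> in document order."""

    def __init__(self):
        super().__init__()
        self.texts = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._inside = True
            self.texts.append("")  # start a new text slot for this element

    def handle_endtag(self, tag):
        if tag in ("dt", "dd"):
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.texts[-1] += data


collector = DtDdCollector()
collector.feed(HTML)
collector.close()
all_texts = [t.strip() for t in collector.texts]

titles = ["number of seats", "last year of production", "1st gear"]
d = {title: [] for title in titles}

# Same pairing as in the answer: each entry is zipped with the one after it
for text, next_text in zip(all_texts, all_texts[1:]):
    for title in titles:
        if title in text:
            d[title].append(next_text)
            break

result = ", ".join(d[title][0] for title in titles)
print(result)
```

Each title maps to a list because the same label could appear more than once on a page; taking the first entry of each list gives the single line the question asks for.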
https://stackoverflow.com/questions/51586003