I'm trying to get all of the text inside each span tag, but instead of 2 elements I get 4.
<div class="col-sm-6 col-md-7">
<ul>
<li>
<span style="font-family: Verdana, sans-serif; font-size: 10pt;" class="text-black">
Minimum 2 years of experience developing mobile/web applications using
<b>Ionic-3, Ionic-4, AngularJS, Angular.</b>
<p></p>
</span>
</li>
<li>
<span style="font-family: Verdana, sans-serif; font-size: 10pt;" class="text-black">
Experience with Agile
<b>(SCRUM, Kanban)</b>
<p></p>
</span>
</li>
</ul>
</div>
My rough code for parsing the HTML is:
response.xpath(".//div[@class='col-sm-6 col-md-7']//ul/li//span//text()")
My expected output is:
["Minimum 2 years of experience developing mobile/web applications using Ionic-3, Ionic-4, AngularJS, Angular.","Experience with Agile (SCRUM, Kanban)"]
But what I get is:
["Minimum 2 years of experience developing mobile/web applications using", "Ionic-3, Ionic-4, AngularJS, Angular.","Experience with Agile", "(SCRUM, Kanban)"]

Posted on 2020-09-06 17:33:01
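This happens because the <b> tag splits each span's text into separate text nodes, and //text() returns every text node as its own item.
In your case, select each span first and then join its text nodes into one string: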
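data = []
# Select each span tag separately:
for span_tag in response.xpath(".//div[@class='col-sm-6 col-md-7']//ul/li//span"):
    # Use a relative path (".//text()"); a bare "//text()" would match
    # every text node in the whole document, not just this span's.
    parts = [t.strip() for t in span_tag.xpath(".//text()").extract()]
    # Join the span's non-empty text nodes into a single string:
    data.append(" ".join(p for p in parts if p))

https://stackoverflow.com/questions/63762450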
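The "join per span" idea can be checked without a running Scrapy project. The sketch below is only an illustration: it swaps Scrapy's selectors for the standard library's xml.etree.ElementTree (which works here because the snippet from the question happens to be well-formed), and the names HTML and span_texts are hypothetical helpers, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the markup from the question (style attributes dropped).
# Assumption: the snippet is well-formed, so an XML parser can read it.
HTML = """
<div class="col-sm-6 col-md-7">
<ul>
<li>
<span class="text-black">
Minimum 2 years of experience developing mobile/web applications using
<b>Ionic-3, Ionic-4, AngularJS, Angular.</b>
<p></p>
</span>
</li>
<li>
<span class="text-black">
Experience with Agile
<b>(SCRUM, Kanban)</b>
<p></p>
</span>
</li>
</ul>
</div>
"""


def span_texts(markup):
    """Return one whitespace-normalized string per <span> element."""
    root = ET.fromstring(markup)
    results = []
    for span in root.iter("span"):
        # itertext() yields every text node under this span, including
        # the text inside <b>, in document order.
        raw = "".join(span.itertext())
        # Collapse newlines and repeated spaces into single spaces.
        results.append(" ".join(raw.split()))
    return results


data = span_texts(HTML)
print(data)
```

The result has two items, one per span, with the <b> text merged into the surrounding sentence. Inside a Scrapy callback the same per-span loop applies; only the way the text nodes are collected differs.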