So basically, I want to extract the part under the tr-mfgPartNumber class from this HTML, but I'm running into a problem.

import scrapy

class DigiSpider(scrapy.Spider):
    name = 'digi'
    allowed_domains = ['digikey.com']
    start_urls = ['digikey.com']

    def parse(self, response):
        data = {}
        parts = response.css('tbody.InkPart')
        for part in parts:
            for p in part.css('td.tr-mfgPartNumber'):
                data['href'] = p.css('a::attr(href)').extract()
        yield data

Below is the HTML:
<tbody id="lnkPart" cookie-tracking="ref_page_event=Select Part;available_parameters=["s","pv1989","pv142","pv2042","pv2192","pv276","pv252","pv16","pv1291"];">
<tr>
<td class="tr-compareParts" align="center">
<input type="checkbox" name="part" value="428-3574-2-ND" id="428-3574-2-ND" onclick="partClick();">
<label title="Compare Parts" for="428-3574-2-ND"></label>
</td>
<td class="tr-datasheet">
<a class="lnkDatasheet" href="https://www.cypress.com/file/43021/download" target="_blank" track-data="ref_page_event=Display Asset;page_title=Datasheet;asset_type=Datasheet">
<img class="datasheet-img" src="//www.digikey.com/Web%20Export/Common/icons/datasheet.png" alt="CY62157EV30LL-45ZSXIT Datasheet" title="CY62157EV30LL-45ZSXIT Datasheet">
</a>
</td>
<td class="tr-image">
<a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">
<img class="pszoomer" zoomimg="//media.digikey.com/Renders/Cypress%20Semi%20Renders/428;51-85087;Z,ZS;44.jpg" border="0" height="64" src="//media.digikey.com/Renders/Cypress%20Semi%20Renders/428;51-85087;Z,ZS;44_tmb.jpg" alt="CY62157EV30LL-45ZSXIT - Cypress Semiconductor Corp" title="CY62157EV30LL-45ZSXIT - Cypress Semiconductor Corp">
</a>
</td>
<td class="tr-dkPartNumber nowrap-culture">
<a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">
428-3574-2-ND
</a>
<div class="product-indicator-collection">
<a class="align-indicator-collection" href="javascript:msgBox('#dlgRohs');">
<img class="rohs-foilage" src="//www.digikey.com/web%20export/common/mkt/en/leaf.png" border="0" alt="This part is RoHS compliant." title="This part is RoHS compliant.">
</a>
</div>
</td>
<td class="tr-mfgPartNumber">
<a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">
<span>CY62157EV30LL-45ZSXIT</span>
</a>
</td>

Answered on 2020-07-08 19:40:11
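When I tried the same code, Scrapy got an empty response. The site probably detected the spider and blocked it. After I set a browser User-Agent, it worked.

The code is below. I also changed "tbody.InkPart" to "tbody#lnkPart". The original selector was wrong: lnkPart is an id, not a class, so it needs "#" rather than ".", and it was spelled with a capital I instead of a lowercase l. The tbody selector is not strictly needed anyway, since the page has only one tbody tag: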
import scrapy


class DigiSpider(scrapy.Spider):
    name = 'digi'
    allowed_domains = ['digikey.com']
    # Send a browser-like User-Agent; with Scrapy's default the response came back empty.
    custom_settings = {
        "USER_AGENT": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
    }
    start_urls = ['https://www.digikey.com/products/en/integrated-circuits-ics/memory/774?FV=-1%7C428%2C-8%7C774%2C7%7C1/']

    def parse(self, response):
        # The table body carries id="lnkPart", so it is selected with '#'.
        parts = response.css('tbody#lnkPart')
        for part in parts:
            for p in part.css('td.tr-mfgPartNumber'):
                # Yield a fresh dict per cell so one row does not overwrite another.
                data = {'href': p.css('a::attr(href)').extract()}
                yield data

https://stackoverflow.com/questions/62800891
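Because the site may block automated requests, it can help to check the extraction logic offline against the HTML from the question before running the spider. The sketch below is only an illustration and does not use Scrapy's selectors: it relies on the standard library's `html.parser` to mimic what `tbody#lnkPart td.tr-mfgPartNumber a::attr(href)` selects, and on `urljoin` to turn the relative hrefs into absolute URLs. The names `SAMPLE_HTML`, `MfgPartLinkParser`, `extract_mfg_links`, and the `base_url` default are hypothetical and not part of the original code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Trimmed copy of the table markup from the question (assumed representative).
SAMPLE_HTML = """
<tbody id="lnkPart">
<tr>
<td class="tr-dkPartNumber nowrap-culture">
<a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">428-3574-2-ND</a>
</td>
<td class="tr-mfgPartNumber">
<a href="/product-detail/en/cypress-semiconductor-corp/CY62157EV30LL-45ZSXIT/428-3574-2-ND/1205268">
<span>CY62157EV30LL-45ZSXIT</span>
</a>
</td>
</tr>
</tbody>
"""


class MfgPartLinkParser(HTMLParser):
    """Collect hrefs of <a> tags inside td.tr-mfgPartNumber under tbody#lnkPart."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.in_cell = False
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "tbody" and attrs.get("id") == "lnkPart":
            # Matched by id (the '#' selector), not by class.
            self.in_tbody = True
        elif tag == "td":
            classes = (attrs.get("class") or "").split()
            self.in_cell = self.in_tbody and "tr-mfgPartNumber" in classes
        elif tag == "a" and self.in_cell and "href" in attrs:
            self.hrefs.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False
        elif tag == "tbody":
            self.in_tbody = False


def extract_mfg_links(html, base_url="https://www.digikey.com/"):
    """Return absolute URLs for the manufacturer part number links."""
    parser = MfgPartLinkParser()
    parser.feed(html)
    # The hrefs in the page are relative, so resolve them against the site root.
    return [urljoin(base_url, href) for href in parser.hrefs]


if __name__ == "__main__":
    for link in extract_mfg_links(SAMPLE_HTML):
        print(link)
```

Inside the spider itself, `response.urljoin(href)` performs the same relative-to-absolute resolution against the response URL.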