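I'm using the code below to scrape the dealer name, address, and number of cars from a web page.
However, the car count is often empty. In the example below, suppose the 8th dealer returns an empty car count. The returned lists then look like this: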
names = a,b,c,d,e,f,g,h,i,j
addresses = aa,bb,cc,dd,ee,ff,gg,hh,ii,jj
cars = 1,2,3,4,5,6,7,9,10
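Dealer a at address aa has 1 car, dealer b at address bb has 2 cars, and so on. But because dealer h at address hh has an empty car count, the code thinks dealer h has 9 cars and dealer i at address ii has 10 cars, and dealer j at address jj is left out entirely because the cars list has already run out.
So, when the code returns an empty value for cars, how can I replace it with 0? In the example above, dealer h at address hh would then have 0 cars, dealer i at address ii would have 9, and dealer j at address jj would have 10.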
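To make the misalignment concrete, here is a minimal pure-Python sketch (an editorial illustration, not the asker's code). The dealer tuples are hypothetical stand-ins for what the page yields, with `None` marking dealer h's missing count. Building three independent lists and zipping them reproduces the shift; keeping each dealer's fields together and substituting 0 gives the desired result.

```python
# Hypothetical per-dealer data: (name, address, cars_text).
# None stands for a dealer whose car count is missing on the page (dealer h).
dealers = [
    ("a", "aa", "1"), ("b", "bb", "2"), ("c", "cc", "3"), ("d", "dd", "4"),
    ("e", "ee", "5"), ("f", "ff", "6"), ("g", "gg", "7"), ("h", "hh", None),
    ("i", "ii", "9"), ("j", "jj", "10"),
]

# What the question's code effectively does: three independent lists.
# An XPath text() query returns nothing for the empty count, so that entry
# is silently dropped and the cars list is one element shorter.
names = [name for name, _, _ in dealers]
addresses = [addr for _, addr, _ in dealers]
cars = [count for _, _, count in dealers if count is not None]

# zip stops at the shortest list: h gets 9, i gets 10, and j disappears.
misaligned = list(zip(names, addresses, cars))

# Fix: keep each dealer's fields together and use 0 for a missing count.
aligned = [(name, addr, int(count) if count else 0)
           for name, addr, count in dealers]

for row in aligned:
    print(row)
```

`zip` stops at the shortest input without raising an error, which is why dealer j vanishes silently instead of the spider failing.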
import scrapy
from autotrader.items import AutotraderItem


class AutotraderSpider(scrapy.Spider):
    name = "autotrader"
    allowed_domains = ["autotrader.co.uk"]
    start_urls = ["https://www.autotrader.co.uk/car-dealers/search?advertising-location=at_cars&postcode=m43aq&radius=1500&forSale=on&toOrder=on&sort=with-retailer-reviews&page=822"]

    def parse(self, response):
        for sel in response.xpath('//ul[@class="dealerList__container"]'):
            names = sel.xpath('.//*[@itemprop="legalName"]/text() ').extract()
            names = [name.strip() for name in names]
            addresses = sel.xpath('.//li/article/a/div/p[@itemprop="address"]/text()').extract()
            addresses = [address.strip() for address in addresses]
            carss = sel.xpath('.//li/article/a/div/p[@class="dealerList__itemCount"]/span/text()').extract()
            carss = [cars.strip() for cars in carss]
            result = zip(names, addresses, carss)
            for name, address, cars in result:
                item = AutotraderItem()
                item['name'] = name
                item['address'] = address
                item['cars'] = cars
                yield item

Posted on 2018-05-03 04:58:30
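Your selector loop is a bit muddled.
Here you loop over the unordered list, of which there is only one per page:
for sel in response.xpath('//ul[@class="dealerList__container"]'):
You want to loop over all the list items instead:
for sel in response.xpath('//li[@class="dealerList__itemContainer"]'):
If you loop this way, you can get the name, address, and car count of each individual list item: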
for sel in response.xpath('//li[@class="dealerList__itemContainer"]'):
    # Each <li> is one dealer, so take the first match for each field
    # instead of building lists that can drift out of alignment.
    name = sel.xpath('.//*[@itemprop="legalName"]/text()').extract_first(default='').strip()
    address = sel.xpath('.//article/a/div/p[@itemprop="address"]/text()').extract_first(default='').strip()
    # Fall back to '0' when this dealer has no car count.
    cars = sel.xpath('.//article/a/div/p[@class="dealerList__itemCount"]/span/text()').extract_first(default='0').strip()
    item = AutotraderItem()
    item['name'] = name
    item['address'] = address
    item['cars'] = cars
    yield item

Posted on 2018-05-03 07:50:21
Try this to get the results. You can use XPaths like these in your Scrapy project:
import scrapy


class AutotraderSpider(scrapy.Spider):
    name = "autotrader"
    allowed_domains = ["autotrader.co.uk"]
    start_urls = ["https://www.autotrader.co.uk/car-dealers/search?advertising-location=at_cars&postcode=m43aq&radius=1500&forSale=on&toOrder=on&sort=with-retailer-reviews&page=822"]

    def parse(self, response):
        # Each matched <article> is one dealer, so its fields stay together.
        for items in response.xpath("//article[@class='dealerList__item']"):
            name = items.xpath(".//span[@itemprop='legalName']/text()").extract_first()
            # Join all address text nodes and collapse internal whitespace.
            address = ' '.join([' '.join(item.split()) for item in items.xpath(".//p[@class='dealerList__itemAddress']/text()").extract()])
            cars = items.xpath(".//span[@class='dealerList__itemCountNumber']/text()").extract_first()
            yield {"Name": name, "Address": address, "Cars": cars}

Partial output:
Midland Motors Leicester Street, Burton-On-Trent, Staffordshire DE14 3BA 2
Ns Cars 69 Eldon Street, Burton-On-Trent, Staffordshire DE15 0LT 1
RS Sales Nottingham Ltd Unit 1 TRINITY PARK, RANDALL PARK WAY, Retford, Nottinghamshire DN22 7WF 1
Adc Ltd Unit 3 HUCKNALL LANE, Nottingham, Nottinghamshire NG6 8AJ 5

https://stackoverflow.com/questions/50144983
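Both answers rely on the same idea: select one node per dealer and read every field relative to that node, so a missing count only affects that one dealer. Below is a minimal sketch of the pattern using the standard library's `xml.etree.ElementTree` instead of Scrapy, so it runs without a network connection. It assumes simplified, well-formed markup that reuses the class names from the answers; the markup, the second dealer, and `parse_dealers` are hypothetical, and the live page is real HTML that may differ. A missing count becomes 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the class names in the answers.
# The second dealer deliberately has no car-count element.
PAGE = """
<ul class="dealerList__container">
  <li class="dealerList__itemContainer">
    <span itemprop="legalName">Midland Motors</span>
    <p class="dealerList__itemAddress">Leicester Street,
        Burton-On-Trent</p>
    <span class="dealerList__itemCountNumber">2</span>
  </li>
  <li class="dealerList__itemContainer">
    <span itemprop="legalName">Example Dealer</span>
    <p class="dealerList__itemAddress">1 Example Road</p>
  </li>
</ul>
"""


def parse_dealers(markup):
    """Yield one dict per dealer; a missing car count is reported as 0."""
    root = ET.fromstring(markup.strip())
    # One loop iteration per dealer, mirroring the per-<li>/<article> loops.
    for li in root.iterfind(".//li[@class='dealerList__itemContainer']"):
        name = li.findtext(".//span[@itemprop='legalName']", default="").strip()
        raw_address = li.findtext(".//p[@class='dealerList__itemAddress']", default="")
        address = " ".join(raw_address.split())  # collapse line breaks/indentation
        count = li.findtext(".//span[@class='dealerList__itemCountNumber']", default="").strip()
        cars = int(count) if count.isdigit() else 0
        yield {"Name": name, "Address": address, "Cars": cars}


for row in parse_dealers(PAGE):
    print(row)
```

Inside a Scrapy spider, the equivalent is to pass a default inside the per-dealer loop, for example `extract_first(default='0')`, as `extract_first` accepts a `default` argument.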