I've just started learning Scrapy and Python, and I've been following the tutorial, but I'm stuck. I've been able to use the shell to get a list of links from a page, like this:
>>> response.css('li').xpath('a/@href').getall()

which gives me:

['/shop-online/542/fragrances', '/shop-online/81/vitamins', '/shop-online/257/beauty', '/shop-online/665/skin-care', '/shop-online/648/cosmetics', '/shop-online/517/weight-loss', '/shop-online/20/baby-care', '/shop-online/89/sexual-health', '/shop-online/198/smoking-deterrents', '/shop-online/3240/clearance', '/prescriptions', '/shop-online/258/medicines', '/shop-online/1093/cold-flu', '/shop-online/PS-1755/all-fish-oil-supplements', '/shop-online/159/oral-hygiene-and-dental-care', '/shop-online/792/household', '/shop-online/129/hair-care', '/shop-online/1255/sports-nutrition', '/bestsellers', '/categories', 'https://www.chemistwarehouse.hk', '/', '#', '/login', '/youraccount', '#', '/aboutus', '/aboutus/shipping', '/shop-online/542/fragrances', '/shop-online/81/vitamins', '/shop-online/257/beauty', '/shop-online/665/skin-care', '/shop-online/648/cosmetics', '/shop-online/517/weight-loss', '/shop-online/20/baby-care', '/shop-online/89/sexual-health', '/shop-online/198/smoking-deterrents', '/prescriptions', '/shop-online/258/medicines', '/shop-online/1093/cold-flu', '/shop-online/PS-1755/all-fish-oil-supplements', '/shop-online/159/oral-hygiene-and-dental-care', '/shop-online/792/household', '/shop-online/129/hair-care', '/shop-online/1255/sports-nutrition', '/bestsellers']

What I want to do, at least in the shell for now (and then in a script), is to filter out any of the links that don't contain shop-online, and then crawl the corresponding URLs, which would be www.<site>/ plus the link I scraped.
But I don't know how to do this. I know you can use regular expressions, but I'm not sure how to apply them here, and even if I could, I don't know how to tell Scrapy to iterate over the links I've found and scrape those pages.
Posted on 2019-05-29 08:42:00
> I want to … filter out any of the links that don't contain shop-online, and then crawl the corresponding URLs.

In a spider callback, that would be:
for link in response.xpath('//li//a/@href[contains(., "/shop-online/")]'):
    yield response.follow(link.get())

In the shell you can only handle one request at a time, since the shell is meant for debugging only, so just pick one of the links and fetch it:
link = response.xpath('//li//a/@href[contains(., "/shop-online/")]').get()  # gets the first link only
fetch(response.follow(link))

https://stackoverflow.com/questions/56351064
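Outside of Scrapy, the filtering step can be reproduced in plain Python. The sketch below (the base URL and the short sample of hrefs are assumptions taken from the shell output above) keeps only the shop-online links and resolves them against the site's base URL, which is roughly what response.follow() does internally with each relative href:

```python
from urllib.parse import urljoin

# A small sample of hrefs like the ones the shell returned (assumed for illustration).
hrefs = [
    '/shop-online/542/fragrances',
    '/prescriptions',
    '/shop-online/81/vitamins',
    '#',
    'https://www.chemistwarehouse.hk',
]

# Assumed base URL of the site being scraped.
base = 'https://www.chemistwarehouse.hk'

# Keep only links containing "/shop-online/" and turn them into absolute URLs.
shop_links = [urljoin(base, h) for h in hrefs if '/shop-online/' in h]
print(shop_links)
# ['https://www.chemistwarehouse.hk/shop-online/542/fragrances',
#  'https://www.chemistwarehouse.hk/shop-online/81/vitamins']
```

In a real spider you wouldn't need this by hand: yielding response.follow(link.get()) performs the URL joining and schedules the request for you.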