问从爬虫迁移到CrawlSpider
EN

Stack Overflow用户

提问于 2021-08-08 12:47:27

回答 1查看 36关注 0票数 0

我试图从一般的爬行器转移到CrawlSpider，以利用规则。然而，我的爬虫不再那样工作了。你知道我做错了什么吗？

之前：

class GitHubSpider(scrapy.Spider):
    name = "github"
    start_urls = [
        "https://github.com/search?p=1&q=React+Django&type=Users",
    ]

    def parse(self, response):
        engineer_links = response.css("a.mr-1::attr(href)")
        yield from response.follow_all(engineer_links, self.parse_engineer)

        pagination_links = response.css(".next_page::attr(href)")
        yield from response.follow_all(pagination_links, self.parse)

    def parse_engineer(self, response):
        yield {
            "username": response.css(".vcard-username::text").get().strip(),
        }

新建(不起作用)：

class GitHubSpider(CrawlSpider):
    name = "github"
    start_urls = [
        "https://github.com/search?p=1&q=React+Django&type=Users",
    ]

    rules = (
        Rule(
            LinkExtractor(restrict_css=("a.mr-1::attr(href)")),
            callback="parse_engineer",
        ),
        Rule(LinkExtractor(restrict_css=(".next_page::attr(href)"))),
    )

    def parse_engineer(self, response):
        yield {
            "username": response.css(".vcard-username::text").get().strip(),
        }

scrapy

回答 1

Stack Overflow用户

回答已采纳

发布于 2021-08-08 13:24:00

现在，它起作用了：

import scrapy
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class GitHubSpider(CrawlSpider):
    name = "github"
    allowed_domains = [github.com]
    start_urls = [
        "https://github.com/search?p=1&q=React+Django&type=Users"
    ]

    rules = (
        Rule(LinkExtractor(restrict_css="a.mr-1"),callback="parse_engineer",),
        Rule(LinkExtractor(restrict_css=".next_page")),
    )

    def parse_engineer(self, response):
        yield {
            "username": response.css(".vcard-username::text").get().strip()
        }

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/68700937

复制

相似问题

问从爬虫迁移到CrawlSpider
EN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问从爬虫迁移到CrawlSpiderEN

回答 1

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问从爬虫迁移到CrawlSpider
EN