Question: scraping patents.google with Scrapy fails

Stack Overflow user

Asked on 2021-07-22 08:52:06

1 answer, 38 views, 0 followers, 0 votes

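I'm trying to scrape the main title of this page, https://patents.google.com/patent/CN102093389B/en ("Double-chain oxygen-bridged heterocyclic anabasine compound and preparation method thereof"), with Scrapy, but it doesn't work. I'm extracting it with a CSS selector. The same CSS selector works fine in Puppeteer and extracts the main title, but in Scrapy it returns nothing. The code is as follows: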

import scrapy

class GooglepatentsspiderSpider(scrapy.Spider):
    name = 'googlePatentsSpider'
    allowed_domains = ['patents.google.com']
    start_urls = ['https://patents.google.com/patent/CN102093389B/en']

    def parse(self, response):
        title = response.css('h1#title::text').get()

        yield {
            'title': title
        }

Tags: python, scrapy
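A likely explanation, which neither the question nor the answer confirms, is that Puppeteer evaluates selectors against the DOM after the page's JavaScript has run, while Scrapy does not execute JavaScript and only parses the raw HTML it downloads. One way to test that assumption is to check whether the targeted element exists in the raw HTML at all. The sketch below uses only the standard library; `has_element` is a hypothetical helper and the markup is a made-up stand-in, not the real patent page. Inside a spider you would pass it `response.text`.

```python
from html.parser import HTMLParser


class _AttrFinder(HTMLParser):
    """Records whether any start tag matches tag[attr="value"]."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.found = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of (name, value) pairs
        if tag == self.tag and (self.attr, self.value) in attrs:
            self.found = True


def has_element(html, tag, attr, value):
    """Hypothetical helper: does the raw HTML contain <tag attr="value">?"""
    finder = _AttrFinder(tag, attr, value)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical markup, NOT the real patents.google.com response
raw_html = '<html><body><span itemprop="title">Example title</span></body></html>'
print(has_element(raw_html, "h1", "id", "title"))          # False
print(has_element(raw_html, "span", "itemprop", "title"))  # True
```

If the check is `False` for the element Puppeteer finds, the selector is not wrong as such; the element simply is not in what Scrapy receives.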

1 answer

Stack Overflow user

Accepted answer

Posted on 2021-07-22 09:52:03

The CSS path is incorrect. Try this: response.css('span[itemprop="title"]::text').get()

Votes: 0
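To see what the answer's selector returns, the sketch below mimics `response.css('span[itemprop="title"]::text').get()` with the standard library: it returns the first text node that is a direct child of a matching element, or `None` when nothing matches (the result the question got from `h1#title::text`). `css_text_get` and the sample markup are hypothetical, and this is a simplified stand-in for Scrapy's selector engine (no nested matches, no recovery from unclosed tags), not a replacement for it.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so they must not change the nesting depth
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class _FirstTextNode(HTMLParser):
    """First direct-child text node of any element matching tag[attr="value"]."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0      # 0 = outside a match, 1 = directly inside it, >1 = in a child
        self.result = None

    def handle_starttag(self, tag, attrs):
        if self.result is not None or tag in VOID:
            return
        if self.depth:
            self.depth += 1  # entering a child element of the match
        elif tag == self.tag and (self.attr, self.value) in attrs:
            self.depth = 1   # entering a matching element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID:
            self.depth -= 1

    def handle_data(self, data):
        # Like ::text, only text directly inside the matched element counts
        if self.depth == 1 and self.result is None:
            self.result = data


def css_text_get(html, tag, attr, value):
    """Hypothetical stand-in for sel.css(f'{tag}[{attr}="{value}"]::text').get()."""
    parser = _FirstTextNode(tag, attr, value)
    parser.feed(html)
    parser.close()
    return parser.result


# Hypothetical markup, NOT the real patents.google.com response
raw_html = ('<html><body><h2>Other</h2>'
            '<span itemprop="title">Example title</span></body></html>')
print(css_text_get(raw_html, "h1", "id", "title"))          # None
print(css_text_get(raw_html, "span", "itemprop", "title"))  # Example title
```

In the spider itself, applying the answer only means swapping the selector on the `title = ...` line; calling `.strip()` on a non-`None` result can help if the text node carries surrounding whitespace.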

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/68481903
