Q: Python Scrapy tutorial KeyError: 'Spider not found:

Stack Overflow user

Asked on 2014-10-14 19:21:00

Answers: 1, Views: 9.8K, Followers: 0, Votes: 6

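I'm trying to write my first Scrapy spider. I have been following the tutorial at http://doc.scrapy.org/en/latest/intro/tutorial.html, but I get the error "KeyError: 'Spider not found: ".

I think I'm running the command from the correct directory (the one containing the scrapy.cfg file).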

(proscraper)#( 10/14/14@ 2:06pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   tree
.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

2 directories, 7 files
(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   ls
scrapy  scrapy.cfg

Here is the error I get:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   scrapy crawl juno
/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/twisted/internet/_sslverify.py:184: UserWarning: You do not have the service_identity module installed. Please install it from <https://pypi.python.org/pypi/service_identity>. Without the service_identity module and a recent enough pyOpenSSL to support it, Twisted can perform only rudimentary TLS client hostname verification.  Many valid certificate/hostname mappings may be rejected.
  verifyHostname, VerificationError = _selectVerifyImplementation()
Traceback (most recent call last):
  File "/home/tim/.virtualenvs/proscraper/bin/scrapy", line 9, in <module>
    load_entry_point('Scrapy==0.24.4', 'console_scripts', 'scrapy')()
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 143, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/cmdline.py", line 150, in _run_command
    cmd.run(args, opts)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/commands/crawl.py", line 58, in run
    spider = crawler.spiders.create(spname, **opts.spargs)
  File "/home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/spidermanager.py", line 44, in create
    raise KeyError("Spider not found: %s" % spider_name)
KeyError: 'Spider not found: juno'

Here is my virtualenv:

(proscraper)#( 10/14/14@ 2:13pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   pip freeze
Scrapy==0.24.4
Twisted==14.0.2
cffi==0.8.6
cryptography==0.6
cssselect==0.9.1
ipdb==0.8
ipython==2.3.0
lxml==3.4.0
pyOpenSSL==0.14
pycparser==2.10
queuelib==1.2.2
six==1.8.0
w3lib==1.10.0
wsgiref==0.1.2
zope.interface==4.1.1

Here is the code for my spider, with the name attribute filled in:

(proscraper)#( 10/14/14@ 2:14pm )( tim@localhost ):~/Workspace/Development/hacks/prosum-scraper/scrapy
   cat scrapy/spiders/juno_spider.py 
import scrapy

class JunoSpider(scrapy.Spider):
    name = "juno"
    allowed_domains = ["http://www.juno.co.uk/"]
    start_urls = [
        "http://www.juno.co.uk/dj-equipment/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2]
        with open(filename, 'wb') as f:
            f.write(response.body)

python

scrapy

1 Answer

Stack Overflow user

Accepted answer

Posted on 2014-10-15 03:10:55

When you start a project using scrapy as the project name, it creates the directory structure you printed:

.
├── scrapy
│   ├── __init__.py
│   ├── items.py
│   ├── pipelines.py
│   ├── settings.py
│   └── spiders
│       ├── __init__.py
│       └── juno_spider.py
└── scrapy.cfg

But using scrapy as the project name has a side effect. If you open the generated scrapy.cfg, you will see that the default setting points to the scrapy.settings module.

[settings]
default = scrapy.settings
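
To see exactly which dotted module path a scrapy.cfg points at, the file can be read as an ordinary INI file. This is only a minimal sketch, assuming Python 3 and the standard-library configparser; the helper name default_settings_module is made up for illustration and is not part of Scrapy.

```python
import configparser


def default_settings_module(cfg_text):
    """Return the dotted module path named by the [settings] default entry."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.get("settings", "default")


# The same two lines that the generated scrapy.cfg contains.
cfg_text = "[settings]\ndefault = scrapy.settings\n"
settings_module = default_settings_module(cfg_text)
print(settings_module)
```

The value is just a module path that Python will import later, so whatever package named scrapy comes first on sys.path is the one that supplies it.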

When we cat the project's scrapy.settings file (scrapy/settings.py), we see:

BOT_NAME = 'scrapy'

SPIDER_MODULES = ['scrapy.spiders']
NEWSPIDER_MODULE = 'scrapy.spiders'

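Well, nothing strange here: the bot name, the list of modules where Scrapy will look for spiders, and the module where new spiders are created by the genspider command. So far, so good.

Now let's check the scrapy library. It has been properly installed in your isolated proscraper virtualenv, under the /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy directory. Remember that site-packages is added to sys.path, which holds all the paths where Python searches for modules. So guess what: the scrapy library also has a settings module, /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings, which imports /home/tim/.virtualenvs/proscraper/lib/python2.7/site-packages/scrapy/settings/default_settings.py, the file holding the default values for all settings. Pay special attention to the default SPIDER_MODULES entry: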

SPIDER_MODULES = []

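Maybe you are starting to see what is happening. Choosing scrapy as the project name also generated a scrapy.settings module that clashes with the scrapy library's own scrapy.settings. The order in which the corresponding paths were inserted into sys.path decides which of the two Python imports: the first one found wins. In this case the scrapy library's settings win, so SPIDER_MODULES stays empty and your spider is never registered. Hence the KeyError: 'Spider not found: juno'.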
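
The "first one found wins" rule can be reproduced without Scrapy at all. The following is a rough sketch, assuming Python 3; the package name demo_clash_pkg and both temporary directories are hypothetical stand-ins for the scrapy library in site-packages and the project folder of the same name.

```python
import importlib
import os
import sys
import tempfile


def make_package(root, pkg_name, spider_modules):
    """Create root/pkg_name/ with an empty __init__.py and a settings.py."""
    pkg_dir = os.path.join(root, pkg_name)
    os.makedirs(pkg_dir)
    open(os.path.join(pkg_dir, "__init__.py"), "w").close()
    with open(os.path.join(pkg_dir, "settings.py"), "w") as f:
        f.write("SPIDER_MODULES = %r\n" % (spider_modules,))


PKG = "demo_clash_pkg"  # hypothetical name shared by "library" and "project"

library_dir = tempfile.mkdtemp()  # stands in for site-packages
project_dir = tempfile.mkdtemp()  # stands in for the project root

make_package(library_dir, PKG, [])                  # library default: empty list
make_package(project_dir, PKG, [PKG + ".spiders"])  # project value

# Put both on sys.path, with the "library" directory ahead of the "project".
sys.path.insert(0, project_dir)
sys.path.insert(0, library_dir)
importlib.invalidate_caches()

settings = importlib.import_module(PKG + ".settings")
print(settings.__file__)        # a path inside library_dir
print(settings.SPIDER_MODULES)  # the library's empty list, not the project's
```

Swapping the two sys.path.insert calls makes the project copy win instead, which is why the outcome depends on path order rather than on anything in the spider itself.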

To resolve this conflict, you could rename your project folder to another name, say scrap:

.
├── scrap
│   ├── __init__.py

Modify your scrapy.cfg to point to the proper settings module:

[settings]
default = scrap.settings

And update your scrap.settings to point to the proper spiders:

SPIDER_MODULES = ['scrap.spiders']

But as @paultrmbrth suggested, I would recreate the project with another name.
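
Before picking a new project name, it may help to check that the name is not already importable in the active environment. This is a small sketch, assuming Python 3; the function name shadows_installed_module is invented here, and the check should be run from a directory that does not itself contain a package with the candidate name, otherwise the project folder would be found too.

```python
import importlib.util


def shadows_installed_module(project_name):
    """Return True if a top-level module with this name is already importable."""
    return importlib.util.find_spec(project_name) is not None


# "json" ships with Python, so it would be a poor project name.
print(shadows_installed_module("json"))
# A made-up name that should not exist in any environment.
print(shadows_installed_module("no_such_module_for_scrapy_demo_12345"))
```

In the virtualenv from the question, passing "scrapy" to this function would be expected to report a clash, since the library is installed there.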

Votes: 10

Original page content provided by Stack Overflow.

Original link: https://stackoverflow.com/questions/26359598
