I'm new to programming and I'm trying to learn Scrapy, following the Scrapy tutorial: http://doc.scrapy.org/en/latest/intro/tutorial.html
So I ran the "scrapy crawl dmoz" command and got the following error:
2015-07-14 16:11:02 [scrapy] INFO: Scrapy 1.0.1 started (bot: tutorial)
2015-07-14 16:11:02 [scrapy] INFO: Optional features available: ssl, http11
2015-07-14 16:11:02 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'tutorial.spiders', 'SPIDER_MODULES': ['tutorial.spiders'], 'BOT_NAME': 'tutorial'}
2015-07-14 16:11:05 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
Unhandled error in Deferred:
2015-07-14 16:11:06 [twisted] CRITICAL: Unhandled error in Deferred:
2015-07-14 16:11:07 [twisted] CRITICAL:

I'm using Windows 7 and Python 2.7. Does anyone know what the problem is, and how can I fix it?

Edit: here is the code of my spider file:
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to create and manage
# your spiders.
import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computers/programming/languages/python/resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

And the items.py code:
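The filename logic in parse() can be checked without Scrapy, since it is plain string manipulation. Note that the trailing slash makes the last element of split("/") an empty string, so index [-2] picks the final path segment. A minimal stand-alone sketch:

```python
# Plain-string sketch of the filename derivation used in parse():
# with a trailing slash, split("/") ends in '', so [-2] is the last segment.
url = "http://www.dmoz.org/computers/programming/languages/python/books/"
filename = url.split("/")[-2] + '.html'
print(filename)  # books.html
```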
import scrapy

class DmozItem(scrapy.Item):
    title = scrapy.Field()
    link = scrapy.Field()
    desc = scrapy.Field()
Sorry for any mistakes; English is not my native language.
Posted on 2015-07-19 09:50:37
I also started learning Scrapy recently and ran into the same problem. After struggling for an afternoon, I finally found that it was because the pywin32 module had only been downloaded, not installed. You can try running the command below in cmd to finish installing the pywin32 module, then try crawling again:
python python27\scripts\pywin32_postinstall.py -install
I hope this helps!
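As a quick sanity check (not part of the original answer), you can ask Python itself whether the pywin32 extension modules import cleanly before re-running the crawl. A minimal sketch:

```python
import importlib

def pywin32_available():
    """Return True if the pywin32 extension module win32api imports cleanly."""
    try:
        importlib.import_module("win32api")
        return True
    except ImportError:
        return False

print(pywin32_available())
```

If this prints False even after running the post-install script, the Scrapy crawl will keep failing the same way on Windows.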
Posted on 2015-11-12 23:04:36
The short answer is that you are missing pywin32!
The other answers are mostly right, but not 100%. pywin32 is not pip-installable! You have to download the installer package from here:
http://sourceforge.net/projects/pywin32/files/pywin32/
Make sure you get the right bitness: 32 or 64. In my case, I hadn't realized that the 32-bit version of Python was installed on my 64-bit machine, and the installer failed with "Python 2.7 installation could not be found in the registry". I had to install the 32-bit version of pywin32. Once I did that, the Scrapy crawl worked.
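As this answer shows, the interpreter's bitness is easy to get wrong, since it need not match the OS. A small stdlib sketch for checking it:

```python
import struct
import platform

# The size of a C pointer in bytes reveals the interpreter's bitness,
# independent of the architecture of the machine it runs on.
bits = struct.calcsize("P") * 8
print("This Python is %d-bit (%s)" % (bits, platform.architecture()[0]))
```

Pick the pywin32 installer that matches the number this prints, not the bitness of Windows itself.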
Posted on 2015-07-20 04:56:26
I can't see how you are writing your items to a file, but it may be the imports. Try the code below, and if that doesn't work, try pip install pywin32 --upgrade and pip install Twisted --upgrade, which should reinstall any corrupted files. Also, I don't know whether it is a Stack formatting issue, but you have some incorrect indentation.

from scrapy.spiders import Spider
from {Projectname}.items import {Itemclass}
import scrapy

class DmozSpider(scrapy.Spider):
    name = "dmoz"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/computers/programming/languages/python/books/",
        "http://www.dmoz.org/computers/programming/languages/python/resources/"
    ]

    def parse(self, response):
        filename = response.url.split("/")[-2] + '.html'
        with open(filename, 'wb') as f:
            f.write(response.body)

https://stackoverflow.com/questions/31439540