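When working locally, I know how to load data from an external source into a Scrapy spider. However, I am struggling to find any information on how to deploy this file to Scrapinghub and which path to use there. Right now I am using the approach from the Scrapinghub (SH) documentation, but I do not get any object back.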

import json
import pkgutil

import scrapy


class CodeSpider(scrapy.Spider):
    name = "code"
    allowed_domains = ["google.com.au"]

    def start_requests(self):
        f = pkgutil.get_data("project", "res/final.json")
        a = json.loads(f.read())

Thanks. My setup file:

from setuptools import setup, find_packages

setup(
    name = 'project',
    version = '1.0',
    packages = find_packages(),
    package_data = {
        'project': ['res/*.json']
    },
    entry_points = {'scrapy': ['settings = au_go.settings']},
    zip_safe=False,
)

The error I get:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/scrapy/core/engine.py", line 127, in _next_request
    request = next(slot.start_requests)
  File "/tmp/unpacked-eggs/__main__.egg/au_go/spiders/code.py", line 16, in start_requests
    a = json.loads(f.read())
AttributeError: 'NoneType' object has no attribute 'read'

Posted on 2017-08-10 14:17:44
Based on the traceback you provided, I assume your project files look like this:

au_go/
    __init__.py
    settings.py
    res/
        final.json
    spiders/
        __init__.py
        code.py
scrapy.cfg
setup.py

Under that assumption, the package_data in setup.py needs to refer to the package named au_go.

from setuptools import setup, find_packages

setup(
    name = 'au_go',
    version = '1.0',
    packages = find_packages(),
    package_data = {
        'au_go': ['res/*.json']
    },
    entry_points = {'scrapy': ['settings = au_go.settings']},
    zip_safe=False,
)

Then you can use pkgutil.get_data("au_go", "res/final.json").
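
One more detail worth checking when reading the file: pkgutil.get_data returns the file contents as a bytes string (or None when the package cannot be located or its loader does not support get_data), not a file object, so there is nothing to call .read() on. A minimal sketch of how the load could look, assuming the JSON file is UTF-8 encoded; the helper name load_package_json is hypothetical and is not part of Scrapy or Scrapinghub:

```python
import json
import pkgutil


def load_package_json(package, resource):
    """Parse the JSON file stored at `resource` inside `package`.

    Hypothetical helper for illustration only.
    """
    # get_data returns raw bytes, or None if the package cannot be
    # located or its loader does not support get_data.
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise IOError("no data for %r in package %r" % (resource, package))
    # Assumption: the resource is UTF-8 encoded JSON.
    return json.loads(data.decode("utf-8"))
```

Inside start_requests this would be called as load_package_json("au_go", "res/final.json"). Raising on None turns a wrong package name into an explicit error instead of the AttributeError shown in the question.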
https://stackoverflow.com/questions/45586091