Development environment
Windows 11
PyCharm Community Edition 2021.3.1
Python 3.10

I am following this tutorial on downloading images with Python and Scrapy, and I cannot get my script to work.
spider.py
    import scrapy


    class WikiSpider(scrapy.Spider):
        name = 'wiki'
        start_urls = ['https://en.wikipedia.org/wiki/Real_Madrid_CF']

        def parse(self, response):
            # Collect the src attribute of every matching <img> element.
            urls = response.css('.image img ::attr(src)').getall()
            clean_urls = []
            for url in urls:
                # Resolve relative src values against the page URL.
                clean_urls.append(response.urljoin(url))
            yield {
                'image_urls': clean_urls
            }

settings.py
    BOT_NAME = 'imagedownload'
    SPIDER_MODULES = ['imagedownload.spiders']
    NEWSPIDER_MODULE = 'imagedownload.spiders'
    ITEM_PIPELINES = {'scrapy.pipelines.images.ImagesPipeline': 1}
    IMAGES_STORE = 'images_folder'
    # Obey robots.txt rules
    ROBOTSTXT_OBEY = True

As in the tutorial, items.py and pipelines.py were not modified. When I run my spider, it runs without errors and I can see the parsed image URLs, but no images are downloaded.

Steps I have already taken to try to fix this:

Set ROBOTSTXT_OBEY = False

In the spider.py file:

    save_location = os.getcwd()
    custom_settings = {
        "ITEM_PIPELINES": {'scrapy.pipelines.images.ImagesPipeline': 1},
        "IMAGES_STORE": save_location
    }

In settings.py:

    IMAGES_STORE = os.getcwd()

Any help would be much appreciated!
What I expect is for the script to download images.

Posted on 2022-10-30 20:53:21
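You are very close. I think part of the cause is that you have not created a proper Field for the image results in the dictionary you yield.
I suggest using a custom Scrapy item with the fields predefined. You can define it in the same file as the spider to keep things simple, and then just add all of the ImagesPipeline settings to the custom_settings dictionary of the spider class.
For example: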
    import scrapy


    class Item(scrapy.Item):
        images_urls = scrapy.Field()
        images = scrapy.Field()


    class WikiSpider(scrapy.Spider):
        custom_settings = {
            "IMAGES_STORE": "images",  # <- make sure whatever you put here is an existing empty folder at the top level of your project.
            "ITEM_PIPELINES": {"scrapy.pipelines.images.ImagesPipeline": 1},
            "IMAGES_URLS_FIELD": "images_urls",
            "IMAGES_RESULT_FIELD": "images",
        }
        name = 'wiki'
        start_urls = ['https://en.wikipedia.org/wiki/Real_Madrid_CF']

        def parse(self, response):
            for url in response.css('.image img ::attr(src)').getall():
                item = Item()
                item['images_urls'] = [response.urljoin(url)]
                yield item

https://stackoverflow.com/questions/74250020
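
Both spiders pass every src value through response.urljoin before handing it to the pipeline. As far as I know, that call resolves the value against the page URL in essentially the same way as the standard library's urljoin, so the effect can be sketched without Scrapy. The src values below are hypothetical examples of the shapes an img tag can carry; they are not taken from the actual page.

```python
from urllib.parse import urljoin

# The page the spider starts from (same as start_urls in the spider).
page_url = "https://en.wikipedia.org/wiki/Real_Madrid_CF"

# Hypothetical src values, for illustration only.
srcs = [
    "//upload.wikimedia.org/example/crest.png",  # protocol-relative
    "/static/images/example.png",                # root-relative
    "https://example.org/already/absolute.png",  # already absolute
]

# Resolve each src against the page URL so that every entry is a full URL.
clean_urls = [urljoin(page_url, src) for src in srcs]

for url in clean_urls:
    print(url)
```

Every entry comes out as an absolute URL with a scheme, which is the form the image URL field is expected to hold.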
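
To check whether anything was actually saved, it can help to know where to look inside the IMAGES_STORE folder. The sketch below assumes the default ImagesPipeline naming described in the Scrapy documentation (a full subfolder, with each file named after the SHA-1 hash of its URL and a .jpg extension) and assumes file_path has not been overridden. The helper name expected_image_path and the example URL are hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def expected_image_path(images_store: str, url: str) -> str:
    """Guess where the default ImagesPipeline would save `url`.

    Hypothetical helper: it assumes the documented default layout
    <IMAGES_STORE>/full/<sha1 of the URL>.jpg.
    """
    image_guid = hashlib.sha1(url.encode("utf-8")).hexdigest()
    return str(PurePosixPath(images_store) / "full" / f"{image_guid}.jpg")


# "images" matches the IMAGES_STORE value used in the answer; the URL is made up.
path = expected_image_path("images", "https://upload.wikimedia.org/example/crest.png")
print(path)
```

If the run finishes and no full subfolder exists under IMAGES_STORE, the pipeline most likely never stored a file, and the crawl log is the next place to look.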