Hi everyone~ I'm new to Scrapy and have run into a strange problem. In short, I found that scrapy.Request() prevents me from entering my function.
Here is my code:
    # -*- coding: utf-8 -*-
    import scrapy
    from tutor_job_spy.items import TutorJobSpyItem

    class Spyspider(scrapy.Spider):
        name = 'spy'
        #for privacy reasons I delete the url information :)
        allowed_domains = ['']
        url_0 = ''
        start_urls = [url_0, ]
        base_url = ''
        list_previous = []
        list_present = []

        def parse(self, response):
            numbers = response.xpath('//tr[@bgcolor="#d7ecff" or @bgcolor="#eef7ff"]/td[@width="8%" and @height="40"]/span/text()').extract()
            self.list_previous = numbers
            self.list_present = numbers
            yield scrapy.Request(self.url_0, self.keep_spying)

        def keep_spying(self, response):
            numbers = response.xpath('//tr[@bgcolor="#d7ecff" or @bgcolor="#eef7ff"]/td[@width="8%" and @height="40"]/span/text()').extract()
            self.list_previous = self.list_present
            self.list_present = numbers
            # judge if anything new
            if (self.list_present != self.list_previous):
                self.goto_new_demand(response)
            #time.sleep(60) #from cache
            yield scrapy.Request(self.url_0, self.keep_spying, dont_filter=True)

        def goto_new_demand(self, response):
            new_demand_links = []
            detail_links = response.xpath('//div[@class="ShowDetail"]/a/@href').extract()
            for i in range(len(self.list_present)):
                if (self.list_present[i] not in self.list_previous):
                    new_demand_links.append(self.base_url + detail_links[i])
            if (new_demand_links != []):
                for new_demand_link in new_demand_links:
                    yield scrapy.Request(new_demand_link, self.get_new_demand)

        def get_new_demand(self, response):
            new_demand = TutorJobSpyItem()
            new_demand['url'] = response.url
            requirments = response.xpath('//tr[@bgcolor="#eef7ff"]/td[@colspan="2"]/div/text()').extract()[0]
            new_demand['gender'] = self.get_gender(requirments)
            new_demand['region'] = response.xpath('//tr[@bgcolor="#d7ecff"]/td[@align="left"]/text()').extract()[5]
            new_demand['grade'] = response.xpath('//tr[@bgcolor="#d7ecff"]/td[@align="left"]/text()').extract()[7]
            new_demand['subject'] = response.xpath('//tr[@bgcolor="#eef7ff"]/td[@align="left"]/text()').extract()[2]
            return new_demand

        def get_gender(self, requirments):
            if ('女老师' in requirments):
                return 'F'
            elif ('男老师' in requirments):
                return 'M'
            else:
                return 'Both okay'

The problem is that when I debug, I find that I cannot step into goto_new_demand:
    if (self.list_present != self.list_previous):
        self.goto_new_demand(response)

Every time I run or debug this script, it skips goto_new_demand. But after I comment out yield scrapy.Request(new_demand_link, self.get_new_demand) in goto_new_demand, I can step into it. I tried many times and found that I can only enter goto_new_demand when it does not contain yield scrapy.Request(new_demand_link, self.get_new_demand). Why does this happen?
Thanks in advance to anyone who can offer advice :)
PS:
Scrapy : 1.5.0
lxml : 4.1.1.0
libxml2 : 2.9.5
cssselect : 1.0.3
parsel : 1.3.1
w3lib : 1.18.0
Twisted : 17.9.0
Python : 3.6.3 (v3.6.3:2c5fed8, Oct 3 2017, 18:11:49) [MSC v.1900 64 bit (AMD64)]
pyOpenSSL : 17.5.0 (OpenSSL 1.1.0g 2 Nov 2017)
cryptography : 2.1.4
Platform : Windows-7-6.1.7601-SP1
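Problem solved!
I changed goto_new_demand from a generator into a regular function that returns a list. So this problem came entirely from my limited understanding of yield (generators).
Here is the modified code: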
    # in keep_spying:
    if (self.list_present != self.list_previous):
        # yield self.goto_new_demand(response)
        new_demand_links = self.goto_new_demand(response)
        if (new_demand_links != []):
            for new_demand_link in new_demand_links:
                yield scrapy.Request(new_demand_link, self.get_new_demand)

    def goto_new_demand(self, response):
        new_demand_links = []
        detail_links = response.xpath('//div[@class="ShowDetail"]/a/@href').extract()
        for i in range(len(self.list_present)):
            if (self.list_present[i] not in self.list_previous):
                new_demand_links.append(self.base_url + detail_links[i])
        return new_demand_links

The reason is explained in Barak's answer.
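The comparison done by the list-returning helper can also be tried outside Scrapy. Below is a minimal sketch of the same logic as a plain function; the name find_new_links, the example.com base URL, and the sample data are hypothetical and only illustrate the idea. It assumes, as the original code does, that the list of numbers and the list of detail links line up one-to-one.

```python
def find_new_links(previous, present, detail_links, base_url):
    """Return full URLs for entries in `present` that are absent from `previous`.

    Hypothetical helper: assumes present[i] corresponds to detail_links[i].
    """
    seen = set(previous)  # set lookup avoids rescanning the list for every entry
    return [
        base_url + link
        for number, link in zip(present, detail_links)
        if number not in seen
    ]


# Made-up sample data: entry 1003 is new compared with the previous poll.
previous = ["1001", "1002"]
present = ["1003", "1001", "1002"]
links = ["/detail?id=1003", "/detail?id=1001", "/detail?id=1002"]

new_links = find_new_links(previous, present, links, "http://example.com")
print(new_links)  # ['http://example.com/detail?id=1003']
```

Because the helper returns an ordinary list, calling it runs its body immediately, so the caller can loop over the result and yield one request per link.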
Posted on 2018-01-22 06:35:59
I think you may need to change this statement

    if (self.list_present != self.list_previous):
        self.goto_new_demand(response)

to:

    if (self.list_present != self.list_previous):
        yield self.goto_new_demand(response)

because self.goto_new_demand() is a generator function (its body contains a yield statement), so simply calling self.goto_new_demand(response) does not run any of its code.
A simple example may make this clearer. Calling an ordinary function runs its body:

    def a():
        print("hello")

    # invoking a will print out hello
    a()

But for a generator function, calling it only returns a generator object:

    def a():
        yield
        print("hello")

    # invoking a will not print out hello, instead it will return a generator object
    a()

So in Scrapy, you should use yield self.goto_new_demand(response) to make goto_new_demand(response) actually run.
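To make the difference concrete, here is a small plain-Python sketch, independent of Scrapy; all function names in it are hypothetical. It shows three things: calling a generator function without iterating it runs none of its body, yielding the call result hands the caller the generator object itself as a single item, and yield from (Python 3.3+) or an explicit for loop passes along each item the inner generator produces. In a Scrapy callback, where requests or items are expected rather than generator objects, the delegating form is likely the one that matches the intent.

```python
import types

calls = []  # records when the body of inner() actually starts running


def inner():
    calls.append("inner started")  # executes only once iteration begins
    yield "request-1"
    yield "request-2"


def outer_plain_call():
    inner()  # builds a generator object and discards it; the body never runs
    yield "done"


def outer_yield_object():
    yield inner()  # yields the generator object itself as one item


def outer_delegate():
    yield from inner()  # yields each item produced by inner()


plain = list(outer_plain_call())
calls_after_plain = list(calls)  # snapshot: inner() has not run yet
wrapped = list(outer_yield_object())
delegated = list(outer_delegate())

print(plain)                                        # ['done']
print(isinstance(wrapped[0], types.GeneratorType))  # True
print(delegated)                                    # ['request-1', 'request-2']
```

The asker's fix, returning a list and looping over it in the caller, reaches the same result as the delegating form without needing a generator in the helper at all.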
https://stackoverflow.com/questions/48374455