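I'm trying to get the job description and company name from the individual job pages, but I'm not getting them and I can't find the problem, because everything looks fine to me. Why can't I get the data from the job pages?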

from scrapy import Spider
from scrapy.http import Request


class IndeedSpider(Spider):
    name = 'indeed'
    allowed_domains = ['indeed.com']
    start_urls = ['https://www.indeed.co.uk/jobs?q=Russian&fromage=1']

    def parse(self, response):
        jobs = response.xpath('//*[contains(@class, "row result")]')
        for job in jobs:
            job_title = job.xpath('.//*[@class="title"]//a/@title').extract_first()
            job_location = job.xpath('.//*[contains(@class, "location")]/text()').extract_first()
            job_link = job.xpath('.//*[@class="title"]//a/@href').extract_first()
            absulate_job_link = response.urljoin(job_link)
            print(absulate_job_link)
            yield Request(url=absulate_job_link,
                          callback=self.parse_jobpage,
                          meta={
                              "Job Title": job_title,
                              "Location": job_location,
                              "Job Link": absulate_job_link
                          })

    def parse_jobpage(self, response):
        job_title = response.meta.get('Job Title')
        job_location = response.meta.get('Location')
        absulate_job_link = response.meta.get('Job Link')
        job_description = "".join(line for line in response.xpath('//*[@id="jobDescriptionText"]//text()').extract())
        company = response.xpath('//*[contains(@class, "icl-u-lg-mr--sm")]//text()').extract_first()
        yield {
            "Job Title": job_title,
            "Location": job_location,
            "Job Link": absulate_job_link,
            "Job Description": job_description,
            "Company": company
        }

Posted on 2020-09-29 10:24:47
Your allowed_domains does not match the URLs you are trying to parse.
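This causes the requests to the job pages to be filtered out as offsite requests, so no further requests are made and parse_jobpage never runs.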
Changing your allowed_domains to ['indeed.co.uk'] should solve the problem.
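
To see why the two settings behave differently, here is a rough sketch of the kind of host check an offsite filter performs: a URL passes only if its host equals an allowed domain or is a subdomain of one. The helper name is_offsite is hypothetical and uses only the standard library; it approximates the idea and is not Scrapy's actual implementation.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if the URL's host is neither an allowed domain nor a subdomain of one.

    Hypothetical helper for illustration only; not part of Scrapy.
    """
    host = (urlparse(url).hostname or "").lower()
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


job_url = "https://www.indeed.co.uk/jobs?q=Russian&fromage=1"

print(is_offsite(job_url, ["indeed.com"]))    # True: www.indeed.co.uk is not under indeed.com
print(is_offsite(job_url, ["indeed.co.uk"]))  # False: www.indeed.co.uk is a subdomain of indeed.co.uk
```

When requests are dropped this way, the crawl log would normally contain a "Filtered offsite request" debug message, which is a quick way to confirm the diagnosis.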
https://stackoverflow.com/questions/64117094