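Background
Using Python, I crawl a list of websites by iterating over the list. Each URL is taken from the list and passed to a crawling function. The function returns a response, and the crawled data is added to a dictionary.
Problem
Every time a new response comes back from the crawl function and is added to the dictionary, all values already in the dictionary are overwritten with the latest one. I also tried appending the responses to a list, and every element of the list is likewise overwritten with the latest response.
Debugging attempted
In each iteration I printed the individual response before and after adding it to the dictionary or list. The response is identical before and after being added, and it differs from one iteration to the next, so the responses themselves are distinct, as expected. Yet the whole dictionary or list ends up filled with the latest value.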
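Printing alone cannot distinguish "three different dictionaries" from "one dictionary that keeps being modified". A minimal sketch of an identity check follows. It assumes, since the body of find_job_details is not shown, that the function hands back the same dict object on every call; fake_find_job_details and _shared are hypothetical stand-ins, not the real crawler.

```python
# Hypothetical stand-in for the crawler: it reuses ONE dict for every call.
_shared = {}

def fake_find_job_details(name):
    _shared["job_name"] = name      # mutates the same object each time
    return [0, _shared]

collected = []
for name in ["A", "B", "C"]:
    res = fake_find_job_details(name)
    collected.append(res[1])        # stores a reference, not a copy

# If every element is the very same object, they all show the latest value.
same_object = all(item is collected[0] for item in collected)
distinct_ids = len(set(id(item) for item in collected))
print(same_object, distinct_ids)
```

If the same check on the real data reports a single id, the stored entries are one shared object rather than separate responses.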
Code
# Fragment from inside a function; urllib.urlopen is the Python 2 API
for jobListingPage in jobListingPages:
    try:
        r = urllib.urlopen(jobListingPage).read()
        soup = BeautifulSoup(r, "html.parser")
        jobsSummaryMarkup = soup.find_all("h2", class_=["g-col10"])
        i = 0
        for jobSummaryMarkup in jobsSummaryMarkup:
            jobDetailsURL = base_url_sof+str(jobSummaryMarkup.a["href"])
            jobDetailsFindRes = find_job_details(jobDetailsURL)
            if(jobDetailsFindRes[0] == 0):
                #print("******crawled response before adding")
                #print(jobDetailsFindRes[1])
                i=i+1
                all_jobs_data["job "+str(i)] = jobDetailsFindRes[1]
                #print("******crawled response after adding")
                #print(jobDetailsFindRes[1])
                #print("******cumulative dictionary")
                #print(all_jobs_data)
                #print("###########################################")
        return([0, all_jobs_data])
    except Exception as e:
        return([-1, e])
Output of the above code
With the print statements uncommented, the following output is obtained after three iterations, i.e. after crawling three websites from the list.
******crawled response before adding
{'location_name': 'Bengaluru', 'tags': ['user-interface', 'html5', 'javascript', 'angularjs', 'reactjs'], 'job_url': 'http://www.stackoverflow.com/jobs/170630/ui-front-end-developer-citrix', 'Experience level': ['Mid-Level', ' Senior', ' Lead'], 'Job type': ['Permanent'], 'Role': ['Frontend Developer'], 'company_name': 'Citrix', 'job_name': 'UI /Front-End Developer'}
******crawled response after adding
{'location_name': 'Bengaluru', 'tags': ['user-interface', 'html5', 'javascript', 'angularjs', 'reactjs'], 'job_url': 'http://www.stackoverflow.com/jobs/170630/ui-front-end-developer-citrix', 'Experience level': ['Mid-Level', ' Senior', ' Lead'], 'Job type': ['Permanent'], 'Role': ['Frontend Developer'], 'company_name': 'Citrix', 'job_name': 'UI /Front-End Developer'}
******cumulative dictionary
{'job 1': {'location_name': 'Bengaluru', 'tags': ['user-interface', 'html5', 'javascript', 'angularjs', 'reactjs'], 'job_url': 'http://www.stackoverflow.com/jobs/170630/ui-front-end-developer-citrix', 'Experience level': ['Mid-Level', ' Senior', ' Lead'], 'Job type': ['Permanent'], 'Role': ['Frontend Developer'], 'company_name': 'Citrix', 'job_name': 'UI /Front-End Developer'}}
#########################################
******crawled response before adding
{'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}
******crawled response after adding
{'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}
******cumulative dictionary
{'job 1': {'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}, 'job 2': {'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}}
#########################################
******crawled response before adding
{'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}
******crawled response after adding
{'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}
******cumulative dictionary
{'job 1': {'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}, 'job 2': {'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}, 'job 3': {'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}}
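#########################################
The last item is propagated through the whole dictionary and overwrites every entry. If I append the items to a list instead, the whole list is likewise overwritten with the last item.
How can I add distinct items to the dictionary, instead of having the whole dictionary overwritten by the last item?
Edit: a version of the code that appends the responses to a list instead of adding them to a dictionary.
Code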
for jobListingPage in jobListingPages:
    try:
        r = urllib.urlopen(jobListingPage).read()
        soup = BeautifulSoup(r, "html.parser")
        jobsSummaryMarkup = soup.find_all("h2", class_=["g-col10"])
        for jobSummaryMarkup in jobsSummaryMarkup:
            jobDetailsURL = base_url_sof+str(jobSummaryMarkup.a["href"])
            jobDetailsFindRes = find_job_details(jobDetailsURL)
            if(jobDetailsFindRes[0] == 0):
                #print("******crawled response before adding")
                #print(jobDetailsFindRes[1])
                all_jobs_data_list.append(jobDetailsFindRes[1])
                #print("******crawled response after adding")
                #print(jobDetailsFindRes[1])
                #print("******cumulative list")
                #print(all_jobs_data_list)
                #print("###########################################")
        return([0, all_jobs_data])
    except Exception as e:
        return([-1, e])
The output of the above code is:
******crawled response before adding
{'location_name': 'Bengaluru', 'tags': ['user-interface', 'html5', 'javascript', 'angularjs', 'reactjs'], 'job_url': 'http://www.stackoverflow.com/jobs/170630/ui-front-end-developer-citrix', 'Experience level': ['Mid-Level', ' Senior', ' Lead'], 'Job type': ['Permanent'], 'Role': ['Frontend Developer'], 'company_name': 'Citrix', 'job_name': 'UI /Front-End Developer'}
******crawled response after adding
{'location_name': 'Bengaluru', 'tags': ['user-interface', 'html5', 'javascript', 'angularjs', 'reactjs'], 'job_url': 'http://www.stackoverflow.com/jobs/170630/ui-front-end-developer-citrix', 'Experience level': ['Mid-Level', ' Senior', ' Lead'], 'Job type': ['Permanent'], 'Role': ['Frontend Developer'], 'company_name': 'Citrix', 'job_name': 'UI /Front-End Developer'}
******cumulative dictionary
[{'location_name': 'Bengaluru', 'tags': ['user-interface', 'html5', 'javascript', 'angularjs', 'reactjs'], 'job_url': 'http://www.stackoverflow.com/jobs/170630/ui-front-end-developer-citrix', 'Experience level': ['Mid-Level', ' Senior', ' Lead'], 'Job type': ['Permanent'], 'Role': ['Frontend Developer'], 'company_name': 'Citrix', 'job_name': 'UI /Front-End Developer'}]
#########################################
******crawled response before adding
{'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}
******crawled response after adding
{'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}
******cumulative dictionary
[{'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}, {'location_name': 'Bengaluru', 'tags': ['python', 'django', 'java'], 'job_url': 'http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay', 'Industry': ['Mobile Payments', ' POS', ' Retail'], 'Experience level': ['Mid-Level'], 'Job type': ['Permanent'], 'Role': ['Full Stack Developer'], 'company_name': 'MishiPay', 'job_name': 'Full Stack Developer'}]
#########################################
******crawled response before adding
{'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}
******crawled response after adding
{'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}
******cumulative dictionary
[{'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}, {'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}, {'location_name': 'Hyderabad', 'tags': ['architecture', 'web-services', 'togaf', 'websecurity', 'bigdata'], 'job_url': 'http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe', 'Industry': ['Financial Services', ' Financial Technology', ' Information Technology'], 'Experience level': ['Mid-Level', ' Senior'], 'Job type': ['Permanent'], 'Role': ['System Administrator'], 'company_name': 'Paysafe', 'job_name': 'Web Security Architect in Fintech & Big Data'}]
#########################################
Sample data for jobListingPages:
['https://stackoverflow.com/jobs?sort=p&l=India&d=100&u=Km', 'https://stackoverflow.com/jobs?l=India&d=100&u=Km&sort=i&pg=2']
Sample data for jobDetailsURL:
http://www.stackoverflow.com/jobs/170630/ui-front-end-developer-citrix
http://www.stackoverflow.com/jobs/171885/full-stack-developer-mishipay
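http://www.stackoverflow.com/jobs/168402/web-security-architect-in-fintech-big-data-paysafe
Posted on 2018-03-28 06:32:26
I believe i = 0 is the culprit. Move it outside the outer loop and try again. The job counter is reset for every URL in the list, so it overwrites the existing values stored under the same keys (e.g. job 1).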
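The effect of the counter position can be sketched in isolation. The collect function, its reset_per_page flag and the toy pages data are hypothetical and only illustrate the key collision; they are not the asker's crawler.

```python
def collect(pages, reset_per_page):
    """Store every job under the key "job <n>" and return the dict."""
    all_jobs = {}
    i = 0                        # counter initialised once, outside the outer loop
    for jobs in pages:
        if reset_per_page:
            i = 0                # mimics having "i = 0" inside the outer loop
        for job in jobs:
            i += 1
            all_jobs["job " + str(i)] = job
    return all_jobs

pages = [["a", "b"], ["c", "d"]]
overwritten = collect(pages, reset_per_page=True)
kept = collect(pages, reset_per_page=False)
print(overwritten)
print(kept)
```

With the reset, the second page reuses the keys "job 1" and "job 2" and replaces the first page's entries. This would account for keys being overwritten across pages; on its own it would not explain several different keys within one page showing the same value.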
Posted on 2018-03-28 07:25:38
Solved.
I don't know how it works, but using all_jobs_data_list.append(str(jobDetailsFindRes[1])) for the list, instead of all_jobs_data_list.append(jobDetailsFindRes[1]), did the job for me.
Similarly, all_jobs_data["job "+str(i)] = str(jobDetailsFindRes[1]) instead of all_jobs_data["job "+str(i)] = jobDetailsFindRes[1] produced distinct entries.
If anyone can explain this, I'd appreciate it :)
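One explanation consistent with this behaviour, offered as an assumption because the body of find_job_details is not shown: the function may return the same dict object on every call (for example a module-level dict that it fills in place). Storing that object stores a reference, so every entry follows later changes, whereas str() builds a new, independent string at that moment. The sketch below uses the hypothetical stand-ins fake_find_job_details and _shared to show the difference, and uses copy.deepcopy as one way to keep real dicts instead of strings.

```python
import copy

# Hypothetical stand-in: one shared dict that is refilled on every call.
_shared = {}

def fake_find_job_details(name):
    _shared["job_name"] = name
    _shared["tags"] = [name.lower()]
    return [0, _shared]

by_reference, as_strings, as_copies = [], [], []
for name in ["Citrix", "MishiPay", "Paysafe"]:
    res = fake_find_job_details(name)
    by_reference.append(res[1])             # same object appended three times
    as_strings.append(str(res[1]))          # snapshot as text (the workaround)
    as_copies.append(copy.deepcopy(res[1])) # snapshot as an independent dict

names_by_reference = [d["job_name"] for d in by_reference]
names_from_copies = [d["job_name"] for d in as_copies]
print(names_by_reference)
print(names_from_copies)
```

If that assumption holds, creating a fresh dict inside find_job_details on each call would address the cause directly, and the stored values would stay dictionaries rather than strings.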
https://stackoverflow.com/questions/49527445