
Using cookielib to mimic a browser when fetching a URL in Python

Stack Overflow user
Asked 2012-11-23 03:18:39
1 answer, 1.5K views, 0 followers, 2 votes

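I am using cookielib. Sometimes opening a URL in a browser causes the browser to issue many further requests to download other files. Can I replicate that behavior with cookielib or any other Python library?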

For example, to get all the required information from the page https://applicant.keybank.com/psp/hrsappl/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_HM_PRE&Action=A&SiteId=1

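I have to make more than one GET request from my Python script. I found the URLs of all the requests the browser makes by analyzing the network traffic when I open the page.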

Is there a way to make a single request and have it fetch all the related requests on its own, the way a browser does?

I am not really interested in the JS or CSS, mainly the HTML.

I tried the following code, but it does not download the whole page:

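import cookielib
import urllib2

# Share one cookie jar across every request made through this opener.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))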

response = opener.open('https://applicant.keybank.com/psp/hrsappl/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_HM_PRE&Action=A&SiteId=1')
html = response.read()

However, when I fetch three other GET URLs in sequence, the third response contains the HTML I need. I got these URLs from the browser's network tab:

'https://applicant.keybank.com/psc/hrsappl/EMPLOYEE/EMPL/s/WEBLIB_PT_NAV.ISCRIPT1.FieldFormula.IScript_UniHeader_Frame?c=NNTCgkqGs001AcPaisqGbYpTu%2fbGx4jx&Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fapplicant.keybank.com%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%2fapplicant.keybank.com%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&PortalIsPagelet=true&NoCrumbs=yes')


'https://applicant.keybank.com/psc/hrsappl/EMPLOYEE/EMPL/s/WEBLIB_PTPPB.ISCRIPT2.FieldFormula.IScript_TemplatePageletBuilder?PTPPB_PAGELET_ID=KC_LNAV_APPLICANT&target=KCNV_KC_LNAV_APPLICANT_TMPL&Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fapplicant.keybank.com%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%2fapplicant.keybank.com%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&PortalIsPagelet=true&NoCrumbs=yes&PortalTargetFrame=TargetContent'


'https://hronline.keybank.com/psc/hrshrm/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalCRefLabel=Careers&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fapplicant.keybank.com%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%2fapplicant.keybank.com%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&NoCrumbs=yes&PortalKeyStruct=yes'

Below is the complete code for the additional fetches I am making:

response = opener.open('https://applicant.keybank.com/psc/hrsappl/EMPLOYEE/EMPL/s/WEBLIB_PT_NAV.ISCRIPT1.FieldFormula.IScript_UniHeader_Frame?c=NNTCgkqGs001AcPaisqGbYpTu%2fbGx4jx&Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fapplicant.keybank.com%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%2fapplicant.keybank.com%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&PortalIsPagelet=true&NoCrumbs=yes')
response.read()

response = opener.open('https://applicant.keybank.com/psc/hrsappl/EMPLOYEE/EMPL/s/WEBLIB_PTPPB.ISCRIPT2.FieldFormula.IScript_TemplatePageletBuilder?PTPPB_PAGELET_ID=KC_LNAV_APPLICANT&target=KCNV_KC_LNAV_APPLICANT_TMPL&Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fapplicant.keybank.com%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%2fapplicant.keybank.com%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&PortalIsPagelet=true&NoCrumbs=yes&PortalTargetFrame=TargetContent')
response.read()

response = opener.open('https://hronline.keybank.com/psc/hrshrm/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_HM_PRE&Action=A&SiteId=1&PortalActualURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentURL=https%3a%2f%2fhronline.keybank.com%2fpsc%2fhrshrm%2fEMPLOYEE%2fHRMS%2fc%2fHRS_HRAM.HRS_CE.GBL%3fPage%3dHRS_CE_HM_PRE%26Action%3dA%26SiteId%3d1&PortalContentProvider=HRMS&PortalCRefLabel=Careers&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fapplicant.keybank.com%2fpsp%2fhrsappl%2f&PortalURI=https%3a%2f%2fapplicant.keybank.com%2fpsc%2fhrsappl%2f&PortalHostNode=EMPL&NoCrumbs=yes&PortalKeyStruct=yes')
required_html = response.read()
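
For readers on Python 3, where cookielib is named http.cookiejar and urllib2 is split into urllib.request and related modules, the same sequence of requests might be sketched as below. The helper names build_cookie_opener and fetch_in_order are made up for illustration and are not part of any library; the list of URLs still has to be collected by hand from the browser's network tab.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener plus the cookie jar it shares across requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def fetch_in_order(opener, urls):
    """Open each URL in sequence and return the body of the last response.

    Earlier responses are read and discarded; they matter only because
    the server may set cookies that later requests depend on.
    """
    body = None
    for url in urls:
        with opener.open(url) as response:
            body = response.read()
    return body
```

Calling fetch_in_order(opener, [first_url, second_url, third_url]) with the three URLs listed above, in the order given, would correspond to the three opener.open calls in the question. This does not discover the URLs automatically; it only removes the repetition.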

1 Answer

Stack Overflow user

Accepted answer

Answered 2012-11-23 03:57:40

As you can see here, requests can handle cookies. It is a great library, far more powerful than urllib2, and it also looks simpler.

>>> import requests
>>> r = requests.get('https://applicant.keybank.com/psp/hrsappl/EMPLOYEE/HRMS/c/HRS_HRAM.HRS_CE.GBL?Page=HRS_CE_HM_PRE&Action=A&SiteId=1')
>>> r.cookies

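Edit: this answer does not really address the question; I read it too quickly. Sorry about that. As @J.F.Sebastian suggested, I am adding a link to Ghost.py, a Python WebKit client that can emulate a browser for your requests.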
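
If the extra requests come from frame or iframe tags in the returned HTML (an assumption; the thread does not confirm how the page loads its content), a lighter alternative to a full browser engine is to parse each response for frame sources and fetch those too. The sketch below uses only the Python 3 standard library. FrameSrcParser, find_frame_urls and fetch_with_frames are hypothetical names, and fetch is a caller-supplied function that maps a URL to decoded HTML text.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FrameSrcParser(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_frame_urls(html, base_url):
    """Return absolute URLs for all frames referenced in an HTML string."""
    parser = FrameSrcParser()
    parser.feed(html)
    return [urljoin(base_url, src) for src in parser.sources]


def fetch_with_frames(url, fetch, max_depth=2, seen=None):
    """Fetch url and, recursively, the frames it embeds.

    Returns a dict mapping each visited URL to its HTML. max_depth limits
    how many levels of nested frames are followed.
    """
    seen = {} if seen is None else seen
    if url in seen or max_depth < 0:
        return seen
    html = fetch(url)
    seen[url] = html
    for frame_url in find_frame_urls(html, url):
        fetch_with_frames(frame_url, fetch, max_depth - 1, seen)
    return seen
```

The fetch argument could wrap a cookie-aware opener, for example lambda u: opener.open(u).read().decode("utf-8", "replace"), so that cookies set by the outer page are sent with the frame requests. Content that the page loads through JavaScript will not be found this way; that case still needs a browser engine such as Ghost.py.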

Votes: 2
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/13518883
