I am using urllib, urllib2, and cookielib to scrape a website: GET the login page, then POST the login data.
import re
import urllib
import urllib2
import cookielib

def getpage():
    codeurl=r"http://www.xxx/sign_in"
    request=urllib2.Request(codeurl)
    response=urllib2.urlopen(request)
    return response

def parsecode(response):
    """
    parse the login page to get the changed code
    """
    pattern=re.compile(r"""<meta.*?csrf-token.*?content=(.*?)\s/>""")
    code=re.findall(pattern,response.read())[0]
    return code

def Hand():
    """
    deal with cookie and header
    """
    headers={
        "Referer":"xxx",
        "User-Agent":"xxx"
    }
    ck=cookielib.MozillaCookieJar()
    handle=urllib2.HTTPCookieProcessor(ck)
    openner=urllib2.build_opener(handle)
    head=[]
    for key,value in headers.items():
        tup=(key,value)
        head.append(tup)
    openner.addheaders = head
    return openner

def postdata(code,openner):
    """
    post the data xxx.com needed
    """
    logurl=r"http://www.jianshu.com/sessions"
    sign_in={"name":"xxx","password":"xxx","authenticity_token":code}
    data=urllib.urlencode(sign_in).encode("utf-8")
    x=openner.open(logurl,data)
    # ck is the cookie jar created inside Hand(); as written it is not in scope here
    for item in ck:
        print item

However, I ran into a bug:
Traceback (most recent call last):
  File "jianshu.py", line 80, in <module>
    postdata(code,op)
  File "jianshu.py", line 43, in postdata
    x=openner.open(logurl,data)
  File "/usr/lib64/python2.7/urllib2.py", line 437, in open
    response = meth(req, response)
  File "/usr/lib64/python2.7/urllib2.py", line 550, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib64/python2.7/urllib2.py", line 475, in error
    return self._call_chain(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 409, in _call_chain
    result = func(*args)
  File "/usr/lib64/python2.7/urllib2.py", line 558, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 500: Internal Server Error
Answered 2015-09-08 10:21:31
Is it possible that you are missing a line between the "r" and the "http://..." in:
codeurl=r"http://www.xxx/sign_in"
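Two other things may be worth checking, though they are only guesses from the code shown and not a confirmed cause of the 500. First, getpage() fetches the login page with plain urllib2.urlopen, while the POST goes through a separate cookie-aware opener, so any session cookie set along with the login page is not sent back with the token. Second, if the tag looks like <meta name="csrf-token" content="..." />, the regex in parsecode() captures the surrounding quotes as part of the token. The sketch below uses one opener for both requests and strips the quotes. The names make_opener, extract_token, login and TOKEN_RE are hypothetical, and the pattern assumes csrf-token appears before content in the tag. The imports fall back to the Python 2 modules used in the question.

```python
import re

try:  # Python 3 module names
    from urllib.request import build_opener, HTTPCookieProcessor
    from urllib.parse import urlencode
    from http.cookiejar import CookieJar
except ImportError:  # Python 2 module names, as used in the question
    from urllib2 import build_opener, HTTPCookieProcessor
    from urllib import urlencode
    from cookielib import CookieJar

# Hypothetical pattern: assumes name="csrf-token" comes before content="..."
TOKEN_RE = re.compile(r'<meta[^>]*?csrf-token[^>]*?content="([^"]*)"')


def make_opener(headers):
    """Return (opener, jar); the opener stores and resends cookies via jar."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = list(headers.items())
    return opener, jar


def extract_token(html):
    """Return the token without its surrounding quotes, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


def login(opener, login_page_url, post_url, name, password):
    """GET the login page and POST the form through the same opener."""
    html = opener.open(login_page_url).read().decode("utf-8")
    token = extract_token(html)
    if token is None:
        raise ValueError("csrf-token meta tag not found")
    form = {"name": name, "password": password, "authenticity_token": token}
    data = urlencode(form).encode("utf-8")
    return opener.open(post_url, data)
```

With this layout, the caller builds the opener once with make_opener({"Referer": "xxx", "User-Agent": "xxx"}), passes it to login() together with the two URLs from the question, and can then iterate over the returned jar to print the cookies, which also avoids referring to ck outside the function that created it.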
https://stackoverflow.com/questions/32451506