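I am trying to use Python's requests library to automatically fetch my grades from my university's website. The URL is https://acorn.utoronto.ca/sws/transcript/academic/main.do?main.dispatch, but it goes through several redirects. I have the following simple code, but it does not seem to do what I want.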
import requests

payload = {"user": "username", "pass": "password"}
r = requests.post("https://acorn.utoronto.ca/sws/transcript/academic/main.do?main.dispatch", data=payload)
print(r.text)

The output is:
C:\Users\johnp\AppData\Local\Programs\Python\Python35-32\python.exe C:/Users/johnp/Desktop/git_stuff/16AugRequests/acorn_requests.py
<html> <head> </head> <body onLoad="document.relay.submit()"> <form method=post action="https://weblogin.utoronto.ca/" name=relay> <input type=hidden name=pubcookie_g_req value="b25lPWlkcC51dG9yYXV0aC51dG9yb250by5jYSZ0d289Q0lNRl9TaGliYm9sZXRoX1BpbG90JnRocmVlPTEmZm91cj1hNWEmZml2ZT1HRVQmc2l4PWlkcC51dG9yYXV0aC51dG9yb250by5jYSZzZXZlbj1MMmxrY0M5QmRYUm9iaTlTWlcxdmRHVlZjMlZ5Um05eVkyVkJkWFJvJmVpZ2h0PSZob3N0bmFtZT1pZHAudXRvcmF1dGgudXRvcm9udG8uY2EmbmluZT0xJmZpbGU9JnJlZmVyZXI9KG51bGwpJnNlc3NfcmU9NSZwcmVfc2Vzc190b2s9LTczODQ3MDk2OCZmbGFnPTA="> You do not have Javascript turned on, please click the button to continue.
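Am I doing this right? I think I should be passing a cookie, but how do I get the cookie in the first place?
Thanks in advance.
Edit: This is what I got from Firefox: Network tab
Does this mean I need to fill in the whole form as parameters in my request?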
Answered 2016-08-16 18:00:19
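You can log in first and then request whatever page you want. The login form expects more data than just the username and password; you can use bs4 to collect the remaining fields:

import requests
from bs4 import BeautifulSoup

url = "https://weblogin.utoronto.ca/"
with requests.Session() as s:
    s.headers.update({"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"})
    # Load the login page and copy every input under #query that already has a value.
    soup = BeautifulSoup(s.get(url).content, "html.parser")
    data = {inp["name"]: inp["value"] for inp in soup.select("#query input[value]")}
    data["user"] = "username"
    data["pass"] = "password"
    post = s.post(url, data=data)
    print(post)
    print(post.content)
    # The session reuses any cookies set during login; "protected_page" is a placeholder URL.
    protect = s.get("protected_page")

If we run the code and print the data dictionary, you can see that bs4 filled in the required fields: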
In [14]: with requests.Session() as s:
   ....:     s.headers.update({"User-Agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"})
   ....:     soup = BeautifulSoup(s.get(url).content,"html.parser")
   ....:     data = {inp["name"]:inp["value"] for inp in soup.select("#query input[value]")}
   ....:     data["user"] = "username"
   ....:     data["pass"] = "password"
   ....:     print(data)
   ....:
{'seven': '/index.cgi', 'sess_re': '0', 'pre_sess_tok': '0', 'pass': 'password', 'four': 'a5', 'user': 'username', 'reply': '1', 'two': 'pinit', 'hostname': '', 'three': '1', 'pinit': '1', 'relay_url': '', 'nine': 'PInit', 'create_ts': '1471341718', 'referer': '', 'six': 'weblogin.utoronto.ca', 'first_kiss': '1471341718-777129', 'flag': '', 'five': '', 'post_stuff': '', 'creds_from_greq': '1', 'fr': '', 'eight': '', 'one': 'weblogin.utoronto.ca', 'file': ''}

https://stackoverflow.com/questions/38967833
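A note on the output in the question: that page is a relay form which the browser submits through JavaScript (onLoad="document.relay.submit()"). requests does not run JavaScript, so the form has to be read and posted by hand. The following is only a rough sketch of the reading step, using the standard library's html.parser instead of bs4. FormParser, parse_relay_form and the shortened sample HTML are made-up names and data for illustration, not part of the site or of either library.

```python
from html.parser import HTMLParser


class FormParser(HTMLParser):
    """Collect the action URL and named inputs of the first <form> in a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            # Remember where the first form wants to be submitted.
            self._in_form = True
            self.action = attrs.get("action")
        elif tag == "input" and self._in_form and "name" in attrs:
            # Inputs without a value are sent as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def parse_relay_form(html):
    """Return (action_url, fields) for the first form found in html."""
    parser = FormParser()
    parser.feed(html)
    return parser.action, parser.fields


# Shortened stand-in for the page printed in the question (hypothetical value).
sample = (
    '<html><body onLoad="document.relay.submit()">'
    '<form method=post action="https://weblogin.utoronto.ca/" name=relay>'
    '<input type=hidden name=pubcookie_g_req value="abc123=">'
    '</form></body></html>'
)
action, fields = parse_relay_form(sample)
print(action)
print(fields)
```

With a session like the one in the answer, something along the lines of s.post(action, data=fields) would then submit the relay form the way the browser's JavaScript does. Whether the site needs further steps after that is not something this sketch checks.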