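I am trying to scrape a website with BeautifulSoup. The site requires a login:
https://www.bahn.de/p/view/meinebahn/login.shtml
From searching the web, I understand that the right way to authenticate is to use requests.
My code is as follows: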
import requests
from bs4 import BeautifulSoup
url = 'https://www.bahn.de/p/view/meinebahn/login.shtml'
header = {"User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5)AppleWebKit 537.36 (KHTML, like Gecko) Chrome","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,image/webp ,*/*;q=0.8"}
user = "username"
pwrd = "password"
response = requests.post(url,headers = header, auth=(user, pwrd))
page = requests.get('https://fahrkarten.bahn.de/privatkunde/meinebahn/meine_bahn_portal.go?lang=de&country=DEU#stay')
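soup = BeautifulSoup(page.text, 'html.parser')
Unfortunately this does not work: soup contains an HTML page stating that "you have been logged out of our system", even though response is <Response [200]>.
I have some doubts about auth, for two reasons:
Any help would be appreciated, because I really want to understand this. I am clearly too much of a newbie to draw the right conclusions from the manual (e.g. http://docs.python-requests.org/en/master/user/authentication/).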
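Posted on 2017-01-31 12:40:48
The easiest way to understand how a site's authentication works is to capture the traffic while logging in and see what happens behind the scenes: which URL is used, what data is submitted, and so on.
You can use Fiddler or Charles, or most conveniently the Chrome developer tools (press F12 to open them).
In your case, the full request is: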
POST /privatkunde/start/start.post HTTP/1.1
Host: fahrkarten.bahn.de
Connection: keep-alive
Content-Length: 74
Cache-Control: max-age=0
Origin: https://www.bahn.de
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Referer: https://www.bahn.de/p/view/meinebahn/login.shtml
Accept-Encoding: gzip, deflate, br
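Accept-Language: en-US,en;q=0.8

scope=bahnde&lang=de&country=DEU&username=demo&password=demo&login-submit=
Most importantly, because cookies are used for authentication, the whole flow needs a session, which is then used to access the other pages that are only available to logged-in users.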
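A side note on the captured request: its body is ordinary form encoding (application/x-www-form-urlencoded), which is what requests produces when you pass a dict as data=. The following is only a minimal sketch using the standard library to reproduce that body from the field names and demo values in the capture; the variable names are my own.

```python
from urllib.parse import urlencode

# Field names and demo values copied from the captured request above.
form_fields = {
    "scope": "bahnde",
    "lang": "de",
    "country": "DEU",
    "username": "demo",
    "password": "demo",
    "login-submit": "",  # the submit button is sent with an empty value
}

# Encode the fields the same way a browser form submission does.
body = urlencode(form_fields)
print(body)

# The byte length should match the Content-Length header of the capture.
print(len(body.encode("ascii")))
```

So there is no need to build this string by hand: the data dict in the session code that follows is encoded the same way.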
import requests

# create a session; it stores and resends cookies automatically
session = requests.Session()

# mimic the headers used in the actual POST request
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5)AppleWebKit 537.36 (KHTML, like Gecko) Chrome",
    "Referer": "https://www.bahn.de/p/view/meinebahn/login.shtml",
}
data = {
    'scope': 'bahnde', 'lang': 'de', 'country': 'DEU',
    'username': 'xxxx', 'password': 'xxxx', 'login-submit': '',
}

# now log in
response = session.post(url='https://fahrkarten.bahn.de/privatkunde/start/start.post', data=data, headers=headers)

# once logged in, the session can be used to access other pages;
# it is worth checking response.text to confirm that the login actually succeeded
content = response.text
# look for your username or another marker, e.g. with content.find(...)

r2 = session.get(url='xxx')  # access other pages ('xxx' is a placeholder URL)
Posted on 2017-01-31 10:52:50
This is most likely because you are sending the request to the wrong page. Have a look at the form on the login page:
<form method="post" name="staticLogin" id="kv-static-logi" action="https://fahrkarten.bahn.de/privatkunde/start/start.post">
<input name="scope" value="bahnde" type="hidden">
<input name="lang" value="de" type="hidden">
<input name="country" value="DEU" type="hidden">
<p>
<input id="kv-static-login-username_ab" name="username" class="from" maxlength="60" autocomplete="off" placeholder="Benutzername" type="text">
</p>
<p>
<input id="kv-static-login-password_ab" name="password" class="from" maxlength="60" placeholder="Passwort" type="password">
</p>
<p><button type="submit" name="login-submit" class="btn slim no-margin" style="float: left">Login</button>
<a id="vergessen" href="https://fahrkarten.bahn.de/privatkunde/start/start.post?scope=pwvergessen&lang=de">Login vergessen?</a>
</p></form>
You should send the request to https://fahrkarten.bahn.de/privatkunde/start/start.post with the username and password fields. Keep the rest of your request as well (tokens, etc.)!
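To avoid copying the action URL and hidden fields by hand, they can be read from the form itself. Below is a rough sketch, not a definitive implementation: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra packages, and LoginFormParser, build_login_payload and SAMPLE_FORM are names I made up. The sample is a trimmed copy of the form above.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the action URL and the named fields of the first <form>."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
        elif self._in_form and tag in ("input", "button") and attrs.get("name"):
            # Fields without a value attribute are submitted as empty strings.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def build_login_payload(html, username, password):
    """Return (action_url, payload) for the first form found in html."""
    parser = LoginFormParser()
    parser.feed(html)
    payload = dict(parser.fields)
    payload["username"] = username
    payload["password"] = password
    return parser.action, payload


# Trimmed copy of the login form shown above.
SAMPLE_FORM = """
<form method="post" name="staticLogin" action="https://fahrkarten.bahn.de/privatkunde/start/start.post">
<input name="scope" value="bahnde" type="hidden">
<input name="lang" value="de" type="hidden">
<input name="country" value="DEU" type="hidden">
<input name="username" type="text">
<input name="password" type="password">
<button type="submit" name="login-submit">Login</button>
</form>
"""

action, payload = build_login_payload(SAMPLE_FORM, "demo", "demo")
print(action)
print(payload)
```

The resulting action and payload are what you would pass as the URL and data= of the POST. If the site ever adds another hidden field to the form, it is picked up automatically.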
Bye!
https://stackoverflow.com/questions/41955547