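I'm trying to scrape a web page using Python 2.7 and BeautifulSoup, but I can't get past a protocol error that doesn't make much sense to me. It only happens on the one site I need to do this for: https://edd.telstra.com/telstra
The code I'm using is just a basic test: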
#! /usr/bin/python
from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re
# Copy all of the content from the provided web page
webpage = urlopen("https://edd.telstra.com/telstra/").read()
I get the following error (running on Ubuntu 12.10):
Traceback (most recent call last):
  File "e.py", line 8, in <module>
    webpage = urlopen("https://edd.telstra.com/telstra/").read()
  File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 436, in open_https
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 818, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 780, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1165, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
    ciphers=ciphers)
  File "/usr/lib/python2.7/ssl.py", line 143, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
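IOError: [Errno socket error] [Errno 1] _ssl.c:504: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac
Can anyone tell me whether I need to specify some parameters to download this page with Python? The problem seems to be specific to this site, because the code above (and many other variants I have tried) works fine on the other HTTPS/SSL pages I have tested.
Thanks for your help!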
Posted on 2015-01-20 19:47:58
I recommend using the requests library:
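import requests

def get_page(login, password):
    '''Log in with the given credentials, then fetch the page in the same session.'''
    url = 'https://qwe.qwe'  # placeholder URL
    payload = {
        'user': login,
        'pass': password
    }
    with requests.Session() as my_session:
        # The session keeps any cookies set by the login POST
        my_session.post(url, data=payload)
        data = my_session.get(url)
    return data.text
More information: http://docs.python-requests.org/en/latest/user/advanced/#session-objects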
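On the question of whether some parameters need to be specified: one thing that may be worth trying is to control the TLS protocol version explicitly instead of relying on the default negotiation. The following is only a minimal sketch, and it assumes Python 3.7+ with the standard library rather than the Python 2.7 setup from the question. The helper names make_tls_context and fetch are made up for illustration, and nothing in this thread confirms that pinning the version resolves the error on this particular server.

```python
import ssl
import urllib.request


def make_tls_context(version=ssl.TLSVersion.TLSv1_2):
    """Build a certificate-verifying SSL context pinned to one TLS version.

    Hypothetical helper: TLS 1.2 is an assumed default, not something
    the server in the question is known to require.
    """
    ctx = ssl.create_default_context()  # keeps certificate and hostname checks on
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def fetch(url, context=None, timeout=10):
    """Download a URL using the given SSL context and return the raw bytes."""
    if context is None:
        context = make_tls_context()
    with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
        return resp.read()
```

With this in place, fetch("https://edd.telstra.com/telstra/") would attempt the download with the pinned version, and a different version can be tried by passing make_tls_context(ssl.TLSVersion.TLSv1_3) as the context. Certificate verification is left enabled on purpose, so a handshake failure still surfaces as an exception rather than being silently ignored.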
https://stackoverflow.com/questions/15915656