Python SSL web scraping

Stack Overflow user
Asked 2013-04-10 09:59:38
Answers: 1 · Views: 2.1K · Followers: 0 · Votes: 1

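I'm trying to scrape a web page with Python 2.7 and BeautifulSoup, but I can't get past a protocol error that doesn't make much sense to me. It only happens on the one site I actually need to scrape: https://edd.telstra.com/telstra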

The code I'm using is just a basic test:

#! /usr/bin/python

from urllib import urlopen
from BeautifulSoup import BeautifulSoup
import re

# Copy all of the content from the provided web page
webpage = urlopen("https://edd.telstra.com/telstra/").read()

I get the following error (running on Ubuntu 12.10):

Traceback (most recent call last):
  File "e.py", line 8, in <module>
    webpage = urlopen("https://edd.telstra.com/telstra/").read()
  File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
    return opener.open(url)
  File "/usr/lib/python2.7/urllib.py", line 207, in open
    return getattr(self, name)(url)
  File "/usr/lib/python2.7/urllib.py", line 436, in open_https
    h.endheaders(data)
  File "/usr/lib/python2.7/httplib.py", line 958, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 818, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 780, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1165, in connect
    self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
  File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
    ciphers=ciphers)
  File "/usr/lib/python2.7/ssl.py", line 143, in __init__
    self.do_handshake()
  File "/usr/lib/python2.7/ssl.py", line 305, in do_handshake
    self._sslobj.do_handshake()
IOError: [Errno socket error] [Errno 1] _ssl.c:504: error:1408F119:SSL routines:SSL3_GET_RECORD:decryption failed or bad record mac

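Can anyone tell me whether I need to pass some parameters to download this page with Python? The problem seems specific to this site: the code above (and many other variants I've tried) works fine on every other HTTPS/SSL page I've tested.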

Thanks for your help!
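
The thread never identifies the cause of the `decryption failed or bad record mac` handshake error. One plausible reading, which is an assumption and not confirmed here, is that the client and this particular server failed to agree on a TLS protocol version; a workaround people commonly tried was pinning the client to a single version. A minimal sketch for Python 3.7+ (not the Python 2.7 used in the question), using only the standard `ssl` and `urllib.request` modules; `make_pinned_context` and `fetch` are hypothetical helper names:

```python
import ssl
import urllib.request


def make_pinned_context(version):
    """Return a verifying client context restricted to one TLS version.

    `version` is an ssl.TLSVersion member, e.g. ssl.TLSVersion.TLSv1_2.
    Older members (TLSv1, TLSv1_1) are deprecated in recent Python releases
    and may be refused at handshake time by modern OpenSSL builds.
    """
    # create_default_context() keeps certificate and hostname checks enabled
    ctx = ssl.create_default_context()
    # Floor and ceiling set to the same value: only that version is offered
    ctx.minimum_version = version
    ctx.maximum_version = version
    return ctx


def fetch(url, version=ssl.TLSVersion.TLSv1_2):
    """Download `url` over HTTPS using only the pinned protocol version."""
    ctx = make_pinned_context(version)
    with urllib.request.urlopen(url, context=ctx) as resp:
        return resp.read()
```

Calling `fetch(url, ...)` with different `ssl.TLSVersion` members in turn is one way to check whether the handshake failure depends on the protocol version; whether any of them helps for this specific server is untested.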


1 answer

Stack Overflow user

Answered 2015-01-20 19:47:58

I recommend the requests library:

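import requests


def get_page(login, password):
    '''Log in with a POST request, then fetch the page in the same session.'''
    url = 'https://qwe.qwe'  # placeholder URL from the original answer

    # Form fields expected by the site's login form
    payload = {
        'user': login,
        'pass': password
    }

    # A Session keeps cookies (e.g. the login cookie) across requests
    with requests.Session() as my_session:
        my_session.post(url, data=payload)
        data = my_session.get(url)
    return data.text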

More information: http://docs.python-requests.org/en/latest/user/advanced/#session-objects
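
What `requests.Session` provides in this answer is cookie persistence: a cookie set by the login POST is sent again with the following GET. If installing `requests` is not an option, a roughly equivalent sketch with the Python 3 standard library shares one `http.cookiejar.CookieJar` across requests made by a single opener. `make_session_opener` is a hypothetical helper name, `url` is a placeholder, and the `user`/`pass` field names are carried over from the answer:

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Build an opener that stores and resends cookies, like a requests.Session."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def get_page(url, login, password):
    """POST the login form, then GET the page with the same cookie jar."""
    opener, _ = make_session_opener()
    # Passing `data` makes urllib send a POST with a form-encoded body
    body = urllib.parse.urlencode({'user': login, 'pass': password}).encode('ascii')
    opener.open(url, data=body).close()
    # The GET reuses any cookies the login response set
    with opener.open(url) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset)
```

Note that neither this sketch nor the `requests` version changes how the TLS handshake is negotiated, so on its own it does not address the SSL error in the question.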

Votes: 0
Original page content provided by Stack Overflow. Translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/15915656
