I am trying to fetch a web page hosted on the Tor network. I am using the following code:
import requests

def get_tor_session():
    session = requests.session()
    session.proxies = {'http': 'socks5://127.0.0.1:9150',
                       'https': 'socks5://127.0.0.1:9150'}
    return session

session = get_tor_session()

It works fine when I request a normal website, for example:

print(session.get("http://httpbin.org/ip").text)

prints {"origin": "80.67.172.162"}
However, when I try it on a .onion site, it fails with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 813, in connect
    negotiate(self, dest_addr, dest_port)
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 477, in _negotiate_SOCKS5
    CONNECT, dest_addr)
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 540, in _SOCKS5_request
    resolved = self._write_SOCKS5_address(dst, writer)
  File "/usr/local/lib/python3.6/site-packages/socks.py", line 592, in _write_SOCKS5_address
    addresses = socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM, socket.IPPROTO_TCP, socket.AI_ADDRCONFIG)
  File "/usr/local/Cellar/python3/3.6.3/Frameworks/Python.framework/Versions/3.6/lib/python3.6/socket.py", line 745, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 8] nodename nor servname provided, or not known

During handling of the above exception, another exception occurred:
...
Traceback (most recent call last):
  File "spider.py", line 13, in <module>
    print(session.get("http://zqktlwi4fecvo6ri.onion/").text)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: SOCKSHTTPConnectionPool(host='zqktlwi4fecvo6ri.onion', port=80): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.contrib.socks.SOCKSConnection object at 0x106fd62e8>: Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known',))

Posted on 2017-11-16 21:00:39
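With the socks5 scheme, the domain name is resolved locally, by the client's own DNS resolver. Ordinary DNS servers cannot resolve .onion domains, so your request fails before it ever reaches the proxy.

From docs.python-requests.org:

"Using the scheme socks5 causes the DNS resolution to happen on the client, rather than on the proxy server. This is in line with curl, which uses the scheme to decide whether to do the DNS resolution on the client or proxy. If you want to resolve the domains on the proxy server, use socks5h as the scheme."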
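To make the scheme distinction concrete, here is a minimal sketch that rewrites a requests-style proxies dictionary so that every socks5:// entry becomes socks5h://. The helper name use_remote_dns is hypothetical (it is not part of requests), and the address 127.0.0.1:9150 is simply the one used in the question.

```python
def use_remote_dns(proxies):
    """Return a copy of a proxies dict with socks5:// replaced by socks5h://.

    Hypothetical helper for illustration only; it is not part of requests.
    Entries that use any other scheme are copied unchanged.
    """
    local_prefix = "socks5://"
    remote_prefix = "socks5h://"
    fixed = {}
    for scheme, proxy_url in proxies.items():
        if proxy_url.startswith(local_prefix):
            # Same host and port, but hostnames are now resolved by the proxy.
            proxy_url = remote_prefix + proxy_url[len(local_prefix):]
        fixed[scheme] = proxy_url
    return fixed


# Proxies as written in the question (DNS resolved on the client).
question_proxies = {'http': 'socks5://127.0.0.1:9150',
                    'https': 'socks5://127.0.0.1:9150'}
print(use_remote_dns(question_proxies))
```

The returned dictionary could then be assigned to session.proxies in place of the original one.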
So, to connect to a .onion site, you should let Tor resolve the domain. You can do that by using the socks5h scheme in the proxies dictionary:
import requests
session = requests.session()
session.proxies = {'http': 'socks5h://127.0.0.1:9150', 'https': 'socks5h://127.0.0.1:9150'}
response = session.get("https://3g2upl4pq6kufc4m.onion/")
print(response)
#<Response [200]>

Note that you may need to install an additional dependency:

pip install requests[socks]

https://stackoverflow.com/questions/47338274
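Regarding the extra dependency mentioned above: the requests[socks] extra pulls in PySocks, whose import name is socks (the socks.py file visible in the question's traceback). A minimal sketch for checking whether it is available in the current environment, using only the standard library, might look like this. The function name has_socks_support is hypothetical.

```python
import importlib.util


def has_socks_support():
    """Return True if the PySocks module (import name: socks) can be found.

    Hypothetical helper for illustration; find_spec only locates the module
    and does not import it.
    """
    return importlib.util.find_spec("socks") is not None


if has_socks_support():
    print("PySocks found; socks5:// and socks5h:// proxies should be usable.")
else:
    print("PySocks not found; try: pip install requests[socks]")
```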