This simple Python 3 script:
import urllib.request
host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link
filename = "cite0.bib"
print(url)
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
raises this exception:
Traceback (most recent call last):
File "C:/Users/ricardo/Desktop/Google-Scholar/BibTex/test2.py", line 8, in <module>
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
File "C:\Python32\lib\urllib\request.py", line 150, in urlretrieve
return _urlopener.retrieve(url, filename, reporthook, data)
File "C:\Python32\lib\urllib\request.py", line 1569, in retrieve
fp = self.open(url, data)
File "C:\Python32\lib\urllib\request.py", line 1541, in open
raise IOError('socket error', msg).with_traceback(sys.exc_info()[2])
File "C:\Python32\lib\urllib\request.py", line 1537, in open
return getattr(self, name)(url)
File "C:\Python32\lib\urllib\request.py", line 1715, in open_http
return self._open_generic_http(http.client.HTTPConnection, url, data)
File "C:\Python32\lib\urllib\request.py", line 1695, in _open_generic_http
http_conn.request("GET", selector, headers=headers)
File "C:\Python32\lib\http\client.py", line 967, in request
self._send_request(method, url, body, headers)
File "C:\Python32\lib\http\client.py", line 1005, in _send_request
self.endheaders(body)
File "C:\Python32\lib\http\client.py", line 963, in endheaders
self._send_output(message_body)
File "C:\Python32\lib\http\client.py", line 808, in _send_output
self.send(msg)
File "C:\Python32\lib\http\client.py", line 746, in send
self.connect()
File "C:\Python32\lib\http\client.py", line 724, in connect
self.timeout, self.source_address)
File "C:\Python32\lib\socket.py", line 386, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno 11004] getaddrinfo failed
I can open the URL from the print statement just fine:
sdt=1,14&ct=citation&cd=0
What is causing this? I tried changing http:// to http:/// (three slashes), but the same exception was raised.
Posted on 2012-07-17 22:03:20
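Your problem is this line:
urllib.request.urlretrieve("http://scholar.google.com" + url, filename)
You are adding the http://scholar.google.com part twice (url already starts with http://scholar.google.com). As a result, urllib thinks you are requesting a page on the host scholar.google.comhttp, which, needless to say, does not exist as a domain. That is exactly what the getaddrinfo failed error is telling you: the host name could not be resolved.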
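One way to see this without touching the network is to parse the concatenated string with urllib.parse.urlsplit and look at the host name it extracts. This is only a rough diagnostic sketch: the variable names bad_url and good_url are made up for illustration, and urlsplit is used here as a stand-in for how the host gets picked out, not as a trace of urlretrieve's internals.

```python
from urllib.parse import urlsplit

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
url = "http://" + host + link

# The string the question actually passes to urlretrieve (prefix added twice).
bad_url = "http://scholar.google.com" + url
# The string that print(url) shows, which opens fine in a browser.
good_url = url

# Everything between "//" and the next "/" is treated as the network location,
# so the second "http" ends up glued onto the host name.
bad_host = urlsplit(bad_url).hostname
good_host = urlsplit(good_url).hostname

print(bad_host)   # scholar.google.comhttp
print(good_host)  # scholar.google.com
```

The first host name is the one the DNS lookup is attempted for, which is why changing the number of slashes in the prefix does not help.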
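The fix is simply to request url on its own.
A handy tip for finding this kind of thing faster in future: when you add print statements for debugging, make sure they print the actual value used in the command you are debugging. If your print statement had also concatenated the base URL, you would have spotted this in about two seconds.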
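A minimal sketch of the corrected script, combining the fix with the debugging tip: the URL is built once, and the very same value is both printed and passed to urlretrieve. The helper name fetch is hypothetical and not part of the original script. The download call is left commented out because it needs network access, and whether the server accepts a scripted request is not something this sketch can guarantee.

```python
import urllib.request

host = "scholar.google.com"
link = "/scholar.bib?q=info:K7uZdMSvdQ0J:scholar.google.com/&output=citation&hl=en&as_sdt=1,14&ct=citation&cd=0"
filename = "cite0.bib"

# Build the full URL exactly once; do not prepend the host again later.
url = "http://" + host + link


def fetch(target, dest):
    """Hypothetical helper: print the exact URL being requested, then download it."""
    # Print the same value that is handed to urlretrieve, so the debug
    # output cannot drift from what is really requested.
    print("requesting:", target)
    return urllib.request.urlretrieve(target, dest)


# Uncomment to perform the download (requires network access):
# fetch(url, filename)
```

Routing the print and the request through one variable means any accidental double prefix would show up directly in the debug output.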
https://stackoverflow.com/questions/11531275