I get an error when running this script:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

url = "http://nytimes.com,http://nytimes.com"
urls = [url]  # stack of urls to scrape
visited = [url]  # historic record of urls
while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(htmltext)

Original script:
import urllib.request
import urllib.parse
from bs4 import BeautifulSoup

url = "http://nytimes.com,http://nytimes.com"
urls = [url]  # stack of urls to scrape
visited = [url]  # historic record of urls
while len(urls) > 0:
    try:
        htmltext = urllib.request.urlopen(urls[0]).read()
    except:
        print(urls[0])
    soup = BeautifulSoup(htmltext)
    urls.pop(0)
    print(soup.findAll('a', href=True))

Error:
socket.gaierror: [Errno -2] Name or service not known
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
Traceback (most recent call last):
NameError: name 'htmltext' is not defined
Posted 2014-10-26 19:03:48
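If urllib.request.urlopen() raises an exception, htmltext is never assigned a value, so printing it in the except block cannot work and raises the NameError instead.
As for why urlopen() fails, make sure you are passing a valid URL. The string "http://nytimes.com,http://nytimes.com" is two URLs joined by a comma; urlopen() treats it as a single address, and the host name lookup fails (the socket.gaierror above).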
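A minimal sketch of how the two points in the answer above might be applied, assuming the comma-separated string was meant to hold several URLs. The helper names split_urls, fetch and crawl are hypothetical and not part of the original script, and the opener parameter exists only so the code can be exercised without network access.

```python
import urllib.error
import urllib.request


def split_urls(raw):
    """Split a comma-separated string into individual URLs (hypothetical helper)."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def fetch(url, opener=urllib.request.urlopen):
    """Return the page body as bytes, or None if the request fails."""
    try:
        with opener(url, timeout=10) as response:
            return response.read()
    except (OSError, ValueError) as exc:
        # urllib.error.URLError is a subclass of OSError; ValueError covers
        # malformed URLs. Nothing was downloaded, so report the URL itself.
        print("could not fetch", url, "->", exc)
        return None


def crawl(raw, opener=urllib.request.urlopen):
    """Fetch every distinct URL in raw and return {url: body}."""
    urls = split_urls(raw)  # queue of urls to scrape
    visited = set()         # historic record of urls
    pages = {}
    while urls:
        url = urls.pop(0)
        if url in visited:
            continue
        visited.add(url)
        htmltext = fetch(url, opener)
        if htmltext is None:
            continue  # skip pages that could not be downloaded
        pages[url] = htmltext
    return pages
```

Each body in the returned dictionary is only set after a successful download, so it could then be handed to BeautifulSoup(htmltext, "html.parser") and searched with findAll('a', href=True) as in the original script.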
https://stackoverflow.com/questions/26576633