import urllib.request
from bs4 import BeautifulSoup
page = urllib.request.urlopen("https://www.google.com/search?sxsrf=ACYBGNTOhiadhX5wH-HLBzUmxJSBAPzpbQ%3A1574342044444&source=hp&ei=nI3WXbq4GMWGoASf-I2oAw&q=%EB%A6%AC%EB%B2%84%ED%92%80+&oq=%EB%A6%AC%EB%B2%84%ED%92%80+&gs_l=psy-ab.3..35i39j0l9.463.2481..2802...2.0..1.124.1086.0j10......0....1..gws-wiz.....10..0i131j0i10j35i362i39.ciJHtFLjhCA&ved=0ahUKEwi69r6SsfvlAhVFA4gKHR98AzUQ4dUDCAY&uact=5#sie=t;/m/04ltf;2;/m/02_tc;mt;fp;1;;").read()
soup = BeautifulSoup(page,'html.parser')
rank = soup.find('table',{'class':'imspo_mt__mit'})
print(rank)

I am trying to get a football match schedule from Google, and this error occurs. What is the reason?

urllib.error.HTTPError: HTTP Error 403: Forbidden
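Posted on 2019-11-21 15:02:24
Google is refusing your request, which is why you get the 403 error. By default urllib identifies itself with a "Python-urllib/x.y" User-Agent, and that is probably what gets rejected.
Try spoofing the User-Agent. The following works for me: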
import requests
from bs4 import BeautifulSoup
user_agent = {'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'}
page = requests.get("https://www.google.com/search?sxsrf=ACYBGNTOhiadhX5wH-HLBzUmxJSBAPzpbQ%3A1574342044444&source=hp&ei=nI3WXbq4GMWGoASf-I2oAw&q=%EB%A6%AC%EB%B2%84%ED%92%80+&oq=%EB%A6%AC%EB%B2%84%ED%92%80+&gs_l=psy-ab.3..35i39j0l9.463.2481..2802...2.0..1.124.1086.0j10......0....1..gws-wiz.....10..0i131j0i10j35i362i39.ciJHtFLjhCA&ved=0ahUKEwi69r6SsfvlAhVFA4gKHR98AzUQ4dUDCAY&uact=5#sie=t;/m/04ltf;2;/m/02_tc;mt;fp;1;;", headers=user_agent)
soup = BeautifulSoup(page.text,'html.parser')
rank = soup.find('table',{'class':'imspo_mt__mit'})
print(rank)

https://stackoverflow.com/questions/58976850
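Since the question uses urllib rather than requests, here is a minimal sketch of the same idea with only the standard library. The names `BROWSER_UA`, `build_request` and `fetch_html` are made up for illustration, and the User-Agent string is simply the one from the answer above; whether Google accepts it today is an assumption, not something this snippet guarantees.

```python
import urllib.request

# Hypothetical constant: the browser User-Agent string used in the answer above.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36")


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request carrying a browser-like User-Agent header
    instead of urllib's default "Python-urllib/x.y"."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, user_agent=BROWSER_UA, timeout=10):
    """Download the page and return its text (this performs a network call)."""
    with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

The string returned by `fetch_html(url)` can be passed to `BeautifulSoup(html, 'html.parser')` exactly as in the code above. Keeping the request construction separate from the download makes it easy to check which headers are being sent before hitting the network.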