from bs4 import BeautifulSoup
import requests
url = 'https://www.kayak.co.uk/flights/SEL-LON/2020-12-31?sort=bestflight_a'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')
deptimes = soup.find_all('span', attrs={'class': 'depart-time base-time'})
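deptimes

On Kayak, I am trying to get the departure times of flights from Seoul to London. The result is [], and I have tried the same approach for other pieces of information, but the result is always the same. Thanks.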
Posted on 2020-09-06 17:28:46
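If you look at what comes back in response.content, you will see that kayak.co.uk (correctly) thinks you are a bot and sends this (among other things):
If you are seeing this page, it means that Kayak thinks you are a "bot", and the page you are trying to reach is only useful for humans.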
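
One way to confirm this diagnosis in code is to check the response body for that notice before parsing it. The following is only a minimal sketch: the helper name looks_like_bot_page and the marker phrases are assumptions for illustration, and the exact wording Kayak serves may differ.

```python
# Hypothetical marker phrases taken from the notice quoted above;
# the exact wording served by the site is an assumption.
BOT_MARKERS = ("thinks you are a", "only useful for humans")


def looks_like_bot_page(html: str) -> bool:
    """Return True if the HTML contains any of the assumed bot-notice phrases."""
    lowered = html.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


if __name__ == "__main__":
    # Static sample strings, so the sketch runs without network access.
    blocked = '<html><body><p>KAYAK thinks you are a "bot".</p></body></html>'
    normal = '<span class="depart-time base-time">00:25</span>'
    print(looks_like_bot_page(blocked))
    print(looks_like_bot_page(normal))
```

With requests, you could call looks_like_bot_page(r.text) right after requests.get(...) and only hand the body to BeautifulSoup when it returns False. An empty find_all result then no longer hides the real cause.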
However, a small change to your code is enough to make the server return what you want: send browser-like request headers.
Try this:
import requests
from bs4 import BeautifulSoup
url = 'https://www.kayak.co.uk/flights/SEL-LON/2020-12-31?sort=bestflight_a'
headers = {
    "accept": "application/json, text/javascript, */*; q=0.01",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-US,en;q=0.9,pl;q=0.8",
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36"
}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')
departure_times = soup.find_all('span', {'class': 'depart-time base-time'})
for _time in departure_times:
    print(_time.text)

This produces the following output:
00:25
00:25
00:25
14:30
14:30
13:00
13:00
... and so on

https://stackoverflow.com/questions/63765721