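I'm trying to scrape "Things to Do" on Tripadvisor (for example, the link is Texas.html), but I'm stuck at the first few lines of code. I waited more than ten minutes and got no response. I tried the same code and link three days ago and it worked, but now it produces nothing. The code is: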
import requests
trip = 'https://www.tripadvisor.com/Tourism-g30196-Austin_Texas-Vacations.html'
response = requests.get(trip)
print(type(response))

I don't know what is going on. I look forward to your help! Thank you very much.

Posted on 2022-02-19 05:26:17
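First, you should set the User-Agent header to a value from a real web browser (to begin with, you can try the shorter Mozilla/5.0). By default requests sends something like python-requests/2.x, so the server can recognize a script and block it. Some servers also need this header in order to send different content to different browsers or devices (desktop, tablet, phone).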
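To see what the server receives when no header is set, you can print the default User-Agent that requests uses. The sketch below also sets a timeout, so a stalled request raises an error instead of hanging for minutes as described in the question. The helper names make_session and fetch and the 10-second timeout are assumptions made for illustration; they are not part of requests or of the original code.

```python
import requests

# What requests sends when no User-Agent header is set,
# e.g. "python-requests/2.27.1" (the version depends on the install).
default_ua = requests.utils.default_user_agent()
print(default_ua)


def make_session(user_agent="Mozilla/5.0"):
    """Return a Session that sends the given User-Agent on every request.

    make_session is a hypothetical helper name used only in this sketch.
    """
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch(session, url, timeout=10):
    """GET url and raise an exception instead of waiting indefinitely.

    fetch and the 10-second default are assumptions for this sketch.
    """
    # The timeout covers connecting and each read, not the whole download.
    response = session.get(url, timeout=timeout)
    # Turn 4xx/5xx answers (for example a block page) into an exception.
    response.raise_for_status()
    return response


session = make_session()
print(session.headers["User-Agent"])
```

Calling fetch(session, url) with the Tripadvisor URL would then either return the response or fail quickly with requests.exceptions.Timeout or an HTTPError, which makes it easier to tell a blocked request from a slow one.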
import requests
from bs4 import BeautifulSoup
#url = 'https://www.tripadvisor.com/Tourism-g30196-Austin_Texas-Vacations.html'
url = 'https://www.tripadvisor.com/Attractions-g30196-Activities-c57-Austin_Texas.html'
response = requests.get(url, headers={'User-Agent': "Mozilla/5.0"})
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.find_all('span', {'name': 'title'})
for i in items:
    print(i.text)

Result:
1. Lady Bird Lake Hike-and-Bike Trail
2. Barton Springs Pool
3. Mount Bonnell
4. Congress Avenue Bridge / Austin Bats
5. Lady Bird Johnson Wildflower Center
6. Austin Aquarium
7. Zilker Metropolitan Park
8. McKinney Falls State Park
9. Barton Creek Greenbelt
10. Austin Zoo
11. Mayfield Park
12. Zilker Botanical Garden
13. Town Lake
14. Westcave Outdoor Discovery Center
15. Bull Creek District Park
16. Austin Nature & Science Center
17. Turkey Creek Trail
18. River Place Nature Trails
19. Mueller Lake Park
20. Zilker Playground
21. Deep Eddy Pool
22. Red Bud Isle Park
23. Mansfield Dam Park
24. Pease Park
25. Wild Basin Preserve
26. Emma Long Metropolitan Park
27. Shoal Creek Greenbelt
28. Commons Ford Ranch
29. Hornsby Bend Bird Observatory
30. Mary Moore Searight Metropolitan Park

EDIT:
In the scraping section of my GitHub you can find code from my other Stack Overflow answers that uses selenium and scrapy to scrape tripadvisor.
https://stackoverflow.com/questions/71181932