
Python scraper does not work for TripAdvisor

Stack Overflow user
Asked 2022-04-20 09:09:47
Answers: 1, Views: 117, Followers: 0, Votes: 0

I am trying to write a simple Python scraper that saves all the reviews for a specific location on TripAdvisor.

The specific link I am using as an example is:

https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html

Below is the code I am using; it should print the corresponding HTML:

from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data)
print(soup)

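If I run this code in the console, it hangs on requests.get(url) for a long time without producing any output. With another URL (for example url = "https://stackoverflow.com/"), I immediately get the HTML displayed correctly. Why doesn't TripAdvisor work? How can I get its HTML?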

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-04-20 09:22:18

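Adding a User-Agent header should solve your problem as a first step, because some sites serve different content depending on it or use it for bot/automation detection. Open DevTools in your browser and copy the User-Agent from one of your own requests: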

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get(url,headers=headers)
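
The question describes requests.get(url) hanging for a long time. requests applies no timeout unless one is passed, so a stalled connection can block indefinitely. The following is a minimal sketch, not part of the original answer, that combines the User-Agent header with a timeout and a status check. The names fetch_html and DEFAULT_HEADERS and the 10-second value are assumptions chosen for illustration.

```python
import requests

# Assumption: the same generic browser-like User-Agent used in the answer.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0"}


def fetch_html(url, timeout=10.0, session=None):
    """Return the page HTML, failing fast instead of waiting indefinitely.

    fetch_html and its parameters are hypothetical names for this sketch.
    """
    if session is None:
        session = requests.Session()
    # timeout bounds the wait for the connection and for each read.
    response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx so an error page is not parsed as data.
    response.raise_for_status()
    return response.text
```

A timeout does not make a blocked request succeed; it only turns a silent hang into a requests.exceptions.Timeout that can be caught and reported.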

Example

from bs4 import BeautifulSoup
import requests

url = "https://www.tripadvisor.com/Attraction_Review-g319796-d5988326-Reviews-or50-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html"
headers = {'User-Agent': 'Mozilla/5.0'}

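r = requests.get(url, headers=headers)
# Name the parser explicitly to avoid BeautifulSoup's "no parser specified" warning
soup = BeautifulSoup(r.text, 'html.parser')
# One dict per review card
data = []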

for e in soup.select('#tab-data-qa-reviews-0 [data-automation="reviewCard"]'):
    data.append({
        'rating':e.select_one('svg[aria-label]')['aria-label'],
        'profilUrl':e.select_one('a[tabindex="0"]').get('href'),
        'content':e.select_one('div:has(>a[tabindex="0"]) + div + div').text
    })

data

Output

[{'rating': '5.0 of 5 bubbles',
  'profilUrl': '/ShowUserReviews-g319796-d5988326-r620396152-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
  'content': "We were fortunate to get in without pre-booking.What a find. A UNESCO site in the middle of the countryside.The replication cave is so awesome and authentic, hard to believe it's not the real thing.The museum is beautifully curated, great for students, and anyone interested in archeology and the beginnings of human existence.Definitely worth visiting. We nearly missed out Read more"},
 {'rating': '5.0 of 5 bubbles',
  'profilUrl': '/ShowUserReviews-g319796-d5988326-r618358203-Museo_de_Altamira-Santillana_del_Mar_Cantabria.html',
  'content': 'Beautiful site with great replica’s of the original cave, excellent exposition, poor film as an introduction however!The most urgent issue: long waiting because you need a slot to enter. This could be done 1000% better and in every decent museum it is done better! Staff probably civil servants with no great desire to make you enjoy the visit. Building urgently needs a revamp, no exposure at all!Read more'},...]
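
The output above keeps the rating as a label such as '5.0 of 5 bubbles' and leaves a trailing 'Read more' in each content string. A possible post-processing step, sketched here with hypothetical helper names (parse_rating, clean_content, clean_review) and assuming the label and suffix always have the form shown in this output:

```python
import re


def parse_rating(label):
    """Extract the numeric score from a label such as '5.0 of 5 bubbles'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+of\s+\d+", label)
    # Return None when the label does not have the expected form.
    return float(match.group(1)) if match else None


def clean_content(text):
    """Drop a trailing 'Read more' left over from the truncated review card."""
    return re.sub(r"\s*Read more\s*$", "", text)


def clean_review(review):
    """Return a copy of one scraped review dict with normalized fields."""
    return {
        **review,
        "rating": parse_rating(review["rating"]),
        "content": clean_content(review["content"]),
    }
```

Applied to the scraped list, this would be `[clean_review(r) for r in data]`. The truncated text itself is not recovered; only the link label is removed.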
Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/71937012
