
Getting star ratings from an HTML page using Beautiful Soup

Stack Overflow user
Asked on 2022-04-15 10:48:14
Answers: 1 | Views: 190 | Followers: 0 | Votes: -2

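I am trying to get the star ratings from this page (https://www.edmunds.com/tesla/model-3/2019/consumer-reviews/).

I mean the ratings for Safety, Performance, Comfort, and so on.

Here is what the HTML looks like: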

<div class="justify-content-between flex-column flex-md-row row"><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Safety</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Technology</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Performance</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Interior</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star 
icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Comfort</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Reliability</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl><dl class="mb-1 d-flex justify-content-between pr-1_5 pr-sm-0 pr-md-1_5 pr-lg-0 pr-xl-2_5 col-7 col-sm-4 col-md-5"><dt class="font-weight-normal">Value</dt><dd class="mb-0"><span class="rating-stars text-primary-darker"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span><span class="rating-star icon-star-full"></span></span></dd></dl></div></div></div>

If the code is too long, I will post a screenshot instead.

Below is the code I am using, but it does not work for the markup above:

import pandas as pd
import requests
from bs4 import BeautifulSoup
from fake_useragent import UserAgent  # assumed source of UserAgent

data = []
ua = UserAgent()
header = {'User-Agent': str(ua.safari)}
url = 'https://www.edmunds.com/tesla/model-3/2019/consumer-reviews/'
response = requests.get(url, headers=header)
html_soup = BeautifulSoup(response.text, 'lxml')
content_list = html_soup.find_all('div', attrs={'class': 'review-item'})
for e in content_list:
    # One dict per review block
    d = {
        'review_title': e.a.text,
        'review_content': e.select_one('p').text,
        'overall_rating': e.select_one('span.sr-only').text,
        'reviewer_name': e.div.text.split(',')[0].strip(),
        'review_date': e.div.text.split(',')[1].strip(),
    }
    data.append(d)
df = pd.DataFrame(data)
df1 = df.drop_duplicates(subset=['reviewer_name', 'review_title'], keep='first')

Basically, what I want is one column per star rating, for example Safety: 5.0, Performance: 5.0, Comfort: 5.0, and so on.

I tried using this piece of code:

d.update(dict(s.stripped_strings for s in e.select('span.rating-stars span.sr-only')))
data.append(d)

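But it does not work. Also, the tags that hold the overall star rating and the detailed star ratings have the same class; the only difference is that they sit under different parent tags (I hope I have not overcomplicated this). Anyway, I hope someone can help me with this.

Edit: I edited the code snippet, because the code I originally pasted did not seem to work, which is strange.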


1 Answer

Stack Overflow user

Accepted answer

Answered on 2022-04-15 12:46:21

In general, using stripped_strings works, as long as you select the right elements:

d.update(dict(s.stripped_strings for s in e.select('dl')))
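
To illustrate why the choice of element matters, here is a minimal sketch. The HTML is a simplified, hypothetical stand-in for one review block (not the real page markup), and `html.parser` is used only to avoid an extra dependency. One plausible reason the original selector fails is that each `span.sr-only` yields a single string, whereas `dict()` needs two items per entry:

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical stand-in for one review block on the page.
html = """
<div class="review-item">
  <span class="rating-stars"><span class="sr-only">4 out of 5 stars</span></span>
  <dl><dt>Safety</dt><dd><span class="rating-stars"><span class="sr-only">5 out of 5 stars</span><span class="rating-star icon-star-full"></span></span></dd></dl>
  <dl><dt>Comfort</dt><dd><span class="rating-stars"><span class="sr-only">4 out of 5 stars</span></span></dd></dl>
</div>
"""
e = BeautifulSoup(html, "html.parser").select_one("div.review-item")

# Each <dl> yields exactly two strings (label, rating text),
# so dict() can treat every <dl> as one key/value pair.
per_dl = dict(s.stripped_strings for s in e.select("dl"))
print(per_dl)

# Each span.sr-only yields only one string, so dict() has no pair to build
# and raises ValueError.
error = None
try:
    dict(s.stripped_strings for s in e.select("span.rating-stars span.sr-only"))
except ValueError as exc:
    error = str(exc)
    print("ValueError:", error)
```

Selecting `dl` also keeps the overall rating out of the result, since in this sketch it sits outside any `dl`.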

Given your expected output, I would recommend selecting the strings for the key and the value separately:

...
d.update({s.dt.text:float(s.dd.text.split()[0]) for s in e.select('dl')})

data.append(d)
...

This will update your dict with the following:

{'Safety': 5.0, 'Technology': 5.0, 'Performance': 5.0, 'Interior': 5.0, 'Comfort': 5.0, 'Reliability': 5.0, 'Value': 5.0}

or, if there are no matches (an empty ResultSet), with an empty dict, which leaves your dict unchanged.
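
The key/value approach and the empty-ResultSet case can be sketched together as follows. The helper name `extract_ratings` and the two HTML fragments are hypothetical, made up for illustration; the extra checks for missing `dt`/`dd` tags and non-numeric text are a defensive assumption, not part of the answer above:

```python
from bs4 import BeautifulSoup

def extract_ratings(review):
    """Map each <dt> label to the leading number of its <dd> text, as a float."""
    ratings = {}
    for dl in review.select("dl"):
        if dl.dt is None or dl.dd is None:
            continue  # skip entries without a label or a value
        parts = dl.dd.get_text(strip=True).split()
        if not parts:
            continue  # skip empty values
        try:
            ratings[dl.dt.get_text(strip=True)] = float(parts[0])
        except ValueError:
            continue  # skip values that do not start with a number
    return ratings

# Hypothetical review with detailed ratings.
with_details = BeautifulSoup(
    '<div class="review-item">'
    '<dl><dt>Safety</dt><dd><span class="sr-only">5 out of 5 stars</span></dd></dl>'
    '<dl><dt>Value</dt><dd><span class="sr-only">3 out of 5 stars</span></dd></dl>'
    '</div>',
    "html.parser",
)
# Hypothetical review without any <dl>, so select("dl") returns an empty ResultSet.
without_details = BeautifulSoup(
    '<div class="review-item"><p>Great car</p></div>', "html.parser"
)

d = {"review_title": "Great car"}
d.update(extract_ratings(with_details))
print(d)

d2 = {"review_title": "No details"}
d2.update(extract_ratings(without_details))  # updating with {} changes nothing
print(d2)
```

Reviews without detailed ratings simply lack those keys, so a DataFrame built from the list of dicts would show missing values in the corresponding columns.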

Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/71883082
