
Scraping the text inside a <span> using BeautifulSoup

Stack Overflow user
Asked 2022-11-14 04:13:42
Answers 1, Views 33, Followers 0, Votes 0

I am using BeautifulSoup to scrape data from a website. For some reason, I cannot find a way to print the text between the span elements. Here is what I am running.

Code language: python
import requests
from bs4 import BeautifulSoup
import pandas as pd

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://www.amazon.com/GymCope-Anti-Tear-Cushioning-Non-Slip-Exercise/dp/B0921F1T2P/ref=sr_1_3_sspa?brr=1&pd_rd_r=4b40f0a8-f2d8-44dc-9a98-413c64d3fa34&pd_rd_w=P9ZJI&pd_rd_wg=RS7zW&pf_rd_p=9875e817-188b-48a2-986d-8146749644ac&pf_rd_r=AGWBT5KT04TYKGPZASKA&qid=1642452438&rd=1&rnid=3407731&s=sporting-goods&sr=1-3-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExVjhWTk0xQU5WWldPJmVuY3J5cHRlZElkPUEwODE0MzYwMTdMTDZSNDVST08yMiZlbmNyeXB0ZWRBZElkPUEwODQ4MDM0MlE4WEtVUjFKMUdLMiZ3aWRnZXROYW1lPXNwX2F0Zl9icm93c2UmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'
response = requests.get(url, headers=headers)
html = response.text
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the parser-guessing warning
bsr = soup.find("div", class_="a-section table-padding").text

Inspecting the result gives:

Code language: python
>>> bsr
'    ASIN   B0921F1T2P    Customer Reviews  \n\n \n  4.6 out of 5 stars    \n    41 ratings   \n\n\n 4.6 out of 5 stars     Best Sellers Rank    #69,660 in Sports & Outdoors (See Top 100 in Sports & Outdoors)  #234 in Yoga Mats       Date First Available   April 8, 2021    '

I tried:

Code language: python
bsra = soup.find("div", class_="a-section table-padding").find_next('span').get_text()

but it returns:

Code language: python
>>> bsra
'\n  4.6 out of 5 stars    '

I only want to scrape the "Best Sellers Rank" value shown in the picture. Thanks.


1 Answer

Stack Overflow user

Accepted answer

Posted 2022-11-14 06:09:41

The picture you refer to is missing from your question, but you can get the rank by selecting a more specific element:

Code language: python
soup.select_one('th:-soup-contains("Best Sellers Rank") + td').text.split()[0]
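To see why this selector works, here is a minimal offline sketch. The HTML in `SAMPLE_HTML` is a hypothetical, simplified stand-in for the product-details table (the real Amazon markup may differ). `:-soup-contains()` is the Soup Sieve pseudo-class that matches an element by its text, and `+` is the CSS adjacent-sibling combinator. The sketch also shows why `find_next('span')` from the question returns the star rating: it simply takes the first `<span>` that follows in document order.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the product-details table.
SAMPLE_HTML = """
<div class="a-section table-padding">
  <table>
    <tr><th> ASIN </th><td> B0921F1T2P </td></tr>
    <tr><th> Customer Reviews </th><td><span> 4.6 out of 5 stars </span></td></tr>
    <tr><th> Best Sellers Rank </th>
        <td><span><span>#69,660 in Sports &amp; Outdoors</span><br/>
        <span>#234 in Yoga Mats</span></span></td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The question's approach: the first <span> after the <div> is the star rating.
first_span_text = (
    soup.find("div", class_="a-section table-padding").find_next("span").get_text()
)

# th:-soup-contains(...) matches the header cell by its text;
# "+ td" then selects the <td> that immediately follows that <th>.
rank_cell = soup.select_one('th:-soup-contains("Best Sellers Rank") + td')

# The first whitespace-separated token of the cell text is the top-level rank.
rank = rank_cell.text.split()[0]

print(repr(first_span_text))  # ' 4.6 out of 5 stars '
print(rank)                   # #69,660
```

Anchoring on the header text rather than on the position of a `<span>` keeps the lookup independent of how many rows or spans come before the rank.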

Example

Code language: python
import requests
from bs4 import BeautifulSoup

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0", "Accept-Encoding":"gzip, deflate", "Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", "DNT":"1","Connection":"close", "Upgrade-Insecure-Requests":"1"}
url = 'https://www.amazon.com/GymCope-Anti-Tear-Cushioning-Non-Slip-Exercise/dp/B0921F1T2P/ref=sr_1_3_sspa?brr=1&pd_rd_r=4b40f0a8-f2d8-44dc-9a98-413c64d3fa34&pd_rd_w=P9ZJI&pd_rd_wg=RS7zW&pf_rd_p=9875e817-188b-48a2-986d-8146749644ac&pf_rd_r=AGWBT5KT04TYKGPZASKA&qid=1642452438&rd=1&rnid=3407731&s=sporting-goods&sr=1-3-spons&psc=1&spLa=ZW5jcnlwdGVkUXVhbGlmaWVyPUExVjhWTk0xQU5WWldPJmVuY3J5cHRlZElkPUEwODE0MzYwMTdMTDZSNDVST08yMiZlbmNyeXB0ZWRBZElkPUEwODQ4MDM0MlE4WEtVUjFKMUdLMiZ3aWRnZXROYW1lPXNwX2F0Zl9icm93c2UmYWN0aW9uPWNsaWNrUmVkaXJlY3QmZG9Ob3RMb2dDbGljaz10cnVl'
response = requests.get(url, headers=headers)
html = response.text
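soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly

# Select the <td> right after the <th> containing "Best Sellers Rank",
# then keep the first whitespace-separated token (the rank itself).
print(soup.select_one('th:-soup-contains("Best Sellers Rank") + td').text.split()[0])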

Output

Code language: text
#84,712
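The one-liner raises `AttributeError` when `select_one` finds nothing, which can happen if the site serves a robot-check page or changes its layout. A slightly more defensive sketch follows. The helper name `extract_best_sellers_rank` and the sample HTML strings are hypothetical and only for illustration; the helper returns the rank as an integer, or `None` when the cell is missing.

```python
import re
from typing import Optional

from bs4 import BeautifulSoup


def extract_best_sellers_rank(html: str) -> Optional[int]:
    """Return the top-level Best Sellers Rank as an int, or None if not found.

    Hypothetical helper, not part of BeautifulSoup.
    """
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.select_one('th:-soup-contains("Best Sellers Rank") + td')
    if cell is None:
        # No matching row: e.g. a robot-check page or a different page layout.
        return None
    # Take the first "#<digits and commas>" group in the cell text.
    match = re.search(r"#([\d,]+)", cell.get_text())
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


# Hypothetical inputs: one page with the row, one without it.
html_ok = (
    "<table><tr><th>Best Sellers Rank</th>"
    "<td>#84,712 in Sports &amp; Outdoors</td></tr></table>"
)
html_blocked = "<html><body>Robot check</body></html>"

print(extract_best_sellers_rank(html_ok))       # 84712
print(extract_best_sellers_rank(html_blocked))  # None
```

Returning `None` instead of raising lets the caller decide whether to retry, skip the product, or log the failure.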
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/74426872
