Using bs4 to find a name from <td class="no-wrap currency-name" data-sort="Metaverse">

Stack Overflow user

Asked on 2019-05-20 04:31:23

Answers: 2, Views: 320, Followers: 0, Votes: 1

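First off, I know very little about HTML, and I think that is my problem. I'm having trouble finding a specific coin name in my search. I don't know whether I should use the td tag to find the name, or whether there is a better way.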

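Before picking this back up, I was searching for a specific section by position, but whenever the page updated, the names and prices moved around. It definitely wasn't ideal, but it worked for the time being. I came back to it to find a way to look up a coin by its name instead of its position.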

import requests
from bs4 import BeautifulSoup as bs4


def loadPageCM():
    # Grabbing url with requests
    page = requests.get('https://www.coinmarketcap.com')

    # Sending page to Bs4 to parse info
    soup = bs4(page.text, 'html.parser')

    # The coin list lived in <table id="currencies"> on the 2019 page
    tables = soup.find_all('table', id='currencies')

    content = []
    # Loop through every matching table
    for table in tables:
        rows = table.find_all('tr')
        for row in rows:
            # Append each row's text with newlines removed, dropping the
            # last 115 characters (position-based, so it breaks on layout changes)
            content.append(row.text.replace('\n', '')[:-115])
    return content

This is the original code I was using. Sorry, I'm new here.

What I'm trying to do now is find the coins by their names, using this tag:

<td class="no-wrap currency-name" data-sort="Coin">
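To make the idea concrete without hitting the live site, here is a minimal sketch that looks a coin up by its data-sort value rather than by its position. The HTML below is a hypothetical stand-in modelled on the tag quoted above; the real CoinMarketCap markup has changed since 2019.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the tag quoted in the question.
HTML = """
<table id="currencies">
  <tr><td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td></tr>
  <tr><td class="no-wrap currency-name" data-sort="Metaverse">Metaverse</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Match on the attribute value itself, so row order no longer matters.
cell = soup.find("td", attrs={"data-sort": "Metaverse"})

print(cell["data-sort"])          # Metaverse
print(cell.get_text(strip=True))  # Metaverse
```

If the coin is not on the page, find() returns None, so check the result before indexing into it.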

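If there is a better way, I'm open to any suggestions. Apologies again if the question doesn't make sense; any tips on how to ask better here, or on my code in general, are much appreciated. Thanks for your time.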

python-3.x

web-scraping

2 Answers

Stack Overflow user

Accepted answer

Answered on 2019-05-20 05:32:13

You're on the right track. Since you know the attributes of the tags you want, use soup.find_all() with those attributes to pull them out of the soup.

TL;DR

import requests
from bs4 import BeautifulSoup

# Grabbing url with requests
page = requests.get('https://www.coinmarketcap.com')

# Sending page to Bs4 to parse info
soup = BeautifulSoup(page.text, 'html.parser')

tds = soup.find_all('td', attrs={'class': 'no-wrap currency-name'})

for td in tds:
    print(td['data-sort'])   # change to get whichever attributes you want

Explanation: soup.find_all('td', attrs={'class': 'no-wrap currency-name'}) returns all 100 name cells (one per row) on the page.
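One detail worth knowing: when the class value passed to find_all() contains a space, Beautiful Soup compares it against the whole class list in that exact order, so reordering the classes breaks the match. Passing a single class with class_ (or using a CSS selector) matches regardless of order or extra classes. A small offline check on hypothetical markup:

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample; "cap" is placeholder cell text.
HTML = """
<tr>
  <td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td>
  <td class="no-wrap market-cap text-right">cap</td>
</tr>
<tr>
  <td class="no-wrap currency-name" data-sort="Ethereum">Ethereum</td>
  <td class="no-wrap market-cap text-right">cap</td>
</tr>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Exact, order-sensitive match on the full class string.
exact = soup.find_all("td", attrs={"class": "no-wrap currency-name"})

# Single-class match: any <td> whose class list includes "currency-name".
by_one_class = soup.find_all("td", class_="currency-name")

# Same classes in a different order: no match with the string form.
reversed_order = soup.find_all("td", attrs={"class": "currency-name no-wrap"})

# CSS selector: requires both classes, in any order.
by_selector = soup.select("td.currency-name.no-wrap")

print(len(exact), len(by_one_class), len(reversed_order), len(by_selector))  # 2 2 0 2
```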

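Then, for each td (one per row), we read the attribute we want. For example, the first row's cell is <td class="no-wrap currency-name" data-sort="Bitcoin">, and td.attrs shows all of its attributes: {'class': ['no-wrap', 'currency-name'], 'data-sort': 'Bitcoin'}. So to get just the coin's name, use td['data-sort'], which gives Bitcoin.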
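The attribute access described above can be checked offline on a hypothetical single-cell sample. Note that class comes back as a list because Beautiful Soup treats it as a multi-valued attribute, while data-sort is a plain string; td.get() returns None instead of raising KeyError when an attribute is absent.

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell sample matching the first row described above.
soup = BeautifulSoup(
    '<td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td>',
    "html.parser",
)
td = soup.td

print(td.attrs)            # {'class': ['no-wrap', 'currency-name'], 'data-sort': 'Bitcoin'}
print(td["data-sort"])     # Bitcoin
print(td.get("href"))      # None: this cell has no href attribute
```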

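If you want more information from the row, such as Market Cap, Price, or Volume, apply the same technique to the other tds, such as <td class="no-wrap market-cap text-right", and use dictionary-style access to their attributes.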
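A minimal sketch of that idea, assuming the name cell and the market-cap cell sit in the same <tr> as in the 2019 layout: start from the name cell, climb to its row with find_parent(), then look up the sibling cell by one of its classes. The data-usd attribute and the values in this markup are hypothetical placeholders, not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical row; data-usd and the values are assumed for illustration.
HTML = """
<table>
  <tr>
    <td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td>
    <td class="no-wrap market-cap text-right" data-usd="100.0">$100</td>
  </tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Find the coin by name, then move up to the row that contains it.
name_td = soup.find("td", attrs={"data-sort": "Bitcoin"})
row = name_td.find_parent("tr")

# A single class via class_ matches even though the cell has other classes.
cap_td = row.find("td", class_="market-cap")

print(cap_td["data-usd"])           # 100.0
print(cap_td.get_text(strip=True))  # $100
```

Starting from the name cell keeps the lookup tied to the coin, so it survives rows being reordered.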

Hope this helps.

Votes: 0

Stack Overflow user

Answered on 2019-05-20 05:37:52

You can use an attribute = value selector to target a specific coin by its data-sort value, e.g. Bitcoin:

soup.select_one("[data-sort='Bitcoin']")
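select_one() returns the first matching element or None, so it is worth checking the result before using it. A small offline sketch on hypothetical markup; the td prefix in the second selector is optional and only narrows the match to table cells.

```python
from bs4 import BeautifulSoup

# Hypothetical two-row sample.
HTML = """
<table>
  <tr><td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td></tr>
  <tr><td class="no-wrap currency-name" data-sort="Ethereum">Ethereum</td></tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")

cell = soup.select_one("[data-sort='Bitcoin']")
print(cell.get_text())  # Bitcoin

# No element matches, so select_one returns None rather than raising.
missing = soup.select_one("td[data-sort='Dogecoin']")
print(missing)  # None
```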

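And, assuming you want to isolate the whole row so you can get all of its associated values: with bs4 4.7.1 you can use :has to select the row that contains the element with that data-sort value.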

row = soup.select_one("tr:has([data-sort='Bitcoin'])")
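Recent Beautiful Soup versions delegate select() and select_one() to the soupsieve package, which implements the :has() pseudo-class. The offline sketch below uses a hypothetical two-row table to show that the selector returns the whole <tr>, from which every cell can be read.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; "A" and "B" are placeholder cell values.
HTML = """
<table>
  <tr>
    <td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin</td>
    <td class="no-wrap text-right">A</td>
  </tr>
  <tr>
    <td class="no-wrap currency-name" data-sort="Ethereum">Ethereum</td>
    <td class="no-wrap text-right">B</td>
  </tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")

# Select the <tr> that has a descendant with data-sort="Ethereum".
row = soup.select_one("tr:has([data-sort='Ethereum'])")

# Read every cell in that row.
cells = [td.get_text(strip=True) for td in row.select("td")]
print(cells)  # ['Ethereum', 'B']
```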

An example of that last part, pulling out the values for a specific coin:

from bs4 import BeautifulSoup as bs
import requests
import re

r = requests.get('https://coinmarketcap.com/')
soup = bs(r.content, 'lxml')
row = soup.select_one("tr:has([data-sort='Bitcoin'])")
print([re.sub(r'\n+' , ' ' ,item.text.strip()) for item in row.select('td')])
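The example above calls row.select() directly, which raises AttributeError if the coin is not on the page (for instance after a layout change). Below is a hedged variant that wraps the same selector and the same newline cleanup in a helper, row_values() (a hypothetical name), and returns an empty list instead. It runs on hypothetical offline markup and assumes the coin name contains no single quote, since the name is interpolated into the selector.

```python
import re

from bs4 import BeautifulSoup


def row_values(soup, coin):
    """Return the cleaned cell texts of the row for `coin`, or [] if absent."""
    # Assumes `coin` has no single quote, since it is placed inside '...'.
    row = soup.select_one(f"tr:has([data-sort='{coin}'])")
    if row is None:
        return []
    # Same cleanup as the answer: strip, then collapse runs of newlines.
    return [re.sub(r"\n+", " ", td.text.strip()) for td in row.select("td")]


# Hypothetical markup; the first cell's text deliberately spans two lines.
HTML = """
<table>
  <tr>
    <td class="no-wrap currency-name" data-sort="Bitcoin">Bitcoin
BTC</td>
    <td class="no-wrap text-right">A</td>
  </tr>
</table>
"""
soup = BeautifulSoup(HTML, "html.parser")

print(row_values(soup, "Bitcoin"))   # ['Bitcoin BTC', 'A']
print(row_values(soup, "Dogecoin"))  # []
```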

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/56214345
