Beautiful Soup nested tag search not returning anything after the 10th search

Stack Overflow user
Asked 2022-03-08 10:31:30
2 answers, 62 views, 0 followers, score -1

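I'm trying to write a Python script with Beautiful Soup that scrapes the name and symbol of every cryptocurrency. Although there are several hundred symbols, nothing is returned after the 10th iteration. Can anyone help? The site I want to scrape is https://coinmarketcap.com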

Code:

```python
from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://coinmarketcap.com').text

soup = BeautifulSoup(source, 'html.parser')

def scrape_data():
    container = soup.find('tbody')
    theData = container.find_all("tr")
    for i in theData:
        individual_symbol = i.find('p', attrs={"class": "sc-1eb5slv-0 gGIpIK coin-item-symbol"})
        individual_name = i.find('p', attrs={"class": "sc-1eb5slv-0 iworPT"})
        print('Name: {}, Symbol: {}'.format(individual_name.text, individual_symbol.text))

scrape_data()
```

This is what gets returned:

```
Name: Bitcoin, Symbol: BTC
Name: Ethereum, Symbol: ETH
Name: Tether, Symbol: USDT
Name: BNB, Symbol: BNB
Name: USD Coin, Symbol: USDC
Name: XRP, Symbol: XRP
Name: Terra, Symbol: LUNA
Name: Cardano, Symbol: ADA
Name: Solana, Symbol: SOL
Name: Avalanche, Symbol: AVAX
Traceback (most recent call last):
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 18, in <module>
    scrape_data()
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 15, in scrape_data
    print(individual_symbol.text)
AttributeError: 'NoneType' object has no attribute 'text'
ryan@Ryans-MBP PythonProjects %
```

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-03-09 08:53:37

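The data is embedded as JSON inside a <script> tag. My approach is always to fetch the complete data first; you can then filter out whatever you need. This gets all of the available data: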

Code:

```python
import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

dfs = []
for page in range(1, 21):
    print(f'Page: {page}')
    url = f'https://coinmarketcap.com/?page={page}'
    response = requests.get(url, headers=headers)

    soup = BeautifulSoup(response.text, 'html.parser')
    # The last <script> tag on the page holds the initial state as JSON
    script = soup.find_all('script')[-1]

    # Greedy match: from the first '{' to the last '}' after it on the same line
    jsonStr = re.search(r'({.*})', str(script)).group(1)
    jsonData = json.loads(jsonStr)

    listing = jsonData['props']['initialState']['cryptocurrency']['listingLatest']['data']
    # The first element holds the column names; the remaining elements are the rows
    colsData = listing[0]
    cols = colsData['keysArr'] + colsData['excludeProps']
    data = listing[1:]

    df = pd.DataFrame(data, columns=cols)
    dfs.append(df)

# Stack the per-page DataFrames (100 coins each) into one
df = pd.concat(dfs, axis=0)

# Filter down to the two columns the question asked about
name_symbol = df[['name', 'symbol']]
```

Full data:

```
print(df)
             ath        atl  ...  quotes.1.tvl  quotes.2.tvl
0   68789.625939  65.526001  ...           NaN           NaN
1    4891.704698   0.420897  ...           NaN           NaN
2       1.215490   0.568314  ...           NaN           NaN
3     690.931965   0.096109  ...           NaN           NaN
4       2.349556   0.929222  ...           NaN           NaN
..           ...        ...  ...           ...           ...
95      0.054136   0.000109  ...           NaN           NaN
96   1516.640112   0.000000  ...           NaN           NaN
97      0.066469   0.000600  ...           NaN           NaN
98      0.750742   0.000201  ...           NaN           NaN
99      0.015614   0.000111  ...           NaN           NaN

[2000 rows x 153 columns]
```

Name/symbol:

```
print(name_symbol)
                 name symbol
0             Bitcoin    BTC
1            Ethereum    ETH
2              Tether   USDT
3                 BNB    BNB
4            USD Coin   USDC
..                ...    ...
95              HYCON    HYC
96  Pepemon Pepeballs  PPBLZ
97           IONChain   IONC
98          DecentBet   DBET
99          BlitzPick    XBP

[2000 rows x 2 columns]
```
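The core idea of this answer, reading the page's embedded JSON instead of its rendered table, can be sketched with only the standard library. This is an illustration under assumptions, not CoinMarketCap's actual markup: the HTML string is invented, the `__NEXT_DATA__` script id is the convention Next.js sites typically use (the answer itself just takes the last `<script>`), and a regex stands in for Beautiful Soup so the example runs offline. The header-row handling mirrors what the answer does with `keysArr` and `excludeProps` before calling `pd.DataFrame`.

```python
import json
import re

# Hypothetical page fragment (invented data); the real markup may differ.
SAMPLE_HTML = """
<html><body>
<table><tbody><tr><td>Bitcoin</td></tr></tbody></table>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"initialState": {"cryptocurrency": {"listingLatest": {"data": [
  {"keysArr": ["id", "name", "symbol"], "excludeProps": ["platform"]},
  [1, "Bitcoin", "BTC", null],
  [2, "Ethereum", "ETH", null],
  [3, "Tether", "USDT", "ethereum"]
]}}}}}
</script>
</body></html>
"""

# Non-greedy body with DOTALL, so the JSON may span several lines.
# The __NEXT_DATA__ id is an assumption about how the page labels the tag.
SCRIPT_RE = re.compile(r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL)


def extract_embedded_json(html):
    """Return the parsed JSON from the __NEXT_DATA__ script tag, or None if absent."""
    match = SCRIPT_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def rows_to_records(listing):
    """Element 0 names the columns; zip those names with every later row."""
    header, rows = listing[0], listing[1:]
    cols = header["keysArr"] + header["excludeProps"]
    # zip() stops at the shorter sequence, so short rows lose trailing fields silently
    return [dict(zip(cols, row)) for row in rows]


state = extract_embedded_json(SAMPLE_HTML)
listing = state["props"]["initialState"]["cryptocurrency"]["listingLatest"]["data"]
records = rows_to_records(listing)

# Keep only the two fields the question asked for
name_symbol = [(r["name"], r["symbol"]) for r in records]
for name, symbol in name_symbol:
    print(f"Name: {name}, Symbol: {symbol}")
```

With a live page you would pass `response.text` instead of `SAMPLE_HTML`. Selecting the tag by id is slightly less brittle than relying on it being the last `<script>`, but both depend on the site keeping its current structure.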
Score: 0

Stack Overflow user

Posted 2022-03-08 10:43:32

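OK, I checked the page, and it seems that only the first 10 rows are loaded without JavaScript (see image).

If you use requests, keep in mind that it only works with static data, not with data loaded by JavaScript. So check the page with JavaScript disabled: anything that does not show up there will not be available to your script either.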
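The effect this answer describes can be reproduced without a network request: when only the static HTML is parsed, rows that JavaScript would fill in later simply lack the target elements, so a lookup finds nothing (with Beautiful Soup, `find` returns `None`, which is what raised the `AttributeError` in the question). A minimal sketch under stated assumptions: `html.parser` from the standard library stands in for Beautiful Soup, the HTML is a made-up fragment that reuses the class names from the question, and the `RowTextParser` class is hypothetical. It skips unrendered rows instead of crashing; it does not recover their data.

```python
from html.parser import HTMLParser

# Made-up static HTML: two rows are fully rendered, the other two are empty
# placeholders of the kind JavaScript would fill in only inside a browser.
STATIC_HTML = """
<table><tbody>
<tr><td><p class="sc-1eb5slv-0 iworPT">Bitcoin</p><p class="sc-1eb5slv-0 gGIpIK coin-item-symbol">BTC</p></td></tr>
<tr><td><p class="sc-1eb5slv-0 iworPT">Ethereum</p><p class="sc-1eb5slv-0 gGIpIK coin-item-symbol">ETH</p></td></tr>
<tr><td><span></span></td></tr>
<tr><td><span></span></td></tr>
</tbody></table>
"""


class RowTextParser(HTMLParser):
    """Hypothetical helper: per <tr>, collect text of <p> tags carrying a wanted CSS class token."""

    def __init__(self, wanted_classes):
        super().__init__()
        self.wanted = wanted_classes  # maps field name -> one CSS class token
        self.rows = []                # one dict per <tr>, possibly empty
        self._row = None              # dict for the row currently being parsed
        self._field = None            # field whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = {}
        elif tag == "p" and self._row is not None:
            classes = (dict(attrs).get("class") or "").split()
            for field, css_class in self.wanted.items():
                if css_class in classes:
                    self._field = field

    def handle_endtag(self, tag):
        if tag == "p":
            self._field = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._field is not None and self._row is not None:
            self._row[self._field] = self._row.get(self._field, "") + data.strip()


parser = RowTextParser({"name": "iworPT", "symbol": "coin-item-symbol"})
parser.feed(STATIC_HTML)

# Keep only rows where both elements were present in the static HTML
rendered = [row for row in parser.rows if "name" in row and "symbol" in row]
print(f"{len(parser.rows)} rows in the static HTML, {len(rendered)} rendered")
for row in rendered:
    print(f"Name: {row['name']}, Symbol: {row['symbol']}")
```

Matching a single class token rather than the full class string is a design choice: the generated `sc-...` prefixes tend to change between site builds, while a descriptive token such as `coin-item-symbol` may last longer (an assumption, not a guarantee).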

Score: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/71393485