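I am trying to write a Python script with BeautifulSoup that scrapes the name and symbol of every cryptocurrency. Although there are hundreds of symbols, nothing is returned after the 10th iteration. Can anyone help? The website I want to scrape is https://coinmarketcap.com
Code: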
from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://coinmarketcap.com').text
soup = BeautifulSoup(source, 'html.parser')

def scrape_data():
    container = soup.find('tbody')
    theData = container.find_all("tr")
    for i in theData:
        individual_symbol = i.find('p', attrs={"class": "sc-1eb5slv-0 gGIpIK coin-item-symbol"})
        individual_name = i.find('p', attrs={"class": "sc-1eb5slv-0 iworPT"})
        print('Name: {}, Symbol: {}'.format(individual_name.text, individual_symbol.text))

scrape_data()

This returns:
Name: Bitcoin, Symbol: BTC
Name: Ethereum, Symbol: ETH
Name: Tether, Symbol: USDT
Name: BNB, Symbol: BNB
Name: USD Coin, Symbol: USDC
Name: XRP, Symbol: XRP
Name: Terra, Symbol: LUNA
Name: Cardano, Symbol: ADA
Name: Solana, Symbol: SOL
Name: Avalanche, Symbol: AVAX
Traceback (most recent call last):
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 18, in <module>
    scrape_data()
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 15, in scrape_data
    print(individual_symbol.text)
AttributeError: 'NoneType' object has no attribute 'text'
ryan@Ryans-MBP PythonProjects %

Posted on 2022-03-09 08:53:37
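The data is embedded as JSON in a <script> tag. My approach is always to grab the full dataset first, because you can then filter it down to whatever you need. This retrieves all of the available data:
Code: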
import pandas as pd
import requests
from bs4 import BeautifulSoup
import json
import re

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
dfs = []
for page in range(1, 21):
    print(f'Page: {page}')
    url = f'https://coinmarketcap.com/?page={page}'
    response = requests.get(url, headers=headers)
    soup = BeautifulSoup(response.text, 'html.parser')
    # The page state is embedded as JSON in the last <script> tag
    script = soup.find_all('script')[-1]
    jsonStr = re.search(r'({.*})', str(script)).group(1)
    jsonData = json.loads(jsonStr)
    # The first entry holds the column names; the remaining entries are the rows
    colsData = jsonData['props']['initialState']['cryptocurrency']['listingLatest']['data'][0]
    cols = colsData['keysArr'] + colsData['excludeProps']
    data = jsonData['props']['initialState']['cryptocurrency']['listingLatest']['data'][1:]
    df = pd.DataFrame(data, columns=cols)
    dfs.append(df)

# Combine all pages, then keep only the two columns of interest
df = pd.concat(dfs, axis=0)
name_symbol = df[['name', 'symbol']]

Full data:
print(df)
             ath        atl  ...  quotes.1.tvl  quotes.2.tvl
0   68789.625939  65.526001  ...           NaN           NaN
1    4891.704698   0.420897  ...           NaN           NaN
2       1.215490   0.568314  ...           NaN           NaN
3     690.931965   0.096109  ...           NaN           NaN
4       2.349556   0.929222  ...           NaN           NaN
..           ...        ...  ...           ...           ...
95      0.054136   0.000109  ...           NaN           NaN
96   1516.640112   0.000000  ...           NaN           NaN
97      0.066469   0.000600  ...           NaN           NaN
98      0.750742   0.000201  ...           NaN           NaN
99      0.015614   0.000111  ...           NaN           NaN
[2000 rows x 153 columns]

Name/symbol:
print(name_symbol)
                 name  symbol
0             Bitcoin     BTC
1            Ethereum     ETH
2              Tether    USDT
3                 BNB     BNB
4            USD Coin    USDC
..                ...     ...
95              HYCON     HYC
96  Pepemon Pepeballs   PPBLZ
97           IONChain    IONC
98          DecentBet    DBET
99          BlitzPick     XBP
[2000 rows x 2 columns]

Posted on 2022-03-08 10:43:32
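OK, I inspected the page, and it seems that only the first 10 rows are loaded without JavaScript.

If you use requests, keep in mind that it only works for static data, not for data loaded by JavaScript. So load the page with JavaScript disabled and check it: anything that does not render there will not be in the response from requests either.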
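To make the failure mode concrete, here is a minimal sketch of a loop that checks for None before reading the text, so rows that lack the expected markup are counted instead of raising AttributeError. The sample HTML, the class names in it, and the function name extract_rows are all made up for illustration; they are not the site's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: two fully rendered rows followed by a placeholder row,
# mimicking a table whose later rows are only filled in by JavaScript.
SAMPLE_HTML = """
<table><tbody>
<tr><td><p class="name">Bitcoin</p><p class="coin-item-symbol">BTC</p></td></tr>
<tr><td><p class="name">Ethereum</p><p class="coin-item-symbol">ETH</p></td></tr>
<tr><td><span>Polkadot</span><span>DOT</span></td></tr>
</tbody></table>
"""

def extract_rows(html):
    """Return (list of (name, symbol) tuples, number of rows skipped)."""
    soup = BeautifulSoup(html, "html.parser")
    coins, skipped = [], 0
    for row in soup.find("tbody").find_all("tr"):
        name = row.find("p", class_="name")
        symbol = row.find("p", class_="coin-item-symbol")
        # find() returns None when nothing matches, so check before using .text
        if name is None or symbol is None:
            skipped += 1
            continue
        coins.append((name.get_text(strip=True), symbol.get_text(strip=True)))
    return coins, skipped

coins, skipped = extract_rows(SAMPLE_HTML)
print(coins)
print(f"rows without the expected markup: {skipped}")
```

A non-zero skipped count on the real page would be a hint that those rows are rendered client-side and need a different source, such as the embedded JSON or a browser-driven tool.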
https://stackoverflow.com/questions/71393485