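Has anyone managed to scrape the financial statements from roic.ai (e.g. into a DataFrame)?
The page source is deeply nested, so getting the statements out is not straightforward.
The goal is to get the values from the HTML itself, not from the "#__NEXT_DATA__" source in the script element.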
from gazpacho import get, Soup
ticker = 'aapl'
url = f'https://roic.ai/financials/{ticker}?fs=annual'
print(url)
html = get(url)
soup = Soup(html)
soup.find('div', {'class': "flex-col"})
Posted on 2022-10-16 16:30:21
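If the values really have to come from the rendered HTML rather than the embedded JSON, one option is to walk the markup with Python's standard-library html.parser and collect the text of every element that carries a given class. This is a minimal sketch, not a tested roic.ai scraper: it assumes, like the question's snippet, that the figures sit inside elements with the flex-col class (not confirmed against the live page) and that the markup is well formed. ClassTextCollector and texts_by_class are hypothetical names, and the sample markup is made up.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth count.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: collect the text inside each element whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0      # nesting depth inside the current match (0 = outside any match)
        self.current = []   # text fragments of the match being read
        self.results = []   # one joined string per outermost match

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1   # any tag opened inside a match deepens the nesting
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:   # the matching element itself just closed
            self.results.append(" ".join(self.current))

    def handle_data(self, data):
        if self.depth and data.strip():
            self.current.append(data.strip())


def texts_by_class(html, target_class):
    """Return the text of every outermost element carrying target_class."""
    parser = ClassTextCollector(target_class)
    parser.feed(html)
    parser.close()
    return parser.results


# Made-up sample markup, not taken from roic.ai
sample = ('<div class="flex-col"><span>Revenue</span><span>123</span></div>'
          '<div class="row">ignored</div>')
print(texts_by_class(sample, "flex-col"))  # ['Revenue 123']
```

A matching element nested inside another matching element is folded into the outer element's text rather than reported separately, which keeps one string per row-like container.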
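from gazpacho import Soup
import json
import pandas as pd
ticker = 'aapl'
url = f'https://roic.ai/financials/{ticker}?fs=annual'
# Download the page and parse it
soup = Soup.get(url)
# Next.js pages embed their data as JSON inside <script id="__NEXT_DATA__">
script_tag = soup.find('script', {'id': "__NEXT_DATA__"})
data = json.loads(script_tag.text)
# Walk the nested JSON down to the "bsq" statement and load it into a DataFrame
df = pd.DataFrame(data["props"]["pageProps"]["data"]["data"]["bsq"])
print(df.head())
It can be done like this. Don't forget to import the pandas and json libraries.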
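The same idea can be tried offline, without gazpacho or a network request, by pulling the script tag's JSON out of an HTML string with the standard library. In this sketch, NextDataExtractor, load_next_data and dig are hypothetical helpers, the sample page and its "year"/"value" fields are made up, and the key path simply mirrors the one used in the answer; the real layout of roic.ai's JSON may differ or change over time. dig names the first missing key, which helps when a deep path like the one in the answer stops matching.

```python
import json
from html.parser import HTMLParser


class NextDataExtractor(HTMLParser):
    """Hypothetical helper: capture the raw text of <script id="__NEXT_DATA__">."""

    def __init__(self):
        super().__init__()
        self.in_next_data = False
        self.chunks = []    # script text may arrive in several pieces

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("id") == "__NEXT_DATA__":
            self.in_next_data = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_next_data = False

    def handle_data(self, data):
        if self.in_next_data:
            self.chunks.append(data)


def load_next_data(html):
    """Parse the __NEXT_DATA__ JSON embedded in an HTML page."""
    parser = NextDataExtractor()
    parser.feed(html)
    parser.close()
    if not parser.chunks:
        raise ValueError("no __NEXT_DATA__ script tag found")
    return json.loads("".join(parser.chunks))


def dig(obj, *keys):
    """Follow a chain of dict keys, naming the first missing one instead of a bare KeyError."""
    for i, key in enumerate(keys):
        if not isinstance(obj, dict) or key not in obj:
            raise KeyError(f"missing key {key!r} after path {list(keys[:i])}")
        obj = obj[key]
    return obj


# Made-up sample page; only the key path mirrors the answer above
sample = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"data": {"data": {"bsq": '
    '[{"year": 2021, "value": 1}, {"year": 2022, "value": 2}]}}}}}'
    '</script></body></html>'
)
rows = dig(load_next_data(sample), "props", "pageProps", "data", "data", "bsq")
print(rows)  # [{'year': 2021, 'value': 1}, {'year': 2022, 'value': 2}]
```

The resulting rows can then be passed to pd.DataFrame(rows), as in the answer.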
https://stackoverflow.com/questions/74088485