I am trying to scrape two data points from the Morningstar website for a list of companies and save them to a text file, but I am not sure how to approach this task. Here is my code:
from bs4 import BeautifulSoup as BS

thislist = ["AAPL", "FB", "TSLA", "DIS"]
for symbol in thislist:
    print('Getting data for ' + symbol + '...\n')
    # extract from this website
    url = "https://www.morningstar.com/stocks/xnas/" + symbol + "/quote"
    soup = BS(url)
    # Find the Value of Last Close Price
    for text in soup.find_all('div class', name_='Last Close'):
        Last_Close = text.find_all('dp-value price-down')
        print(Last_Close)
    # Find the Value of its Market Cap
    for text in soup.find_all('div class', name_='Market Cap'):
        Market_Cap = text.find_all('dp-value')
        print(Market_Cap)
    # Print the table
    print(symbol, Last_Close, Market_Cap)
    # Save the data in a .txt file
    df.to_csv(r'c:\data\testing.txt', header=None, index=None, sep=' ', mode='a')

Posted on 2022-01-22 18:33:40
First, this code will get you the information you need:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import time

symbols = ["AAPL", "FB", "TSLA", "DIS"]

def download_data(symbol):
    url = f'https://www.morningstar.com/stocks/xnas/{symbol}/quote'
    s = Service(ChromeDriverManager().install())
    op = webdriver.ChromeOptions()
    op.headless = True
    driver = webdriver.Chrome(service=s, options=op)
    driver.get(url)
    # symbol, Last_Close, Market_Cap
    time.sleep(2)  # give the JavaScript-rendered quote data time to load
    last_close = driver.find_element(by=By.XPATH,
        value='//*[@id="__layout"]/div/div[2]/div[3]/main/div[2]/div/div/div[1]/div[1]/div/sal-components/section/div/div/div/sal-components-quote/div/div/div/div/div/div[2]/ul/li[1]/div/div[2]')
    market_cap = driver.find_element(by=By.XPATH,
        value='//*[@id="__layout"]/div/div[2]/div[3]/main/div[2]/div/div/div[1]/div[1]/div/sal-components/section/div/div/div/sal-components-quote/div/div/div/div/div/div[2]/ul/li[7]/div/div[2]')
    return symbol, last_close.text, market_cap.text

for symbol in symbols:
    print(download_data(symbol))

The output looks like this:
('AAPL', '164.51', '2.6529 Tril')
('FB', '316.56', '843.3460 Bil')
('TSLA', '996.27', '947.9256 Bil')

As it happens, the page for Disney does not exist, so you may want to double-check that URL.
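If you want the loop to keep going when a page is missing, one option (my own suggestion, not part of the original answer) is to catch Selenium's element-lookup failure, which is what download_data raises when the quote elements are not on the page:

from selenium.common.exceptions import NoSuchElementException

for symbol in symbols:
    try:
        print(download_data(symbol))
    except NoSuchElementException:
        # e.g. the DIS page mentioned above
        print(f'No quote data found for {symbol}, skipping.')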
You can store the results in a DataFrame as needed and export them to CSV. I would suggest using Selenium instead of Beautiful Soup: it saves you the trouble of trying to locate information that is rendered dynamically with JavaScript, which Beautiful Soup sometimes struggles with. Selenium behaves just as you would when visiting the web page.
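As a minimal sketch of that last step, reusing the download_data function above (the column names are placeholders of my own choosing):

import pandas as pd

# one (symbol, last_close, market_cap) tuple per ticker
rows = [download_data(symbol) for symbol in symbols]
df = pd.DataFrame(rows, columns=['Symbol', 'Last_Close', 'Market_Cap'])

# same target file as in the question: space-separated, no header or index
df.to_csv(r'c:\data\testing.txt', header=None, index=None, sep=' ', mode='a')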
Also, in your code you call soup = BS(url); as far as I recall, BeautifulSoup only parses HTML that you hand it, so I believe you need the requests library in Python to make the HTTP request first, though it has been a while since I used BS.
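Roughly along these lines (a sketch only; since the quote figures are rendered by JavaScript, the fetched HTML may still not contain them):

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.morningstar.com/stocks/xnas/AAPL/quote'
response = requests.get(url)  # make the HTTP request first
soup = BS(response.text, 'html.parser')  # then parse the returned markup
print(soup.title)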
Posted on 2022-01-22 18:04:34
Developing a scraper to pull data out of a website will always react more slowly to real market conditions than something closer to the original data source, and there are various stock packages that are very useful here. Here are some helpful links on collecting stock data with Pandas DataReader:
https://www.mssqltips.com/sqlservertip/6826/techniques-for-collecting-stock-data-with-python/
https://towardsdatascience.com/how-to-get-stock-data-using-python-c0de1df17e75
Personally, I prefer using Pandas because it has been more reliable for me, and all of my data usually ends up in pandas anyway. DataReader can also pull directly from Morningstar: https://pandas-datareader.readthedocs.io/en/v0.6.0/readers/morningstar.html
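A minimal sketch of the generic DataReader call; note that the Morningstar source documented at that v0.6.0 link has since been deprecated, so this example substitutes 'stooq' as the data source, with an arbitrary ticker and date range:

import pandas_datareader.data as web
from datetime import datetime

start = datetime(2021, 1, 1)
end = datetime(2022, 1, 21)
# DataReader(name, data_source, start, end) returns an OHLCV DataFrame
df = web.DataReader('AAPL', 'stooq', start, end)
print(df.head())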
Also, if you are interested in developing a more in-depth trading system, Quandl is well suited to analyzing historical data: https://analyzingalpha.com/nasdaq-data-link-quandl-python-api
https://stackoverflow.com/questions/70815525