
Web scraping question: trying to get the information into a CSV file and charts

Stack Overflow user
Asked 2019-12-06 18:37:41
Answers: 2, Views: 43, Followers: 0, Score: 0

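Below is my code and the problem I have with it. It already gives me fairly complete information. I am scraping the stock prices of my ten favorite space-tech companies. I want to collect the prices over 10 hours, or maybe just run the code 10 times. I can't use APIs; this is for a school project. Then I want to use matplotlib to combine all the data into ten big charts showing these stock prices, or ten charts for each stock. I want to use this type of chart.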
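Since APIs are off-limits, one way to build a price history is to run the scraper ten times (by hand, or an hour apart with a scheduler) and append each run to a single CSV with a timestamp, instead of overwriting the file each time. A minimal sketch; the helper append_snapshot, the file name space_prices.csv and the two-row sample frame are hypothetical stand-ins for one run of parsePrice over all the URLs:

```python
import os
from datetime import datetime

import pandas as pd


def append_snapshot(df, path):
    """Append one scrape run to a CSV, tagging every row with the time of the run."""
    snapshot = df.copy()  # leave the caller's frame untouched
    snapshot.insert(0, "timestamp", datetime.now().isoformat(timespec="seconds"))
    # write the header only the first time, when the file does not exist yet
    write_header = not os.path.exists(path)
    snapshot.to_csv(path, mode="a", header=write_header, index=False)
    return snapshot


if __name__ == "__main__":
    # hypothetical stand-in for one scrape; in the project this frame would come from parsePrice
    run = pd.DataFrame({"stock name": ["GILT", "BA"], "Previous Close": ["8.01", "348.84"]})
    for _ in range(3):  # e.g. three runs; in practice they would be spaced out in time
        append_snapshot(run, "space_prices.csv")
    print(pd.read_csv("space_prices.csv"))
```

Appending in "a" mode keeps every earlier run, so the file grows into a long table (one row per run and stock) that can later be charted.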

Any suggestions would be great. Here is my current code:

Code language: Python
#import libraries
import pandas as pd

# scraping my top ten favorite space companies; attempted to pick companies with a pure-play interest in space
urls = ['https://finance.yahoo.com/quote/GILT/', 'https://finance.yahoo.com/quote/LORL?p=LORL&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/I?p=I&.tsrc=fin-srch' , 'https://finance.yahoo.com/quote/VSAT?p=VSAT&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/RTN?p=RTN&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/UTX?ltr=1', 'https://finance.yahoo.com/quote/TDY?ltr=1', 'https://finance.yahoo.com/quote/ORBC?ltr=1', 'https://finance.yahoo.com/quote/SPCE?p=SPCE&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/BA?p=BA&.tsrc=fin-srch',]

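def parsePrice(r):
    # the first table on the quote page appears to be a two-column (label, value) summary;
    # transposing it puts the labels in row 0 and the values in row 1
    df = pd.read_html(r)[0].T
    cols = list(df.iloc[0, :])
    temp_df = pd.DataFrame([list(df.iloc[1, :])], columns=cols)
    temp_df['url'] = r
    return temp_df

# DataFrame.append was removed in pandas 2.0, so build every row first and concatenate once
df = pd.concat([parsePrice(r) for r in urls], sort=True).reset_index(drop=True)
df.to_csv('C:/Users/n_gor/Desktop/webscape/Nicholas Final Projects/spacestocklisting.csv', index=False)
print(df.to_string())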
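The reshaping inside parsePrice can be checked offline. The sketch below assumes that pd.read_html returns the summary table with integer column names 0 and 1 (as it does for a table without a header row); the three label/value pairs are copied from the GILT row of the output:

```python
import pandas as pd

# stand-in for pd.read_html(url)[0]: a headerless two-column (label, value) table
table = pd.DataFrame([["Previous Close", "8.01"], ["Open", "8.10"], ["Volume", "6337"]])

df = table.T                      # row 0 now holds the labels, row 1 the values
cols = list(df.iloc[0, :])        # labels become column names
row = pd.DataFrame([list(df.iloc[1, :])], columns=cols)  # one row per quote page
print(row)
```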

CSV file output:

     52 Week Range            Ask Avg. Volume           Bid      Day's Range    Open Previous Close   Volume                                                url
0      7.32 - 9.87     8.09 x 800       23415    8.06 x 800      8.01 - 8.11    8.10           8.01     6337              https://finance.yahoo.com/quote/GILT/
1    32.14 - 42.77   32.74 x 1100       41759  32.59 x 1000    32.28 - 32.75   32.32          32.28    14685  https://finance.yahoo.com/quote/LORL?p=LORL&.t...
2     5.55 - 27.29     6.64 x 800     5746553   6.63 x 2900      6.51 - 6.68    6.64           6.65   995245  https://finance.yahoo.com/quote/I?p=I&.tsrc=fi...
3    55.93 - 97.31    72.21 x 800      281600  72.16 x 1000    71.51 - 72.80   72.26          72.32    74758  https://finance.yahoo.com/quote/VSAT?p=VSAT&.t...
4  144.27 - 220.03  215.54 x 1000     1560562  215.37 x 800  214.87 - 217.45  215.85         214.86   203957  https://finance.yahoo.com/quote/RTN?p=RTN&.tsr...
5  100.48 - 149.81   145.03 x 800     2749725  144.96 x 800  144.41 - 145.56  145.49         144.52   489169          https://finance.yahoo.com/quote/UTX?ltr=1
6  189.35 - 351.53   343.34 x 800      280325  342.80 x 800  342.84 - 346.29  344.16         343.58    42326          https://finance.yahoo.com/quote/TDY?ltr=1
7  3.5800 - 9.7900  4.1400 x 1300      778343  4.1300 x 800  4.1200 - 4.2000  4.1700         4.1500    62335         https://finance.yahoo.com/quote/ORBC?ltr=1
8     6.90 - 12.09     7.37 x 900     2280333    7.38 x 800      7.24 - 7.48    7.30           7.22   539082  https://finance.yahoo.com/quote/SPCE?p=SPCE&.t...
9  292.47 - 446.01   348.73 x 800     4420225  348.79 x 800  345.70 - 350.42  350.22         348.84  1258813  https://finance.yahoo.com/quote/BA?p=BA&.tsrc=...
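Every value in this table is text: Bid and Ask look like "8.09 x 800" (price x size), and the ranges look like "7.32 - 9.87". Before anything can be charted, the values need converting to numbers. A minimal sketch; the helper first_number and the new column names are hypothetical, and the formats are assumed from the output above:

```python
import pandas as pd


def first_number(text):
    """'8.09 x 800' -> 8.09, '8.01' -> 8.01, '23,415' -> 23415.0"""
    return float(str(text).replace(",", "").split()[0])


# two rows copied from the printed output above
raw = pd.DataFrame({
    "stock name": ["GILT", "BA"],
    "52 Week Range": ["7.32 - 9.87", "292.47 - 446.01"],
    "Bid": ["8.06 x 800", "348.79 x 800"],
    "Previous Close": ["8.01", "348.84"],
})

clean = raw[["stock name"]].copy()
clean["bid"] = raw["Bid"].map(first_number)
clean["previous close"] = raw["Previous Close"].map(first_number)

# "low - high" ranges: split into two numeric columns
bounds = (raw["52 Week Range"].str.replace(",", "", regex=False)
          .str.split(" - ", expand=True).astype(float))
clean["52w low"] = bounds[0]
clean["52w high"] = bounds[1]
print(clean)
```

If a field is ever shown as "N/A", first_number raises ValueError; pd.to_numeric(..., errors="coerce") is one way to turn such values into NaN instead.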

Can I add the stock names to this? Any suggestions on how to finish this project? I'm a bit lost.
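For the charts, a long table with one row per (run, stock) can be pivoted so that each stock becomes a column, and pandas' matplotlib-based plotting then draws either one chart with a line per stock or a separate panel per stock. A sketch under those assumptions; the column names timestamp, stock name and price are hypothetical, and the prices are values borrowed from the output above standing in for three hourly runs, not real hourly quotes:

```python
import matplotlib
matplotlib.use("Agg")  # render to image files without needing a display
import matplotlib.pyplot as plt
import pandas as pd

# placeholder history: one row per (run, stock)
history = pd.DataFrame({
    "timestamp": ["10:00", "10:00", "11:00", "11:00", "12:00", "12:00"],
    "stock name": ["GILT", "BA", "GILT", "BA", "GILT", "BA"],
    "price": [8.01, 348.84, 8.10, 350.22, 8.06, 348.79],
})

# one row per run, one column per stock
wide = history.pivot(index="timestamp", columns="stock name", values="price")

# option 1: a single chart with one line per stock
ax = wide.plot(marker="o", title="Space stocks")
ax.set_ylabel("Price (USD)")
ax.figure.savefig("all_stocks.png")

# option 2: one panel per stock, each with its own y-axis
axes = wide.plot(subplots=True, marker="o", legend=False,
                 figsize=(6, 2 * len(wide.columns)))
for a, name in zip(axes, wide.columns):
    a.set_title(name)
axes[0].figure.tight_layout()
axes[0].figure.savefig("per_stock.png")
plt.close("all")
```

With prices as different as GILT (around 8) and BA (around 350), a shared y-axis flattens the cheaper stocks, which is one reason to prefer one panel per stock.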


2 Answers

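Stack Overflow user

Posted 2019-12-06 20:08:06

You only need to parse the title header, the h1 inside the quote-header-info div, which holds both the ticker and the company name:

Code language: Python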
# import libraries
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# scraping my top ten favorite space companies; attempted to pick companies with a pure-play interest in space
urls = ['https://finance.yahoo.com/quote/GILT/', 'https://finance.yahoo.com/quote/LORL?p=LORL&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/I?p=I&.tsrc=fin-srch' , 'https://finance.yahoo.com/quote/VSAT?p=VSAT&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/RTN?p=RTN&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/UTX?ltr=1', 'https://finance.yahoo.com/quote/TDY?ltr=1', 'https://finance.yahoo.com/quote/ORBC?ltr=1', 'https://finance.yahoo.com/quote/SPCE?p=SPCE&.tsrc=fin-srch', 'https://finance.yahoo.com/quote/BA?p=BA&.tsrc=fin-srch',]

# a browser-like User-Agent; the site may reject requests that do not send one
HEADERS = {'User-Agent': 'Mozilla/5.0'}

def parsePrice(r):
    response = requests.get(r, headers=HEADERS, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # the quote header's <h1> is expected to read "TICKER - Company Name";
    # this selector may need updating if the page layout changes
    titleHeader = soup.find('div', {'id': 'quote-header-info'})
    title = titleHeader.find('h1').text
    comp = title.split('-')[-1].strip()  # company name: text after the hyphen
    abr = title.split('-')[0].strip()    # ticker: text before the hyphen

    print(title)

    # reuse the downloaded HTML; passing a literal HTML string to read_html
    # is deprecated since pandas 2.1, so wrap it in StringIO
    df = pd.read_html(StringIO(response.text))[0].T
    cols = list(df.iloc[0, :])
    temp_df = pd.DataFrame([list(df.iloc[1, :])], columns=cols)
    temp_df['url'] = r
    temp_df['company name'] = comp
    temp_df['stock name'] = abr
    return temp_df

# DataFrame.append was removed in pandas 2.0; concatenate the per-page rows once instead
df = pd.concat([parsePrice(r) for r in urls], sort=True).reset_index(drop=True)

df.to_csv('C:/Users/n_gor/Desktop/webscape/Nicholas Final Projects/spacestocklisting.csv', index=False)
print(df.to_string())
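The title parsing in this answer takes the text after the last hyphen as the company name, so a company name that itself contains a hyphen would be cut short. Splitting once, on the first " - " separator, avoids that and also keeps hyphenated tickers such as BRK-B intact. A small offline check, assuming the separator has a space on each side; the helper split_title and the "Example Aero-Space Corp." title are hypothetical:

```python
def split_title(title):
    """Split 'TICKER - Company Name' once, on the first ' - ' separator."""
    ticker, company = title.split(" - ", 1)
    return ticker.strip(), company.strip()


example = "XYZ - Example Aero-Space Corp."  # hypothetical title with a hyphen in the name
print(example.split("-")[-1].strip())  # Space Corp.
print(split_title(example))            # ('XYZ', 'Example Aero-Space Corp.')
```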
Score: 2

Stack Overflow user

Posted 2019-12-06 19:27:18

You can use pandas.DataFrame.insert.

If you have all the stock names in a list:

Code language: Python
stock_names = ['GILT', 'LORL', 'I', 'VSAT', 'RTN', 'UTX', 'TDY', 'ORBC', 'SPCE', 'BA']
# insert as the first column (index 0) of the DataFrame; the list must follow the same order as urls
df.insert(0, "column_heading", stock_names)
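DataFrame.insert(loc, column, value) changes the frame in place and returns None, and by default it raises ValueError if the column name already exists, so the cell can't simply be re-run. A small check on a two-row stand-in frame, using the hypothetical name "stock name" instead of the placeholder column_heading:

```python
import pandas as pd

# stand-in for the scraped frame: two rows with column names from the real output
df = pd.DataFrame({"Open": ["8.10", "350.22"], "Volume": ["6337", "1258813"]})
stock_names = ["GILT", "BA"]  # one name per row, in the same order as the rows

result = df.insert(0, "stock name", stock_names)
print(result)            # None: insert modifies df in place
print(list(df.columns))  # ['stock name', 'Open', 'Volume']

try:
    df.insert(0, "stock name", stock_names)  # the same name a second time
except ValueError as err:
    print("ValueError:", err)
```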

Alternatively, you can use a regular expression to extract all the stock names from the URLs and add them to df:

Code language: Python
import re
# take the first run of capital letters in each URL, which is the ticker for these links
stock_names = [re.findall(r'[A-Z]+', x)[0] for x in urls]
# insert as the first column (index 0) of the DataFrame
df.insert(0, "column_heading", stock_names)
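The regular expression works for these URLs, but it only matches capital letters, so a ticker such as BRK-B would come back as BRK. Reading the path segment that follows /quote/ avoids depending on the character set. A sketch with the standard library, assuming every URL has the /quote/TICKER shape used in this question; ticker_from_url is a hypothetical helper:

```python
from urllib.parse import urlparse


def ticker_from_url(url):
    """Return the path segment that follows '/quote/' in a quote URL."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[parts.index("quote") + 1]


urls = [
    "https://finance.yahoo.com/quote/GILT/",
    "https://finance.yahoo.com/quote/I?p=I&.tsrc=fin-srch",
    "https://finance.yahoo.com/quote/UTX?ltr=1",
]
print([ticker_from_url(u) for u in urls])  # ['GILT', 'I', 'UTX']
```

urlparse separates the query string (?p=...) from the path, so the query never interferes with the ticker.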
Score: 0
Page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/59211481
