
Using Python to scrape values that have exactly the same class information

Stack Overflow user
Asked on 2020-06-13 04:22:20
Answers: 2, Views: 103, Followers: 0, Votes: 0

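I am using BeautifulSoup to scrape information from this site: https://www.gurufocus.com/insider/summary

The table has two price columns. Their values differ, but their cells use exactly the same element and class. Here is the markup of one such cell: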

<td data-v-575fbbfb="" class="right-align number-field" data-column="Price" row-idx="0">
<span style="color: ">$2.12</span></td>

Here is part of my code:

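from bs4 import BeautifulSoup
import requests
import pandas as pd

# Fetch and parse the page (implied by the rest of the snippet, which uses `soup`)
url = 'https://www.gurufocus.com/insider/summary'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

# Collects the text of every Price cell, from both columns, into one list
price = []
for pr in soup.find_all('td', {'class': 'right-align number-field', 'data-column': 'Price'}):
    price.append(pr.text)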

Does anyone know how to tell the two prices apart and collect them into two separate arrays?


2 Answers

Stack Overflow user

Answered on 2020-06-13 04:53:22

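You can do this with the built-in zip() function. The selected cells come back in document order, so the two price columns alternate: even positions belong to the first column and odd positions to the second.

For example: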

import requests
from bs4 import BeautifulSoup


url = 'https://www.gurufocus.com/insider/summary'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

tds = soup.select('td[data-column="Price"]')

price_column_1, price_column_2 = [], []
for td_col1, td_col2 in zip(tds[::2], tds[1::2]):
    price_column_1.append(td_col1.text)
    price_column_2.append(td_col2.text)

# print to screen
for p1, p2 in zip(price_column_1, price_column_2):
    print('{:<10}{}'.format(p1, p2))

Prints:

$2.05     $2.12
$15.42    $14.79
$0.02     $0.02
$0.64     $0.63
$73.13    $76.89
$298.75   $308.05
$512.74   $517.77
$341.27   $357
$300.99   $311.13
$38.34    $39.02
$20.79    $21.72
$5.65     $5.37
$14.30    $14.43
$37.93    $36.24
$174.90   $177.79
$79.58    $83.49
$79.58    $83.49
$63.91    $66.56
$25.31    $25.90
$93.04    $95.37
$67.73    $72.59
$67.73    $71.59
$67.71    $71.55
$11.31    $10.93
$58.67    $60.62
$22.64    $25.21
$3.98     $4.01
$6.47     $6.25
$9.08     $8.84
$23.69    $23.79
$174.23   $178.10
$100.07   $99.75
$11.89    $12.01
$0.83     $0.83
$41.15    $25
$41.15    $25
$41.15    $25
$7.23     $4.73
$23.04    $21.27
$37.97    $35.57
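
The slicing step can be tried without any network access. The following is a minimal sketch, assuming the cells arrive in document order and every row contributes exactly two Price cells; the sample list `cells` is an illustration built from the first three rows of the output above, not something fetched from the site.

```python
# Cell texts in document order: row 0 first price, row 0 second price, row 1 ...
# (sample values copied from the first three rows of the printed output)
cells = ["$2.05", "$2.12", "$15.42", "$14.79", "$0.02", "$0.02"]

# Even positions hold the first Price column, odd positions the second.
price_column_1 = cells[::2]
price_column_2 = cells[1::2]

# The split is only meaningful if both columns have the same length,
# i.e. no row was missing one of its two Price cells.
assert len(price_column_1) == len(price_column_2)

for p1, p2 in zip(price_column_1, price_column_2):
    print('{:<10}{}'.format(p1, p2))
```

If a row ever lacked one of the two cells, every later value would shift into the wrong column, so the length check is a cheap guard rather than a full validation.
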
Votes: 0

Stack Overflow user

Answered on 2020-06-13 05:01:05

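You can also read the table directly with pandas and then assign your own column names, so that the two price columns get distinct names: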

import pandas as pd
import requests

r = requests.get("https://www.gurufocus.com/insider/summary")

data = pd.read_html(r.text, attrs = {'class': 'data-table'})[0]

data.columns = [
    'Ticker', 'Links', 'Company', 'Price1', 'Insider Name', 'Insider Position', 
    'Date', 'Buy/Sell', 'Insider Trading Shares', 'Shares Change', 'Price2', 
    'Cost(000)', 'Final Share', 'Price Change Since Insider Trade (%)', 
    'Dividend Yield %', 'PE Ratio', 'Market Cap ($M)', 'None'
]

print(data[["Price1","Price2"]])

Output:

     Price1   Price2
0     $2.05    $2.12
1    $15.42   $14.79
2     $0.02    $0.02
3     $0.64    $0.63
4    $73.13   $76.89
5   $298.75  $308.05
6   $512.74  $517.77
7   $341.27     $357
8   $300.99  $311.13
9    $38.34   $39.02
10   $20.79   $21.72
11    $5.65    $5.37
12   $14.30   $14.43
13   $37.93   $36.24
14  $174.90  $177.79
15   $79.58   $83.49
16   $79.58   $83.49
17   $63.91   $66.56
18   $25.31   $25.90
19   $93.04   $95.37
20   $67.73   $72.59
21   $67.73   $71.59
22   $67.71   $71.55
23   $11.31   $10.93
24   $58.67   $60.62
25   $22.64   $25.21
26    $3.98    $4.01
27    $6.47    $6.25
28    $9.08    $8.84
29   $23.69   $23.79
30  $174.23  $178.10
31  $100.07   $99.75
32   $11.89   $12.01
33    $0.83    $0.83
34   $41.15      $25
35   $41.15      $25
36   $41.15      $25
37    $7.23    $4.73
38   $23.04   $21.27
39   $37.97   $35.57
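
Both approaches leave the prices as strings such as `$2.05`. If numeric arrays are wanted, one possible follow-up step is sketched below. The helper `parse_price` is a hypothetical name introduced here, and stripping a thousands separator is an assumption, since no such value appears in the output above. The sample lists reuse rows 0, 7 and 34 of that output.

```python
from decimal import Decimal


def parse_price(text):
    """Convert a price string such as '$2.05' or '$357' to a Decimal.

    Removing ',' is an assumption for larger values; the sample data
    shown in this thread contains no thousands separators.
    """
    cleaned = text.strip().replace("$", "").replace(",", "")
    return Decimal(cleaned)


# Sample values copied from rows 0, 7 and 34 of the output above
price1 = ["$2.05", "$341.27", "$41.15"]
price2 = ["$2.12", "$357", "$25"]

price1_values = [parse_price(p) for p in price1]
price2_values = [parse_price(p) for p in price2]

# Difference between the two price columns for the second sample row
print(price2_values[1] - price1_values[1])  # prints 15.73
```

`Decimal` is used instead of `float` so that monetary values keep their exact decimal representation.
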
Votes: 0
Original page content provided by Stack Overflow; translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link: https://stackoverflow.com/questions/62351936
