
Scraping href using bs only returns the first link

Stack Overflow user
Asked 2022-02-15 10:07:16
Answers 2 · Views 94 · Followers 0 · Votes 0

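I am trying to scrape a table using bs (BeautifulSoup). In one of the columns a cell can contain multiple links (hrefs), as in the example below.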

<td class="column-6">
    <a href="https://smallcaps.com.au/andean-mining-ipo-colombia-exploration-high-grade-copper-gold-target/" rel="noopener noreferrer" target="_blank">Article</a> / 
    <a href="https://www.youtube.com/watch?v=Kgew7tuLWCg" rel="noopener noreferrer" target="_blank">Video</a> / 
    <a href="https://andeanmining.com.au/" rel="noopener noreferrer" target="_blank">Website</a></td>

I am using the code below to locate them, but it only returns the first href and none of the others for rows that have more than one.

from time import sleep
import numpy as np
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 3000)
from bs4 import BeautifulSoup

# Scrape the smallcaps website for IPO Information and save into dataframe
smallcaps_URL = "https://smallcaps.com.au/upcoming-ipos/"

# Raw string so the backslashes in the Windows path are not treated as escape sequences
service = Service(r"C:\Development\chromedriver_win32\chromedriver.exe")
chrome_options = webdriver.ChromeOptions()
driver = webdriver.Chrome(service=service)

driver.get(smallcaps_URL)
sleep(3)
close_popup = driver.find_element(By.CLASS_NAME, "tve_ea_thrive_leads_form_close")
close_popup.click()

html = driver.page_source
soup = BeautifulSoup(html, 'html.parser')

all_ipo_header = soup.find_all("th")
all_ipo_content = soup.find_all("td")

ipo_headers = []
ipo_contents = []

for header in all_ipo_header:
    ipo_headers.append(header.text.replace(" ", "_"))

for content in all_ipo_content:
    if content.a:
        a = content.find('a', href=True);
        ipo_contents.append(a['href'])
    else:
        ipo_contents.append(content.text)

# Prints complete scraped dataframe from SmallCaps website
df = pd.DataFrame(np.reshape(ipo_contents, (-1, 6)), columns=ipo_headers)
print(df)

# Next thing to do is scrape a few other websites for comparison and remove duplicates.

Current output

                     Company_name ASX_code Issue_price  Raise                     Focus                                        Information
0              Allup Silica (TBA)      APS       $0.20    $5m               Silica sand                           https://allupsilica.com/
1          Andean Mining (14 Feb)      ADM       $0.20    $6m       Mineral exploration  https://smallcaps.com.au/andean-mining-ipo-col...
2       Catalano Seafood (24 Feb)      CSF       $0.20    $6m                   Seafood                      https://www.catalanos.net.au/
3     Dragonfly Biosciences (TBA)      DRF       $0.20   $11m           Cannabidiol oil                  https://dragonflybiosciences.com/
4     Equity Story Group (18 Mar)      EQS       $0.20  $5.5m  Market advice & research                        https://equitystory.com.au/
5             Far East Gold (TBA)      FEG       $0.20   $12m       Mineral exploration  https://smallcaps.com.au/far-east-gold-asx-ipo...
6        Killi Resources (10 Feb)      KLI       $0.20    $6m           Gold and copper                          https://www.killi.com.au/
7           Lukin Resources (TBA)      LKN       $0.20  $7.5m       Mineral exploration  https://smallcaps.com.au/lukin-resources-launc...
8         Many Peaks Gold (2 Mar)      MPG       $0.20  $5.5m       Mineral exploration                          https://manypeaks.com.au/
9         Norfolk Metals (14 Mar)      NFL       $0.20  $5.5m          Gold and uranium                      https://norfolkmetals.com.au/
10    Omnia Metals Group (21 Feb)      OM1       $0.20  $5.5m       Mineral exploration                    https://www.omniametals.com.au/
11        Pure Resources (16 Mar)      PR1       $0.20  $4.6m       Mineral exploration                   http://www.pureresources.com.au/
12     Pinnacle Minerals (11 Mar)      PIM       $0.20  $5.5m        Kaolin - Haloysite                   https://pinnacleminerals.com.au/
13          Stelar Metals (7 Mar)      SLB       $0.20    $7m           Copper and zinc                       https://stelarmetals.com.au/
14        Top End Energy (21 Mar)      TEE       $0.20  $6.4m               Oil and gas                    http://www.topendenergy.com.au/
15  US Student Housing REIT (TBA)      USQ       $1.38   $45m  US student accommodation                              https://usq-reit.com/

Process finished with exit code 0

The expected output should have three links/hrefs for some rows in the 'Information' column, but it only returns the first link/href for all of them. Could someone please point me in the right direction?

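2 Answers

Stack Overflow user

Accepted answer

Answered 2022-02-16 01:31:57

The following seems to work. It looks for every a tag inside the cell and collects its href, which allows multiple references where they are available.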

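for content in all_ipo_content:
    if content.a:
        # Collect the href of every <a> tag in the cell, not just the first one
        all_urls = [a.get("href") for a in content.find_all('a')]
        ipo_contents.append(all_urls)
    else:
        # Cells without links keep their text, as in the question's code
        ipo_contents.append(content.text)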
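
One thing to watch for: appending a list for some cells and a plain string for others may not reshape cleanly with np.reshape later in the question's script. A minimal sketch of one way around that, assuming it is acceptable to store all links of a cell as a single comma-separated string, is shown below. The helper name cell_value, the sep parameter, and the two-cell sample table are made up for illustration; the URLs come from the question's markup.

```python
from bs4 import BeautifulSoup

# Two sample cells: plain text, and the multi-link cell from the question.
html = """
<table><tr>
  <td class="column-1">Andean Mining (14 Feb)</td>
  <td class="column-6">
    <a href="https://smallcaps.com.au/andean-mining-ipo-colombia-exploration-high-grade-copper-gold-target/">Article</a> /
    <a href="https://www.youtube.com/watch?v=Kgew7tuLWCg">Video</a> /
    <a href="https://andeanmining.com.au/">Website</a></td>
</tr></table>
"""


def cell_value(cell, sep=", "):
    """Return every href in the cell joined by sep, or the cell text if it has no links."""
    links = [a["href"] for a in cell.find_all("a", href=True)]
    return sep.join(links) if links else cell.get_text(strip=True)


soup = BeautifulSoup(html, "html.parser")
# Every entry is a plain string, so the list stays flat and uniform.
ipo_contents = [cell_value(td) for td in soup.find_all("td")]
print(ipo_contents)
```

Because every entry is a string, the list can be reshaped into rows the same way as before, and the links can be split back out with str.split(", ") when needed.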
Votes 0

Stack Overflow user

Answered 2022-02-15 10:18:50

a = content.find('a', href=True);

If a cell can contain multiple links, this should be a find_all instead, so:

a = content.find_all('a', href=True);
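
Note that switching to find_all changes what a holds: find returns a single tag (or None), while find_all returns a list-like collection of tags, so the a['href'] lookup on the next line of the question's code would need to become a loop or comprehension. A small sketch of the difference, using a made-up two-link cell with example.com URLs that are placeholders only:

```python
from bs4 import BeautifulSoup

# Hypothetical cell with two links, for illustration only.
html = '<td><a href="https://example.com/a">A</a> / <a href="https://example.com/b">B</a></td>'
cell = BeautifulSoup(html, "html.parser").td

first = cell.find("a", href=True)      # a single Tag, or None if nothing matches
every = cell.find_all("a", href=True)  # a list-like ResultSet of Tags

first_href = first["href"]                # indexing one Tag by attribute name works
all_hrefs = [a["href"] for a in every]    # each Tag in the result is indexed in turn

print(first_href)
print(all_hrefs)
```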
Votes 0
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/71124561
