
Getting links from a web page with BeautifulSoup and scrolling to load more

Stack Overflow user
Asked 2020-07-30 13:01:43

I am trying to get the article links from https://finance.yahoo.com/topic/stock-market-news. I am running the following code with Python 3:

import requests
from bs4 import BeautifulSoup

url = "https://finance.yahoo.com/topic/stock-market-news"
r1 = requests.get(url)
page = r1.content
soup = BeautifulSoup(page, 'html5lib')
#print(soup.prettify())
href = soup.find_all('a')
boxes = []
links = []
for ref in href:
    # keep only anchors whose parent contains a <u> element
    curr = ref.parent.find('u')
    if curr is not None:
        boxes.append(ref)
        links.append(ref['href'])
print(boxes)
print(links)

However, although I do find links, some of them look odd:

/news/stock-market-news-live-july-30-2020-221505732.html
/m/f39537a4-425d-3378-9ef7-e7188a513ca6/stock-index-futures-lower.html
/m/6c87eec2-e5a1-3bc3-916e-4f74b3c508bf/global-stocks-slump-as-u-s-.html
https://finance.yahoo.com/news/q2-gdp-us-economy-coronavirus-pandemic-consumer-171558880.html
https://finance.yahoo.com/video/influencers-andy-serwer-bill-gates-110000273.html
https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html

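Why does this happen, and how can I access these links now?

A sub-question: the site has many more links than I found. I think this is because the page loads more articles as you scroll down. How can I get around that so I can load a set number of additional articles, for example 10 more?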


1 Answer

Stack Overflow user

Accepted answer

Answered 2020-07-30 13:10:29

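Some of the href values are relative to the site root (they start with /news/ or /m/), so the domain has to be prepended to them. Add this line: links.append(link if link.startswith("https://finance.yahoo.com") else f"https://finance.yahoo.com{link}" )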

from bs4 import BeautifulSoup
import requests

url = "https://finance.yahoo.com/topic/stock-market-news"
r1 = requests.get(url)
page = r1.content
soup = BeautifulSoup(page, 'html5lib')
#print(soup.prettify())
href = soup.find_all('a')
boxes = []
links = []
for ref in href:
    # keep only anchors whose parent contains a <u> element
    curr = ref.parent.find('u')
    if curr is not None:
        boxes.append(ref)
        link = ref['href']
        # prepend the domain when the href is relative
        links.append(link if link.startswith("https://finance.yahoo.com") else f"https://finance.yahoo.com{link}")
print(boxes)
print("___"*10)
print(links)

Output:

[<a class="Fw(b) Fz(18px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 mega-item-header-link Td(n) C(#0078ff):h C(#000) LineClamp(2,46px) LineClamp(2,38px)--sm1024 not-isInStreamVideoEnabled" data-reactid="11" href="/m/d79af817-5b40-3545-a085-322c5d27628e/dow-futures-slump-as-q2-gdp.html" target="_self"><u class="StretchedBox" data-reactid="12"></u><!-- react-text: 13 -->Dow Futures Slump As Q2 GDP Plunges Most On Record, Weekly Jobless Claims Rise; Trump Raises Election Delay Prospect<!-- /react-text --></a>, <a class="Fw(b) Fz(18px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 mega-item-header-link Td(n) C(#0078ff):h C(#000) LineClamp(2,46px) LineClamp(2,38px)--sm1024 not-isInStreamVideoEnabled" data-reactid="28" href="/m/8f0877fd-0c34-306c-964d-2c9dd2aebd3c/ups-stock-is-jumping-after.html" target="_self"><u class="StretchedBox" data-reactid="29"></u><!-- react-text: 30 -->UPS Stock Is Jumping After the Company Delivered Smashing Earnings<!-- /react-text --></a>, <a class="Fw(b) Fz(18px) Lh(23px) LineClamp(2,46px) Fz(17px)--sm1024 Lh(19px)--sm1024 LineClamp(2,38px)--sm1024 mega-item-header-link Td(n) C(#0078ff):h C(#000) LineClamp(2,46px) LineClamp(2,38px)--sm1024 not-isInStreamVideoEnabled" data-reactid="48" href="/news/futures-sink-data-shows-historic-125417167.html"><u class="StretchedBox" data-reactid="49"></u><!-- react-text: 50 -->Futures sink as data shows historic slump<!-- /react-text --></a>, <a class="Fz(13px) LineClamp(4,96px) C(#0078ff):h Td(n) C($c-fuji-blue-4-b) smartphone_C(#000) smartphone_Fz(19px)" data-reactid="11" href="https://finance.yahoo.com/news/q2-gdp-us-economy-coronavirus-pandemic-consumer-171558880.html"><span class="Fw(600) smartphone_Fw(500)" data-reactid="12">Q2 GDP: US economy contracted by worst-ever 32.9% in Q2, crushed by coronavirus lockdowns</span><u class="StretchedBox Z(1)" data-reactid="13"></u></a>, <a class="Fz(13px) LineClamp(4,96px) 
C(#0078ff):h Td(n) C($c-fuji-blue-4-b) smartphone_C(#000) smartphone_Fz(19px)" data-reactid="26" href="https://finance.yahoo.com/video/influencers-andy-serwer-bill-gates-110000273.html"><span class="Fw(600) smartphone_Fw(500)" data-reactid="27">Influencers with Andy Serwer: Bill Gates</span><u class="StretchedBox Z(1)" data-reactid="28"></u></a>, <a class="Fz(13px) LineClamp(4,96px) C(#0078ff):h Td(n) C($c-fuji-blue-4-b) smartphone_C(#000) smartphone_Fz(19px)" data-reactid="38" href="https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html"><span class="Fw(600) smartphone_Fw(500)" data-reactid="39">Jobless claims top 1M again in latest week as coronavirus keeps battering workers</span><u class="StretchedBox Z(1)" data-reactid="40"></u></a>]
______________________________
['https://finance.yahoo.com/m/d79af817-5b40-3545-a085-322c5d27628e/dow-futures-slump-as-q2-gdp.html', 'https://finance.yahoo.com/m/8f0877fd-0c34-306c-964d-2c9dd2aebd3c/ups-stock-is-jumping-after.html', 'https://finance.yahoo.com/news/futures-sink-data-shows-historic-125417167.html', 'https://finance.yahoo.com/news/q2-gdp-us-economy-coronavirus-pandemic-consumer-171558880.html', 'https://finance.yahoo.com/video/influencers-andy-serwer-bill-gates-110000273.html', 'https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html']
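As a possible variation on the prefixing line, the standard library's urllib.parse.urljoin can resolve each href against the page URL. This is only a sketch: the helper name absolutize and the BASE_URL constant are made up for illustration and are not part of the answer. Unlike the startswith check, it also leaves absolute links to other hosts untouched.

```python
from urllib.parse import urljoin

# Page the links were scraped from (taken from the question).
BASE_URL = "https://finance.yahoo.com/topic/stock-market-news"


def absolutize(hrefs, base=BASE_URL):
    """Resolve each href against base; absolute URLs pass through unchanged."""
    return [urljoin(base, href) for href in hrefs]


# Two hrefs copied from the question: one relative, one already absolute.
sample = [
    "/news/stock-market-news-live-july-30-2020-221505732.html",
    "https://finance.yahoo.com/news/jobless-claims-week-ending-july-25-123150219.html",
]
for link in absolutize(sample):
    print(link)
```

In the loop above, this would amount to replacing the conditional expression with links.append(urljoin(url, ref['href'])).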
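On the sub-question about loading more articles: requests does not execute JavaScript, so if the extra articles are added by a script as the page scrolls, they will not be in the HTML it returns. One common workaround is to drive a real browser. The following is a rough sketch under several assumptions: it relies on the third-party selenium package and a local Chrome install, the names collect_links, COLLECT_JS and main are hypothetical, and the u.StretchedBox selector is taken from the output shown above and may no longer match the live page.

```python
import time

# JavaScript run inside the page: return the href of every <a> that contains
# a <u class="StretchedBox"> element (the pattern seen in the printed output).
COLLECT_JS = """
return Array.from(document.querySelectorAll('a u.StretchedBox'))
    .map(u => u.closest('a').href);
"""


def collect_links(driver, min_links=10, max_scrolls=20, pause=2.0):
    """Scroll until at least min_links unique links exist or max_scrolls is hit.

    driver is expected to be a Selenium WebDriver that has already loaded
    the page; only its execute_script method is used here.
    """
    links = []
    for attempt in range(max_scrolls + 1):
        # Deduplicate while keeping the order in which links appear.
        links = list(dict.fromkeys(driver.execute_script(COLLECT_JS)))
        if len(links) >= min_links or attempt == max_scrolls:
            break
        # Jump to the bottom so the page can request more articles.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait; an assumption, tune as needed
    return links


def main():
    # Imported here so the helper above can be read without selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("https://finance.yahoo.com/topic/stock-market-news")
        for link in collect_links(driver, min_links=20):
            print(link)
    finally:
        driver.quit()
```

Calling main() would open Chrome and print the collected links. Because the browser resolves the href property itself, the returned links are already absolute. The site may also show a consent page or change its markup, in which case the selector and the wait would need adjusting.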
Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/63173768
