
Getting headlines and headline links from NPR.org with Beautiful Soup

Stack Overflow user
Asked 2020-06-18 05:46:23
Answers 2 · Views 44 · Followers 0 · Votes 1

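I'm having a hard time with this and can't find a good way to select all of them. I need to get the live updates as well as the regular headlines; basically, every headline on the site that is shown in bold. I also need the embedded links that are followed when those headlines are clicked. I have some basic HTML knowledge and have done some web scraping before, but for some reason I'm struggling with this one. Can anyone walk me through it?

After further inspection, it looks like I want to find all the children of the article class?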

2 Answers

Stack Overflow user

Posted 2020-06-18 17:03:10

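To get the correct page, you need to pass the right cookies= with the request. Then we select every <a> that contains an <h3 class="title"> to get the links and the titles.

For example: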

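import requests
from bs4 import BeautifulSoup


url = 'https://www.npr.org/?refresh=true'

# Cookies needed to get the correct page (they appear to record a tracking-consent choice)
cookies = {'choiceVersion': "1", 'dateOfChoice': "1584369909889", 'trackingChoice': "true"}
soup = BeautifulSoup(requests.get(url, cookies=cookies).content, 'html.parser')

# Select every <a href> that contains an <h3 class="title">
for a in soup.select('a[href]:has(h3.title)'):
    # Left-align the title in a 90-character column, followed by the link
    print('{:<90}{}'.format(a.h3.text, a['href']))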

Prints:

Trump Told China To 'Go Ahead' With Prison Camps, Bolton Alleges In New Book              https://www.npr.org/2020/06/17/875876905/trump-told-china-to-go-ahead-with-concentration-camps-bolton-alleges-in-new-book
John Bolton Unloads On Former Boss Trump, Even If It's A Little Late Now                  https://www.npr.org/2020/06/17/879609378/john-bolton-unloads-on-former-boss-trump-even-if-its-a-little-late-now
Pompeo And China's Top Diplomat Meet In Hawaii As Relations Worsen                        https://www.npr.org/2020/06/18/879854568/pompeo-and-chinas-top-diplomat-meet-in-hawaii-as-relations-worsen
Former Atlanta Police Officer Who Shot Rayshard Brooks Charged With Felony Murder         https://www.npr.org/sections/live-updates-protests-for-racial-justice/2020/06/17/879509659/former-atlanta-police-officer-who-shot-rayshard-brooks-charged-with-felony-murde
Oakland Mayor Launches Hate Crime Investigation Into Nooses Found At Park                 https://www.npr.org/sections/live-updates-protests-for-racial-justice/2020/06/17/879758336/oakland-mayor-launches-hate-crime-investigation-into-nooses-found-at-park
Arbery Family Lawyer On Trump Meeting: 'He Doesn't Feel Like There's Systemic Racism'     https://www.npr.org/sections/live-updates-protests-for-racial-justice/2020/06/17/879682712/civil-rights-attorney-comments-on-his-meeting-with-president-trump
Tim Scott Says Dick Durbin's 'Token' Comment 'Hurts My Soul'                              https://www.npr.org/2020/06/17/879717148/tim-scott-says-dick-durbins-token-comment-hurts-my-soul
'From Here to Equality' Author Makes A Case, And A Plan, For Reparations                  https://www.npr.org/2020/06/17/879041052/william-darity-jr-discusses-reparations-racial-equality-in-his-new-book
'Hampton' No More: Man Sheds Family Name With Ties To Confederate General                 https://www.npr.org/sections/live-updates-protests-for-racial-justice/2020/06/17/879662628/hampton-no-more-man-sheds-family-name-with-ties-to-confederate-general
'Interrupt The Systems': Robin DiAngelo On 'White Fragility' And Anti-Racism              https://www.npr.org/2020/06/17/879136931/interrupt-the-systems-robin-diangelo-on-white-fragility-and-anti-racism
2020 Electoral Map Ratings: Biden Has An Edge Over Trump, With 5 Months To Go             https://www.npr.org/2020/06/17/877951588/2020-electoral-map-ratings-biden-has-an-edge-over-trump-with-5-months-to-go
Scientists Find The Biggest Soft-Shelled Egg Ever, Nicknamed 'The Thing'                  https://www.npr.org/2020/06/17/877679868/scientists-find-the-biggest-soft-shelled-egg-ever-nicknamed-the-thing
Justice Department Proposes Rolling Back Legal Protections For Online Platforms           https://www.npr.org/2020/06/17/879150136/doj-proposes-rolling-back-legal-protections-for-online-platforms
The Cameras Are Rolling On The Bold And The Beautiful                                     https://www.npr.org/sections/coronavirus-live-updates/2020/06/17/879773843/the-cameras-are-rolling-on-the-bold-and-the-beautiful
Why Now, White People?                                                                    https://www.npr.org/2020/06/16/878963732/why-now-white-people
Aunt Jemima Will Change Name, Image As Brands Confront Racial Stereotypes                 https://www.npr.org/sections/live-updates-protests-for-racial-justice/2020/06/17/879104818/acknowledging-racial-stereotype-aunt-jemima-will-change-brand-name-and-image
Northeast: Coronavirus-Related Restrictions By State                                      https://www.npr.org/2020/05/01/847331283/northeast-coronavirus-related-restrictions-by-state
South: Coronavirus-Related Restrictions By State                                          https://www.npr.org/2020/05/01/847415273/south-coronavirus-related-restrictions-by-state
West: Coronavirus-Related Restrictions By State                                           https://www.npr.org/2020/05/01/847416108/west-coronavirus-related-restrictions-by-state
Midwest: Coronavirus-Related Restrictions By State                                        https://www.npr.org/2020/06/11/847413697/midwest-coronavirus-related-restrictions-by-state
Amid Confusion About Reopening, An Expert Explains How To Assess COVID-19 Risk            https://www.npr.org/2020/06/17/879255417/amid-confusion-about-reopening-an-expert-explains-how-to-assess-covid-risk
TDC video carousel                                                                        https://www.npr.org/series/589466438/planet-money-shorts
5 Years After Charleston Church Massacre, What Have We Learned?                           https://www.npr.org/2020/06/17/878828088/5-years-after-charleston-church-massacre-what-have-we-learned
Ancient Bones Offer Clues To How Long Ago Humans Cared For The Vulnerable                 https://www.npr.org/sections/goatsandsoda/2020/06/17/878896381/ancient-bones-offer-clues-to-how-long-ago-humans-cared-for-the-vulnerable
Rita Indiana: La Monstra Returns With 'Black Sabbath Dembow'                              https://www.npr.org/2020/06/17/879316231/rita-indiana-la-monstra-returns-with-black-sabbath-dembow
Tracking The Pandemic: Are Coronavirus Cases Rising Or Falling In Your State?             https://www.npr.org/sections/health-shots/2020/03/16/816707182/map-tracking-the-spread-of-the-coronavirus-in-the-u-s
Which States Are Reopening? A State-By-State Guide                                        https://www.npr.org/2020/03/12/815200313/what-governors-are-doing-to-tackle-spreading-coronavirus
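
The selector can also be tried without any network access. The following is a minimal sketch, not part of the original answer: SAMPLE_HTML is made-up markup standing in for the NPR front page (the real page may be structured differently), extract_headlines is a hypothetical helper name, and the urljoin step is an added assumption for the case where a page uses relative links.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the NPR front page; the real page may differ.
SAMPLE_HTML = """
<article>
  <a href="/2020/06/17/1/first-story"><h3 class="title">First Story</h3></a>
  <a href="https://www.npr.org/2020/06/17/2/second-story"><h3 class="title">Second Story</h3></a>
  <a href="/about"><span>Not a headline</span></a>
  <h3 class="title">Headline without a link</h3>
</article>
"""


def extract_headlines(html, base_url="https://www.npr.org/"):
    """Return (title, absolute_link) pairs for every <a href> wrapping an <h3 class="title">."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for a in soup.select("a[href]:has(h3.title)"):
        title = a.h3.get_text(strip=True)
        # urljoin leaves absolute URLs untouched and resolves relative ones
        results.append((title, urljoin(base_url, a["href"])))
    return results


headlines = extract_headlines(SAMPLE_HTML)
for title, link in headlines:
    print(f"{title:<20}{link}")
```

The link without a headline and the headline without a link are both skipped, because the selector only matches anchors that have an href and contain a matching h3.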
Votes 0

Stack Overflow user

Posted 2020-06-18 06:01:59

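You can use the requests module to download the page's HTML and then parse it by its h3 tags, which I noticed are used for the headlines.

You can then use the .find(string) method to locate such a tag in the HTML; from the index where it is found, you look for the next instance of </h3>.

I can't tell which headlines, or how many, you want to parse, but you can use a while loop to go through every h3 tag on the page until no new one is found (.find() returns -1 when it cannot find the string).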
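
The loop described above might look roughly like the sketch below. It is an illustration only: find_h3_contents and the sample string are hypothetical, and it uses Python's str.find on a hard-coded snippet rather than a downloaded page. Plain string searching is brittle (it returns the raw inner HTML and knows nothing about nesting or attributes), so an HTML parser is usually the more robust choice.

```python
def find_h3_contents(html):
    """Collect the raw inner HTML of each <h3 ...>...</h3> using only str.find."""
    contents = []
    pos = 0
    while True:
        start = html.find("<h3", pos)        # next opening tag, or -1 if none left
        if start == -1:
            break
        open_end = html.find(">", start)     # end of the opening tag
        if open_end == -1:
            break
        close = html.find("</h3>", open_end)  # matching closing tag
        if close == -1:
            break
        contents.append(html[open_end + 1:close].strip())
        pos = close + len("</h3>")           # continue searching after this tag
    return contents


# Made-up snippet for illustration; not taken from the real page.
sample = '<h3 class="title">First Story</h3><p>text</p><h3 class="title">Second Story</h3>'
titles = find_h3_contents(sample)
print(titles)
```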

Votes -1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/62438887
