Scraping hyperlinks from href attributes with BeautifulSoup + Python

Stack Overflow user
Asked 2020-03-06 07:08:13
Answers: 1, Views: 37, Followers: 0, Votes: 0

I want to scrape the links to the person's website and blog from https://lawyers.justia.com/lawyer/robin-d-gross-39828.

Here is what I have so far:

Code language: Python
webs = soup.find('div', attrs={'class': "heading-3 block-title iconed-heading font-w-bold"})
if webs is not None:
    print(webs.findAll("href"))
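A likely reason the attempt above prints an empty list: `href` is an attribute of `<a>` tags, not a tag itself, so `findAll("href")` searches for `<href>` elements. The sketch below runs offline against a small made-up HTML snippet (the `links` class and the example.com URLs are hypothetical, not taken from the Justia page) to contrast searching by tag name with reading the attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a profile section; not the real page.
html = """
<div class="links">
  <a href="http://example.com/site">Website</a>
  <a href="http://example.com/blog">Blog</a>
  <span>no link here</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
section = soup.find("div", class_="links")

# find_all("href") looks for <href> *tags*; there are none, so this is empty.
by_tag_name = section.find_all("href")

# href is an attribute: find <a> tags that have one, then read it with .get().
hrefs = [a.get("href") for a in section.find_all("a", href=True)]

print(by_tag_name)  # []
print(hrefs)        # ['http://example.com/site', 'http://example.com/blog']
```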

1 answer

Stack Overflow user

Accepted answer

Answered 2020-03-06 07:37:04

Code language: Python
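from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent; some sites block or alter responses
# for the default requests User-Agent.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'}

r = requests.get(
    "https://lawyers.justia.com/lawyer/robin-d-gross-39828", headers=headers, timeout=30)
r.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(r.text, 'html.parser')

# Select <a> tags whose data-vars-action attribute is either value in the list,
# then print each link target.
for item in soup.find_all("a", {'data-vars-action': ['ProfileWebsite', 'ProfileBlogPost']}):
    print(item.get("href"))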
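The answer's filter relies on passing a list as an attribute value, which BeautifulSoup treats as "match any of these". To check that behaviour without hitting the live site, here is a small sketch on hypothetical markup (the extra `ProfilePhone` link and the example.com URLs are invented for illustration), alongside an equivalent CSS-selector query via `soup.select()`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the data-vars-action attributes the answer
# filters on; the real page structure may differ.
html = """
<a data-vars-action="ProfileWebsite" href="http://example.com/">Site</a>
<a data-vars-action="ProfileBlogPost" href="http://example.com/post-1">Post</a>
<a data-vars-action="ProfilePhone" href="tel:555-0100">Call</a>
<a href="http://example.com/unrelated">Other</a>
"""

soup = BeautifulSoup(html, "html.parser")
wanted = ["ProfileWebsite", "ProfileBlogPost"]

# A list as an attribute value matches tags whose attribute equals ANY entry.
via_find_all = [a["href"] for a in soup.find_all("a", attrs={"data-vars-action": wanted})]

# Equivalent CSS selector; bs4 delegates .select() to the soupsieve package.
selector = ", ".join(f'a[data-vars-action="{v}"]' for v in wanted)
via_select = [a["href"] for a in soup.select(selector)]

print(via_find_all)  # ['http://example.com/', 'http://example.com/post-1']
print(via_select == via_find_all)  # True
```

The `find_all` form is convenient when the accepted values already live in a Python list; the selector form may read more naturally if you think in CSS.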

Output:

http://www.imaginelaw.com/
http://www.imaginelaw.com/lawyer-attorney-1181486.html
http://www.ipjustice.org/internet-governance/icann-accountability-deficits-revealed-in-panel-ruling-on-africa/
http://www.circleid.com/members/5382
http://www.circleid.com/posts/20160301_icann_accountability_proposal_power_of_governments_over_internet
http://www.circleid.com/posts/20151201_supporting_orgs_marginalized_in_icann_accountability_proposal
http://www.circleid.com/posts/20150720_icann_accountability_deficits_revealed_in_panel_ruling_on_africa
http://www.circleid.com/posts/20150401_freedom_of_expression_chilled_by_icann_addition_of_speech
http://www.circleid.com/posts/20150203_proposal_for_creation_of_community_veto_for_key_icann_decisions
http://www.circleid.com/posts/20150106_civil_society_cautions_icann_giving_governments_veto_geo_domains
http://www.circleid.com/posts/20140829_radical_shift_of_power_proposed_at_icann_govts_in_primary_role
http://www.circleid.com/posts/20140821_icanns_accountability_plan_gives_icann_board_total_control
http://www.circleid.com/posts/20140427_a_civil_society_perspective_on_netmundial_final_outcome_document
https://imaginelaw.wordpress.com
Votes: 0
Original content provided by Stack Overflow. Original link:

https://stackoverflow.com/questions/60559133
