我想刮个人的网站和博客的链接在https://lawyers.justia.com/lawyer/robin-d-gross-39828上。
我到目前为止:
if soup.find('div', attrs={'class': "heading-3 block-title iconed-heading font-w-bold"}) is not None:
webs = soup.find('div', attrs={'class': "heading-3 block-title iconed-heading font-w-bold"})
print(webs.findAll("href"))发布于 2020-03-06 07:37:04
from bs4 import BeautifulSoup
import requests
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0'}
r = requests.get(
"https://lawyers.justia.com/lawyer/robin-d-gross-39828", headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
for item in soup.findAll("a", {'data-vars-action': ['ProfileWebsite', 'ProfileBlogPost']}):
print(item.get("href"))输出:
http://www.imaginelaw.com/
http://www.imaginelaw.com/lawyer-attorney-1181486.html
http://www.ipjustice.org/internet-governance/icann-accountability-deficits-revealed-in-panel-ruling-on-africa/
http://www.circleid.com/members/5382
http://www.circleid.com/posts/20160301_icann_accountability_proposal_power_of_governments_over_internet
http://www.circleid.com/posts/20151201_supporting_orgs_marginalized_in_icann_accountability_proposal
http://www.circleid.com/posts/20150720_icann_accountability_deficits_revealed_in_panel_ruling_on_africa
http://www.circleid.com/posts/20150401_freedom_of_expression_chilled_by_icann_addition_of_speech
http://www.circleid.com/posts/20150203_proposal_for_creation_of_community_veto_for_key_icann_decisions
http://www.circleid.com/posts/20150106_civil_society_cautions_icann_giving_governments_veto_geo_domains
http://www.circleid.com/posts/20140829_radical_shift_of_power_proposed_at_icann_govts_in_primary_role
http://www.circleid.com/posts/20140821_icanns_accountability_plan_gives_icann_board_total_control
http://www.circleid.com/posts/20140427_a_civil_society_perspective_on_netmundial_final_outcome_document
https://imaginelaw.wordpress.comhttps://stackoverflow.com/questions/60559133
复制相似问题