As the title says, I'm trying to build a dictionary mapping article name → link. I'm using BS4 to drill down into the HTML and grab what I need (each of the top five links has a different class, so I loop over a range to get them):
import requests
from bs4 import BeautifulSoup as BS

data = requests.get("https://www.marketingdive.com")
soup = BS(data.content, 'html5lib')

top_story = []
for i in range(6):
    items = soup.find("a", {"class": f"analytics t-dash-top-{i}"})
    #print(items.get('href'))
    top_story.append(items)

print(top_story)

The end result looks like this:
[None, <a class="analytics t-dash-top-1" href="/news/youtube-shorts-revenue-sharing-creator-economy-TikTok/632272/">
YouTube brings revenue sharing to Shorts as battle for creator talent intensifies
</a>, <a class="analytics t-dash-top-2" href="/news/Walmart-TikTok-Snapchat-Gen-Z-retail-commerce-ads/632191/">
Walmart weds data to popular apps like TikTok in latest ad play
</a>, <a class="analytics t-dash-top-3" href="/news/retail-media-global-ad-spend-groupm/632269/">
Retail media makes up 11% of global ad spend, GroupM says
</a>, <a class="analytics t-dash-top-4" href="/news/mike-hard-lemonade-gen-z-pto/632267/">
Mike’s Hard Lemonade pays consumers to take PTO
</a>, <a class="analytics t-dash-top-5" href="/news/samsung-nbcuniversal-tonight-show-metaverse-fortnite/632194/">
Samsung, NBCUniversal bring Rockefeller Center to the metaverse
</a>]

I've tried splitting the string and pulling just the href out of the results (per the docs), and I've tried other solutions posted here, but I'm stumped; the only thing I can figure is that I'm missing a step. Any answers or comments on where I can fix this would be much appreciated.
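For reference, the output above already hints at two issues in the loop: `range(6)` starts at 0 and there is no `t-dash-top-0` anchor (hence the leading `None`), and calling `.get('href')` on that `None` raises `AttributeError`. A minimal sketch of the corrected loop, run against a stand-in snippet (the markup and headlines below are hypothetical, only mirroring the classes seen in the question's output):

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the classes from the question's output
html = """
<a class="analytics t-dash-top-1" href="/news/story-one/1/">Story one</a>
<a class="analytics t-dash-top-2" href="/news/story-two/2/">Story two</a>
"""
soup = BeautifulSoup(html, "html.parser")

top_story = {}
for i in range(1, 6):          # classes are numbered 1..5, not 0..5
    a = soup.find("a", {"class": f"analytics t-dash-top-{i}"})
    if a is not None:          # skip slots missing from the page
        top_story[a.get_text(strip=True)] = a.get("href")

print(top_story)
```

Passing the full multi-class string to `find` works because BS4 matches it against the exact value of the `class` attribute.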
Posted on 2022-09-22 11:44:34
from bs4 import BeautifulSoup
import requests
from pprint import pp
from urllib.parse import urljoin
def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    goal = {x.get_text(strip=True): urljoin(url, x['href'])
            for x in soup.select('a[class^="analytics t-dash-top"]')}
    pp(goal)

main('https://www.marketingdive.com/')

https://stackoverflow.com/questions/73813913
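The same selector and dict comprehension can be tried offline against a small snippet, which also shows how `urljoin` turns the relative hrefs into absolute URLs (the markup and headlines below are made up, and `'html.parser'` stands in for `lxml` so no extra parser install is needed):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

url = "https://www.marketingdive.com/"
# Hypothetical markup standing in for the live page
html = """
<a class="analytics t-dash-top-1" href="/news/one/1/">First headline</a>
<a class="analytics t-dash-top-2" href="/news/two/2/">Second headline</a>
"""
soup = BeautifulSoup(html, "html.parser")

# [class^="..."] matches any class attribute starting with the prefix,
# so one selector covers t-dash-top-1 through t-dash-top-5
goal = {a.get_text(strip=True): urljoin(url, a["href"])
        for a in soup.select('a[class^="analytics t-dash-top"]')}

print(goal)
```

Because every `<a>` is collected in one pass, there is no index to get wrong and missing slots simply don't appear in the dict.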