Need to turn a list of links obtained from BS4 into a dict, but this is what I get from the scraped page

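Stack Overflow user

Asked 2022-09-22 11:35:03

Answers: 1 · Views: 30 · Followers: 0 · Votes: -1

As the title says, I am trying to build a dictionary of the form article name: link. I used BS4 to drill into the HTML and get what I need (each story has a different class, so I use a range to loop over the first 5).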

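import requests
from bs4 import BeautifulSoup as BS

data = requests.get("https://www.marketingdive.com")
soup = BS(data.content, 'html5lib')
top_story = []

# range(6) starts at 0; the output below shows no match for "t-dash-top-0", so the first entry is None
for i in range(6):
    items = soup.find("a", {"class": f"analytics t-dash-top-{i}"})
    #print(items.get('href'))
    top_story.append(items)

print(top_story)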

The result is as follows:

[None, <a class="analytics t-dash-top-1" href="/news/youtube-shorts-revenue-sharing-creator-economy-TikTok/632272/">
                                                    YouTube brings revenue sharing to Shorts as battle for creator talent intensifies
                                                </a>, <a class="analytics t-dash-top-2" href="/news/Walmart-TikTok-Snapchat-Gen-Z-retail-commerce-ads/632191/">
                                                    Walmart weds data to popular apps like TikTok in latest ad play
                                                </a>, <a class="analytics t-dash-top-3" href="/news/retail-media-global-ad-spend-groupm/632269/">
                                                    Retail media makes up 11% of global ad spend, GroupM says
                                                </a>, <a class="analytics t-dash-top-4" href="/news/mike-hard-lemonade-gen-z-pto/632267/">
                                                    Mike’s Hard Lemonade pays consumers to take PTO
                                                </a>, <a class="analytics t-dash-top-5" href="/news/samsung-nbcuniversal-tonight-show-metaverse-fortnite/632194/">
                                                    Samsung, NBCUniversal bring Rockefeller Center to the metaverse
                                                </a>]

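I have tried splitting the string and pulling just the href out of each item (following the documentation), and I have tried other solutions posted here, but I am stuck; the only thing I can think of is that I am missing a step. Any answers or comments on where I can fix this would be much appreciated.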
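One possible reading of the missing step, as an offline sketch: `soup.find` returns `None` when nothing matches (here, the `t-dash-top-0` class), so calling `.get('href')` on that entry fails. The snippet below assumes a small inline HTML string, `SAMPLE_HTML`, as a stand-in for the live page, and a hypothetical helper named `top_stories`; neither comes from the original post. It skips missing entries and pairs each link's text with its `href`.

```python
from bs4 import BeautifulSoup

# Stand-in for the downloaded page (assumption); the real markup may differ.
SAMPLE_HTML = """
<a class="analytics t-dash-top-1" href="/news/retail-media-global-ad-spend-groupm/632269/">
    Retail media makes up 11% of global ad spend, GroupM says
</a>
<a class="analytics t-dash-top-2" href="/news/Walmart-TikTok-Snapchat-Gen-Z-retail-commerce-ads/632191/">
    Walmart weds data to popular apps like TikTok in latest ad play
</a>
"""


def top_stories(html, count=5):
    """Hypothetical helper: map link text to href for t-dash-top-1..count."""
    soup = BeautifulSoup(html, "html.parser")
    stories = {}
    for i in range(1, count + 1):  # start at 1, since there is no t-dash-top-0
        link = soup.find("a", {"class": f"analytics t-dash-top-{i}"})
        if link is None:
            # No link with this class on the page; skip it instead of failing.
            continue
        # get_text(strip=True) drops the surrounding whitespace and newlines.
        stories[link.get_text(strip=True)] = link["href"]
    return stories


print(top_stories(SAMPLE_HTML))
```

The `href` values stay relative in this sketch; joining them onto the site's base URL is a separate step.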

python-3.x

web-scraping

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered 2022-09-22 11:44:34

from bs4 import BeautifulSoup
import requests
from pprint import pp
from urllib.parse import urljoin


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    # Select every <a> whose class attribute starts with "analytics t-dash-top",
    # then map its stripped text to an absolute URL built from the relative href.
    goal = {x.get_text(strip=True): urljoin(url, x['href'])
            for x in soup.select('a[class^="analytics t-dash-top"]')}
    pp(goal)


main('https://www.marketingdive.com/')
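The same idea can be tried without hitting the network. The sketch below is an illustration under stated assumptions: `SAMPLE_HTML` is a made-up fragment modelled on the output shown in the question, `links_to_dict` is a hypothetical helper name, and the built-in `html.parser` is swapped in for `lxml` so that no extra parser is needed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.marketingdive.com/"

# Made-up fragment for illustration (assumption); the live page may differ.
SAMPLE_HTML = """
<a class="analytics t-dash-top-1" href="/news/retail-media-global-ad-spend-groupm/632269/">
    Retail media makes up 11% of global ad spend, GroupM says
</a>
<a class="analytics t-dash-top-2" href="/news/mike-hard-lemonade-gen-z-pto/632267/">
    Mike's Hard Lemonade pays consumers to take PTO
</a>
<a class="footer-link" href="/about/">About</a>
"""


def links_to_dict(html, base_url):
    """Hypothetical helper: {link text: absolute URL} for the top-story links."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        a.get_text(strip=True): urljoin(base_url, a["href"])
        # ^= keeps only links whose class attribute starts with this prefix
        for a in soup.select('a[class^="analytics t-dash-top"]')
    }


for title, link in links_to_dict(SAMPLE_HTML, BASE_URL).items():
    print(title, "->", link)
```

Because the selector matches on a class prefix, there is no need to loop over index numbers, and a missing index never produces a `None` entry.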

Votes: 2

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/73813913
