Q: How do I parse data with BeautifulSoup4?

Stack Overflow user

Asked 2018-05-14 17:00:23

Answers: 1 · Views: 60 · Followers: 0 · Votes: 0

Here is a sample from the .xml file:

    <title>Kaufsignal für Marriott International</title>
    <link>https://insideparadeplatz.ch/2015/03/06/kaufsignal-fuer-marriott-international/</link>
    <pubDate>Fri, 06 Mar 2015 </pubDate>
    <content:encoded>
        <![CDATA[
            <p class="p1">
                <span class="s1">Mit Marken wie Bulgari, Ritz-Carlton, Marriott und weiteren ist Marriott International nach sämtlichen Kriterien, die vom <a href="http://www.obermatt.com/de/home.html">
                <span class="s2">Obermatt-System</span></a></span> bewertet werden, ein interessantes Investment. Der Titel ist relativ gesehen günstig, das Unternehmen sollte weiter überproportional wachsen, und es ist solide finanziert, mit einem guten Verhältnis von Eigenkapital und Schulden. Über alle Kategorien gesehen landet die 
                <span class="s3">Marriott-Aktie</span></a>, die derzeit an der Technologiebörse Nasdaq bei rund 84 Dollar gehandelt wird, in der Wochenauswertung im Total-Ranking auf dem ersten Platz.

                <img class="aligncenter wp-image-17092 size-full" src="https://insideparadeplatz.ch/wp-content/uploads/2015/03/Total-Ranking-6-Mar-2015.png" alt="Total-Ranking 6 Mar 2015" width="873" height="627" /></a></p>]]>
    </content:encoded>

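Using beautifulsoup4, I am able to extract 'title', 'link', and 'pubDate'. The problem is 'content:encoded': I want to extract the 'img' elements inside 'content:encoded' into img_list. I have tried many solutions, but all I get is None.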

title = []
link = []
date = []
img_list = []
# soup is the BeautifulSoup object built from the feed
for item in soup.find_all('item'):
    for t in item.find_all('title'):
        title.append(t.text)
for item in soup.find_all('item'):
    for l in item.find_all('link'):
        link.append(l.text)
for item in soup.find_all('item'):
    for d in item.find_all('pubDate'):
        date.append(d.text)
for item in soup.find_all('item'):
    for data in item.find_all('content:encoded'):
        data.text  # the whole CDATA payload, as a single string

I tried:

for item in soup.find_all('item'):
    for data in item.find_all('content:encoded'):
        for img in data.find_all('img'):
            img_list.append(img.text)

But that returned nothing. What am I missing here?

python

xml

parsing

beautifulsoup

1 Answer

Stack Overflow user

Accepted answer

Answered 2018-05-14 17:32:28

I think you will have trouble getting the img data out directly. Try this:

for item in soup.find("content:encoded"):
   print(item)
   print(type(item))

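Then see: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring

So bs4 treats the contents of content:encoded as a string, not as parsed tags. You need to parse that string manually, or feed it into a new bs4 object.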
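The second suggestion can be sketched roughly as follows. This is only an illustration under a few assumptions: `extract_img_srcs` is a hypothetical helper name, `encoded` is an abridged stand-in for the string that `data.text` returns for one item, and the fragment is re-parsed with Python's built-in `html.parser`. Because an `<img>` element has no text content, the sketch reads the `src` attribute rather than `.text`.

```python
from bs4 import BeautifulSoup


def extract_img_srcs(encoded_html):
    """Re-parse an HTML fragment and return the src of every <img> in it.

    Hypothetical helper: it expects the string found inside
    <content:encoded>, not the outer feed.
    """
    fragment = BeautifulSoup(encoded_html, "html.parser")
    # src=True keeps only <img> tags that actually carry a src attribute
    return [img["src"] for img in fragment.find_all("img", src=True)]


# Abridged stand-in for what data.text returns for one <content:encoded> tag
encoded = (
    '<p class="p1"><span class="s1">Mit Marken wie Bulgari ...</span>'
    '<img class="aligncenter wp-image-17092 size-full" '
    'src="https://insideparadeplatz.ch/wp-content/uploads/2015/03/'
    'Total-Ranking-6-Mar-2015.png" alt="Total-Ranking 6 Mar 2015" /></p>'
)

img_list = extract_img_srcs(encoded)
print(img_list)
```

In the loop from the question, this would presumably be wired in as `img_list.extend(extract_img_srcs(data.text))` for each `content:encoded` tag.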

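Votes: 1

Original page content provided by Stack Overflow. The Chinese translation was provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link: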

https://stackoverflow.com/questions/50335492
