Question: BeautifulSoup: pulling p tags from between defined h2 tags

Stack Overflow user

Asked 2017-07-20 02:37:48

1 answer, 1.1K views, 0 followers, 2 votes

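This one has me a bit stumped. I'm trying to pull all of the text from the 'p' tags under the 'h2' tags named 'New Fundings' and 'New Funds'. The number of 'p' tags is not consistent from page to page, so I was thinking of some sort of while loop, but what I've tried hasn't worked. Each 'p' tag typically starts with the company name in a 'strong' tag, followed by descriptive text and further 'strong' tags that identify who funded or invested.

Once I can parse it correctly, the goal is to export the company name from its 'strong' tag together with the text that follows and the investment firms/people (found by following the later 'strong' tags in the 'p' block), so that I can do some data analysis on it.

Any help is appreciated. And yes, I've been through various other help pages, but my attempts haven't been successful, which is why I'm here.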
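
The extraction goal described above can be sketched on a single paragraph. This is only a minimal sketch, assuming the first 'strong' tag in a paragraph is the company and every later 'strong' tag is an investor; the sample markup `SAMPLE_P` and the helper `parse_funding_paragraph` are hypothetical names made up for illustration, not taken from the page.

```python
from bs4 import BeautifulSoup

# Hypothetical paragraph in the shape the question describes;
# the real newsletter markup may differ.
SAMPLE_P = (
    "<p><strong>Acme</strong>, a widget maker, has raised $5 million from "
    "<strong>Foo Capital</strong> and <strong>Bar Ventures</strong>.</p>"
)


def parse_funding_paragraph(p_tag):
    """Split one <p> into company, investors, and the full sentence.

    Assumption: the first <strong> is the company name and any
    remaining <strong> tags name the investors.
    """
    names = [s.get_text(strip=True) for s in p_tag.find_all("strong")]
    if not names:
        return None  # no bolded company name to anchor on
    return {
        "company": names[0],
        "investors": names[1:],
        "text": p_tag.get_text(),
    }


p_tag = BeautifulSoup(SAMPLE_P, "html.parser").find("p")
record = parse_funding_paragraph(p_tag)
print(record)
```

Returning a plain dict per paragraph keeps the result easy to load into a CSV writer or a DataFrame later; paragraphs without any 'strong' tag yield `None` so they can be filtered out.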

import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.strictlyvc.com/2017/06/13/strictlyvc-june-12-2017/")
page
page.content
soup = BeautifulSoup(page.content, 'html.parser')
entrysoup = soup.find(class_='post-entry')

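# Trying to find the right paragraphs, but these calls only select the next one.
# I need all of the <p> tags under 'New Fundings' and 'New Funds'
# (basically, everything until the next <h2> tag that isn't one of these).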

print(entrysoup.find('h2', text = 'New Fundings').find_next_sibling('p'))
print(entrysoup.find('h2', text = 'New Funds').find_next_sibling('p'))

# This is closer, but I don't know how to make it stop when it hits a tag
# that isn't under 'New Fundings' / 'New Funds'.

for strong_tag in entrysoup.find_all('strong'):
    print(strong_tag.text, strong_tag.next_sibling)

Tags: beautifulsoup, html-parsing, python-3.5

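1 Answer

Stack Overflow user

Accepted answer

Posted 2017-07-20 09:23:09

I think this is the best I can get for now. If it isn't what you want, let me know so I can fiddle with it some more. If it is, please mark it as the answer :)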

    import requests
    import bs4

    page = requests.get("https://www.strictlyvc.com/2017/06/13/strictlyvc-june-12-2017/")
    soup = bs4.BeautifulSoup(page.content, 'html.parser')
    entrysoup = soup.find(class_='post-entry')

    # Text of the element that marks the end of the section of interest
    Stop_Point = 'Also Sponsored By . . .'

    for h2_tag in entrysoup.find_all('h2'):

        if h2_tag.get_text() == 'New Fundings':
            for sibling in h2_tag.next_siblings:
                # Only tags are of interest; bare strings between tags are skipped
                if isinstance(sibling, bs4.element.Tag):
                    print(sibling.get_text())

                    if sibling.get_text() == Stop_Point:
                        break

                    # If the sibling is a <div>, also print its child tags
                    # up to the stop point
                    if sibling.name == 'div':
                        for child in sibling.children:
                            if isinstance(child, bs4.element.Tag):
                                if child.get_text() == Stop_Point:
                                    break  # leaves the inner loop only

                                print(child.get_text())
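
As an alternative to a fixed stop string, the sibling walk can stop at the next 'h2' tag, which is what the question asks for. The following is a minimal sketch under stated assumptions, not a tested solution for the live page: `SAMPLE_HTML` is made-up markup standing in for the newsletter, `paragraphs_under` is a hypothetical helper, and the 'p' tags are assumed to be direct siblings of the heading. If some paragraphs sit inside a wrapping 'div', as the code above suggests, the loop would also need to descend into it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Made-up markup standing in for the newsletter; the real page may differ.
SAMPLE_HTML = """
<div class="post-entry">
  <h2>Top News</h2>
  <p>Intro paragraph.</p>
  <h2>New Fundings</h2>
  <p><strong>Acme</strong>, a widget maker, raised $5 million from <strong>Foo Capital</strong>.</p>
  <p><strong>Globex</strong> raised $2 million from <strong>Bar Ventures</strong>.</p>
  <h2>New Funds</h2>
  <p><strong>Baz Partners</strong> closed a $100 million fund.</p>
  <h2>People</h2>
  <p>Unrelated paragraph.</p>
</div>
"""


def paragraphs_under(container, heading_text):
    """Return the <p> tags between the named <h2> and the next <h2>."""
    heading = container.find(
        lambda tag: tag.name == "h2" and tag.get_text(strip=True) == heading_text
    )
    if heading is None:
        return []
    paragraphs = []
    for sibling in heading.next_siblings:
        if not isinstance(sibling, Tag):
            continue  # skip whitespace and other text nodes
        if sibling.name == "h2":
            break  # the next section starts here
        if sibling.name == "p":
            paragraphs.append(sibling)
    return paragraphs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
entry = soup.find(class_="post-entry")
sections = {
    name: [p.get_text() for p in paragraphs_under(entry, name)]
    for name in ("New Fundings", "New Funds")
}
for name, texts in sections.items():
    print(name, texts)
```

Stopping on the tag name rather than on a specific string means the same helper works for both headings and does not depend on the wording of whatever section follows.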

0 votes

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud's Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/45204152
