Q: How to read the parent header <h> and <p> data using Beautiful Soup

Stack Overflow user

Asked 2019-01-28 14:34:24

1 answer · 26 views · 0 followers · 0 votes

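I want to read each header <h1> and its corresponding paragraph <p> data in the example below.

I have many headers and paragraphs that are related to each other, so whenever I find a header I need to extract the paragraph data that belongs to it: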

<h1>Supplementary Materials </h1>\n
    <p />\n
    <p>The workshop entitled “Next generation MRA (Microbiological Risk Assessment); integration of Omics data into assessment” took place in Athens, Greece, May 13-14, 2016, and resulted in four papers that are published in this issue, namely, Cocolin et al., Rantsiou et al., Den Besten et al., and Haddad et al. </p>\n
<h1>Testing data</h1>
    <p>The supplementary materials, Table S1 and Table S2, are integrated parts of these four papers.</p>\n
    <p />

<h1>Supplementary Materials </h1>\n
    <p />\n
    <p>The workshop entitled “Next generation MRA (Microbiological Risk Assessment); integration of Omics data into assessment” took place in Athens, Greece, May 13-14, 2016, and resulted in four papers that are published in this issue, namely, Cocolin et al., Rantsiou et al., Den Besten et al., and Haddad et al. </p>\n
<h1>Testing data</h1>
    <p>The supplementary materials, Table S1 and Table S2, are integrated parts of these four papers.</p>\n
    <p />

python

html

beautifulsoup

1 answer

Stack Overflow user

Accepted answer

Answered 2019-01-28 15:21:12

Is the HTML really repeated like that, or is that a mistake?

html = '''<h1>Supplementary Materials </h1>\n
    <p />\n
    <p>The workshop entitled “Next generation MRA (Microbiological Risk Assessment); integration of Omics data into assessment” took place in Athens, Greece, May 13-14, 2016, and resulted in four papers that are published in this issue, namely, Cocolin et al., Rantsiou et al., Den Besten et al., and Haddad et al. </p>\n
<h1>Testing data</h1>
    <p>The supplementary materials, Table S1 and Table S2, are integrated parts of these four papers.</p>\n
    <p />

<h1>Supplementary Materials </h1>\n
    <p />\n
    <p>The workshop entitled “Next generation MRA (Microbiological Risk Assessment); integration of Omics data into assessment” took place in Athens, Greece, May 13-14, 2016, and resulted in four papers that are published in this issue, namely, Cocolin et al., Rantsiou et al., Den Besten et al., and Haddad et al. </p>\n
<h1>Testing data</h1>
    <p>The supplementary materials, Table S1 and Table S2, are integrated parts of these four papers.</p>\n
    <p /> '''

import bs4

soup = bs4.BeautifulSoup(html, 'html.parser')
heads = soup.find_all('h1')

for head in heads:
    # string=True skips the empty <p /> tags (older bs4 releases call this argument text=)
    para = head.find_next('p', string=True).text
    print('Header: %s\nParagraph: %s\n' % (head.text, para))

Output:

Header: Supplementary Materials 
Paragraph: The workshop entitled “Next generation MRA (Microbiological Risk Assessment); integration of Omics data into assessment” took place in Athens, Greece, May 13-14, 2016, and resulted in four papers that are published in this issue, namely, Cocolin et al., Rantsiou et al., Den Besten et al., and Haddad et al. 

Header: Testing data
Paragraph: The supplementary materials, Table S1 and Table S2, are integrated parts of these four papers.

Header: Supplementary Materials 
Paragraph: The workshop entitled “Next generation MRA (Microbiological Risk Assessment); integration of Omics data into assessment” took place in Athens, Greece, May 13-14, 2016, and resulted in four papers that are published in this issue, namely, Cocolin et al., Rantsiou et al., Den Besten et al., and Haddad et al. 

Header: Testing data
Paragraph: The supplementary materials, Table S1 and Table S2, are integrated parts of these four papers.
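Note that the loop above pairs each header with only the first non-empty paragraph that follows it. Because find_next searches everything later in the document, a header with no paragraph of its own would pick up the next section's paragraph, and a trailing header with no paragraph at all would raise an AttributeError on None. If a header can be followed by several paragraphs, one possible variation is to walk its following siblings and stop at the next <h1>. This is only a minimal sketch, assuming the headers and paragraphs are siblings as in the sample; the function name paragraphs_by_header and the shortened HTML are hypothetical and not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened, made-up stand-in for the HTML in the question.
sample_html = """<h1>Supplementary Materials </h1>
    <p />
    <p>First paragraph. </p>
    <p>Second paragraph.</p>
<h1>Testing data</h1>
    <p>Only paragraph.</p>
    <p />
<h1>Empty section</h1>
"""


def paragraphs_by_header(markup):
    """Return a list of (header_text, [paragraph_text, ...]) pairs."""
    soup = BeautifulSoup(markup, "html.parser")
    sections = []
    for head in soup.find_all("h1"):
        paragraphs = []
        # Walk the tags that follow this header at the same level
        # and stop as soon as the next <h1> begins.
        for sibling in head.find_next_siblings():
            if sibling.name == "h1":
                break
            if sibling.name == "p":
                text = sibling.get_text(strip=True)
                if text:  # skip empty <p /> placeholders
                    paragraphs.append(text)
        sections.append((head.get_text(strip=True), paragraphs))
    return sections


for header, paragraphs in paragraphs_by_header(sample_html):
    print("Header: %s" % header)
    for paragraph in paragraphs:
        print("Paragraph: %s" % paragraph)
    print()
```

A header with no paragraphs simply gets an empty list here, so the caller can decide how to handle it instead of hitting an exception.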

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/54404197
