
BeautifulSoup soup.select cuts off child tags

Stack Overflow user
Asked 2019-03-16 22:47:45
Answers 1 · Views 323 · Followers 0 · Votes 2

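When I run a script to retrieve all blockquote tags with the class "FlatParagraph", some of the child tags inside the blockquote seem to get cut off. Is there a query that includes all child tags? The problem seems to lie with a set of <blockquote><i><a>text</a></i> tags, so not every child is affected.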

I am using the following code:

Code language: python
from urllib.request import urlopen

from bs4 import BeautifulSoup

# Download the raw HTML of the regulation page
url = 'https://www.legislation.qld.gov.au/view/whole/html/2018-07-01/sl-2006-0200'
fhand = urlopen(url).read()

soup = BeautifulSoup(fhand, 'html.parser')
# Match blockquotes whose class attribute is exactly "FlatParagraph"
fp = soup.select('blockquote[class="FlatParagraph"]')
for i in fp:
    print(i.text)
    print('---------')

Then I use a for loop to retrieve the text from each match.

Code language: python
# Replace non-breaking spaces, then encode each block's text as UTF-8 bytes
changedfplist = [i.text.replace('\xa0', ' ').encode('utf-8') for i in fp]
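The cleanup step above can also be sketched on a small inline sample, which makes it easier to check without downloading the page. The sample markup and the helper name `clean_text` are assumptions made for illustration; only `get_text()` and `str.replace` are used.

```python
from bs4 import BeautifulSoup

# Tiny stand-in for one block of the real page (assumed sample, not the full document)
html = (
    '<blockquote class="FlatParagraph">'
    '<a href="#sec.28">section&nbsp;28</a>(1) of the repealed regulation'
    '</blockquote>'
)


def clean_text(tag):
    """Return the tag's text with non-breaking spaces turned into plain spaces."""
    return tag.get_text().replace("\xa0", " ")


soup = BeautifulSoup(html, "html.parser")
cleaned = [clean_text(tag) for tag in soup.select("blockquote.FlatParagraph")]
print(cleaned)
```

This version keeps the results as `str`. In Python 3, calling `.encode('utf-8')` as in the loop above yields `bytes` objects, which is only needed if the text is going to be written out in binary form.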

Here is an example of the markup I am parsing:

Code language: html
<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section&nbsp;28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section&nbsp;61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section&nbsp;62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule&nbsp;2</a>, <a href="#sch.2-pt.3">part&nbsp;3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section&nbsp;28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule&nbsp;3</a> of the repealed regulation.</blockquote></blockquote></blockquote>

When I parse this, I get:

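(1)This section applies if—(a)before the commencement—(i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and
(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and
(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and

(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.
(2)For assessing the fire engineering design brief for the stated building work—(a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and
(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and
(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.
(3)In this section—former fire engineering brief meeting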

The text at the end of the last line is missing. The output is cut off right after this fragment:

Code language: html
<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> 

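UPDATE: There is a class I am trying to avoid, so using .FlatParagraph did not work. I am trying to avoid class="FlatParagraph view-history-note", which is a class carried by child tags of the tag with the FlatParagraph class.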
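One possible way to express "FlatParagraph but not view-history-note" is sketched below. It assumes a BeautifulSoup version whose `select()` supports the `:not()` pseudo-class (4.7 or later, where selectors are handled by the soupsieve package), and the sample markup is invented for illustration. The helper `is_plain_flat_paragraph` is a hypothetical name showing an equivalent filter that avoids CSS altogether.

```python
from bs4 import BeautifulSoup

# Invented sample: one wanted block and one history-note block
html = """
<div>
  <blockquote class="FlatParagraph">keep me</blockquote>
  <blockquote class="FlatParagraph view-history-note">skip me</blockquote>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS route: has the FlatParagraph class, lacks the view-history-note class
wanted = soup.select("blockquote.FlatParagraph:not(.view-history-note)")
texts = [tag.get_text() for tag in wanted]


def is_plain_flat_paragraph(tag):
    """Hypothetical filter equivalent to the selector above."""
    classes = tag.get("class", [])
    return (
        tag.name == "blockquote"
        and "FlatParagraph" in classes
        and "view-history-note" not in classes
    )


# Function route: find_all accepts a callable that receives each tag
same_texts = [tag.get_text() for tag in soup.find_all(is_plain_flat_paragraph)]
print(texts, same_texts)
```

Note that a matched outer blockquote still includes the text of any nested history-note blockquote in its `.text`; excluding that text would require removing or skipping those children separately.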

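I have tried the code above with both lxml and html.parser: with lxml I get all of the text, while with html.parser the text is cut off. If anyone knows why, I would love to hear it!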
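The thread does not establish why the two parsers disagree on this page. Parsers can build different trees from markup that is not perfectly well formed, so one way to investigate is to run the same selector under each installed parser and compare the results. The sketch below assumes nothing beyond the public `BeautifulSoup` constructor and `select()`; the helper name `text_by_parser` and the short sample are illustrative only.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def text_by_parser(markup, selector, parsers=("html.parser", "lxml", "html5lib")):
    """Map each available parser name to the texts its tree yields for `selector`."""
    results = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(markup, parser)
        except FeatureNotFound:
            # The parser library is not installed; skip it
            continue
        results[parser] = [tag.get_text() for tag in soup.select(selector)]
    return results


# Well-formed sample, so every available parser should agree here
sample = (
    '<blockquote class="FlatParagraph"><b><i><a name="x"></a>former meeting</i></b>'
    ' means a meeting.</blockquote>'
)
results = text_by_parser(sample, "blockquote.FlatParagraph")
for parser, texts in results.items():
    print(parser, [len(text) for text in texts])
```

Running the same helper on the downloaded page and looking for the first block where the lengths differ would narrow down which part of the markup the parsers interpret differently.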


1 Answer

Stack Overflow user

Accepted answer

Posted 2019-03-17 08:30:51

You can use select() or find(). See the code below, which parses the sample with lxml; with it I get the full text!

Code language: python
from bs4 import BeautifulSoup

html = '''
<blockquote class="FlatParagraph"><blockquote class="Paragraph"><span class="ListNumber">(1)</span>This section applies if—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span>before the commencement—<blockquote class="Paragraph List"><span class="ListNumber">(i)</span>a person applied under <a href="#sec.28">section&nbsp;28</a>(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(ii)</span>an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(iii)</span>the service had not decided whether or not to approve the proposed fire engineering design brief; and</blockquote>
</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span>the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(2)</span>For assessing the fire engineering design brief for the stated building work—<blockquote class="Paragraph List"><span class="ListNumber">(a)</span><a href="#sec.61">section&nbsp;61</a> applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(b)</span><a href="#sec.62">section&nbsp;62</a>(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and</blockquote>
<blockquote class="Paragraph List"><span class="ListNumber">(c)</span><a href="#sch.2">schedule&nbsp;2</a>, <a href="#sch.2-pt.3">part&nbsp;3</a>, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.</blockquote>
</blockquote><blockquote class="Paragraph"><span class="ListNumber">(3)</span>In this section—<blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringbriefmeeting"></a>former fire engineering brief meeting</i></b> means a fire engineering brief meeting under <a href="#sec.28">section&nbsp;28</a>(2)(d) of the repealed regulation.</blockquote><blockquote class="Paragraph-No-Number"><b><i><a name="sec.90-ssec.3-def.formerfireengineeringdesignbriefmeetingfee"></a>former fire engineering design brief meeting fee</i></b> means the fire engineering design brief meeting fee stated in <a href="#sch.3">schedule&nbsp;3</a> of the repealed regulation.</blockquote></blockquote></blockquote>
'''
soup = BeautifulSoup(html, 'lxml')  # lxml parser (requires the lxml package)
fp = soup.select('.FlatParagraph')  # every tag carrying the FlatParagraph class
for i in fp:
    print(i.text)

Code language: python
# find() returns only the first matching tag
fp = soup.find('blockquote', attrs={'class': 'FlatParagraph'})
print(fp.text)

Output:

Code language: text
(1)This section applies if—(a)before the commencement—(i)a person applied under section 28(1) of the repealed regulation for approval of a proposed fire engineering design brief for stated building work; and
(ii)an authorised representative of the service attended a former fire engineering brief meeting relating to the approval of the proposed fire engineering design brief; and
(iii)the service had not decided whether or not to approve the proposed fire engineering design brief; and

(b)the person has not paid the former fire engineering design brief meeting fee for the attendance of the representative of the service at the former fire engineering brief meeting.
(2)For assessing the fire engineering design brief for the stated building work—(a)section 61 applies as if the reference to a fire engineering brief were a reference to the proposed fire engineering design brief; and
(b)section 62(1)(d) applies as if the reference to each fire engineering brief meeting included a reference to each former fire engineering brief meeting; and
(c)schedule 2, part 3, item 3 applies as if a reference to a meeting included a reference to a former fire engineering brief meeting.
(3)In this section—former fire engineering brief meeting means a fire engineering brief meeting under section 28(2)(d) of the repealed regulation.former fire engineering design brief meeting fee means the fire engineering design brief meeting fee stated in schedule 3 of the repealed regulation.
Votes: 2
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/55202258
