Regex - find all complete sentences in a string

Stack Overflow user
Asked on 2021-01-18 06:21:34
2 answers, 389 views, 0 followers, 0 votes

I have looked at this thread: Regex to find all sentences of text?, but I can't get it to work for my exact case. Here is the text I'm working with:

Code language: Python
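import regex as re  # third-party "regex" package; the stdlib "re" behaves the same here

# An uppercase letter, then as few characters as possible, up to the first
# ".", "!" or "?" that is followed by a space (raw string avoids escape warnings).
sentence = re.compile(r"[A-Z].*?[.!?] ", re.MULTILINE | re.DOTALL)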

phrase = """For necessary expenses of the Office of Inspector 
General, including employment pursuant to the Inspector 
General Act of 1978 (Public Law 95–452; 5 U.S.C. App.), 
$99,912,000, including such sums as may be necessary for 
contracting and other arrangements with public agencies 
and private persons pursuant to section 6(a)(9) of the Inspector General Act of 1978 (Public Law 95–452; 5 
U.S.C. App.), and including not to exceed $125,000 for 
certain confidential operational expenses, including the 
payment of informants, to be expended under the direction 
of the Inspector General pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.) and 
section 1337 of the Agriculture and Food Act of 1981. For necessary expenses of the Office of the General 
23 Counsel, $45,390,000."""

phrase = phrase.replace("\n", "")

sentence.findall(phrase)

# outputs:
['For necessary expenses of the Office of Inspector General, including employment pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. ',
 'App.), $99,912,000, including such sums as may be necessary for contracting and other arrangements with public agencies and private persons pursuant to section 6(a)(9) of the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. ',
 'App.), and including not to exceed $125,000 for certain confidential operational expenses, including the payment of informants, to be expended under the direction of the Inspector General pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. ',
 'App.) and section 1337 of the Agriculture and Food Act of 1981. ']
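Why the pattern stops early: the lazy `.*?` ends each match at the first `.`, `!` or `?` that is followed by a space. In this text, the first such period is the one inside `U.S.C.`. The next match can only begin at the next uppercase letter. A final sentence with no trailing space never matches at all, which is why the General Counsel sentence is missing from the output. Here is a small reproduction on a made-up sample string (not the question's text), using the standard-library `re`:

```python
import re

# The question's pattern: an uppercase letter, then as few characters as
# possible, up to the first ".", "!" or "?" that is directly followed by a space.
sentence = re.compile(r"[A-Z].*?[.!?] ")

# Made-up sample with the same traps as the real text: an abbreviation
# ("U.S.C."), a lowercase word after a period, and no trailing space at the end.
sample = "See 5 U.S.C. App. for details. Next one."
print(sentence.findall(sample))
```

The first match ends at `U.S.C. `, the second is just `App. `, and `Next one.` is dropped because nothing follows its final period.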

In this case, the long phrase contains only two actual sentences. The first one is:

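For necessary expenses of the Office of Inspector General, including employment pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.), $99,912,000, including such sums as may be necessary for contracting and other arrangements with public agencies and private persons pursuant to section 6(a)(9) of the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.), and including not to exceed $125,000 for certain confidential operational expenses, including the payment of informants, to be expended under the direction of the Inspector General pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.) and section 1337 of the Agriculture and Food Act of 1981.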

The second one is:

For necessary expenses of the Office of the General Counsel, $45,390,000.

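Is there a way, with regex or some other method, to extract what I want? The end goal is to extract all complete sentences and then search them for specific things (in case that affects the solution).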
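For the second half of that goal (searching the extracted sentences), here is a minimal sketch. It assumes the sentences have already been split out; here they are the two from the question, typed in by hand. Dollar amounts stand in for the "specific things" to look for. The helper name `find_in_sentences` and the amount pattern are illustrative choices, not part of the question or answers.

```python
import re

# The two sentences listed in the question, entered by hand
# (assumes the splitting step has already been done).
sentences = [
    "For necessary expenses of the Office of Inspector General, including "
    "employment pursuant to the Inspector General Act of 1978 (Public Law "
    "95–452; 5 U.S.C. App.), $99,912,000, including such sums as may be "
    "necessary for contracting and other arrangements with public agencies and "
    "private persons pursuant to section 6(a)(9) of the Inspector General Act of "
    "1978 (Public Law 95–452; 5 U.S.C. App.), and including not to exceed "
    "$125,000 for certain confidential operational expenses, including the "
    "payment of informants, to be expended under the direction of the Inspector "
    "General pursuant to the Inspector General Act of 1978 (Public Law 95–452; "
    "5 U.S.C. App.) and section 1337 of the Agriculture and Food Act of 1981.",
    "For necessary expenses of the Office of the General Counsel, $45,390,000.",
]

# Illustrative search target: dollar amounts written with thousands separators.
DOLLARS = re.compile(r"\$\d{1,3}(?:,\d{3})*")


def find_in_sentences(sentences, pattern):
    """Hypothetical helper: (index, matches) for each sentence the pattern hits."""
    return [(i, pattern.findall(s)) for i, s in enumerate(sentences) if pattern.search(s)]


for index, amounts in find_in_sentences(sentences, DOLLARS):
    print(index, amounts)
```

Any compiled pattern works as the search target, so a plain phrase such as `re.compile(re.escape("General Counsel"))` can be passed in the same way.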


2 Answers

Stack Overflow user

Accepted answer

Posted on 2021-01-18 06:38:45

Try this:

Code language: Python
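# Split on whitespace after "." or "?", except after abbreviations like "S.C." or "Mr."
regex = r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s"
re.split(regex, phrase)  # phrase: the newline-stripped text from the question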
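To see what the lookbehinds do, here is the accepted answer's pattern run on a shortened, single-line excerpt of the question's text. The excerpt is trimmed for illustration; the split logic is unchanged. The space after `U.S.C.` is skipped because the four characters before it match `\w\.\w.`, while the space after `$99,912,000.` passes all three checks.

```python
import re

# The accepted answer's pattern: whitespace that follows "." or "?", unless the
# text just before it looks like "X.Y." (e.g. "S.C.") or "Xy." (e.g. "Mr.").
SPLIT = re.compile(r"(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s")

# Shortened, single-line excerpt modeled on the question's text.
text = (
    "For necessary expenses of the Office of Inspector General, including "
    "employment pursuant to the Inspector General Act of 1978 (Public Law "
    "95–452; 5 U.S.C. App.), $99,912,000. For necessary expenses of the "
    "Office of the General Counsel, $45,390,000."
)

sentences = SPLIT.split(text)
for s in sentences:
    print(s)
```

Only the `X.Y.` and `Xy.` shapes are excluded. An abbreviation such as `App.` followed directly by a space (rather than by `)` as in the question) would still be treated as a sentence end.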
Votes: 1

Stack Overflow user

Posted on 2021-01-18 06:41:48

Code language: Python
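import re
# s is the question's original multi-line text; re.split() also returns the
# capture groups, and "if x" drops the empty strings and None values.
print([x for x in re.split(r"([A-Z].+(\(.+\)){0,1}.+)\.\s", s.replace("\n", " ")) if x])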

Output:

['For necessary expenses of the Office of Inspector  General, including employment pursuant to the Inspector  General Act of 1978 (Public Law 95–452; 5 U.S.C. App.),  $99,912,000, including such sums as may be necessary for  contracting and other arrangements with public agencies  and private persons pursuant to section 6(a)(9) of the Inspector General Act of 1978 (Public Law 95–452; 5  U.S.C. App.), and including not to exceed $125,000 for  certain confidential operational expenses, including the  payment of informants, to be expended under the direction  of the Inspector General pursuant to the Inspector General Act of 1978 (Public Law 95–452; 5 U.S.C. App.) and  section 1337 of the Agriculture and Food Act of 1981', 'For necessary expenses of the Office of the General  23 Counsel, $45,390,000.']

The regex is:

Code language: Python
regex = r"([A-Z].+(\(.+\)){0,1}.+)\.\s"

re.split(regex, s.replace("\n", " "))
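Because `[A-Z].+` is greedy, the match backtracks from the end of the string to the last period followed by whitespace, not the first. In the question's text that happens to be the only boundary between its two sentences, which is why the output above looks right. The quick probe below runs on a made-up three-sentence string (a test input, not from the answer) to show what happens when there are more sentences:

```python
import re

# The second answer's pattern, unchanged.
PATTERN = r"([A-Z].+(\(.+\)){0,1}.+)\.\s"

# Made-up three-sentence input to probe the greedy ".+".
text = "First sentence here. Second sentence here. Third sentence here."

# re.split() returns the capture groups too; "if x" drops "" and None.
parts = [x for x in re.split(PATTERN, text) if x]
print(parts)
```

The first two sentences come back as one piece. The period before the split point is consumed by `\.\s`, which is also why the first sentence in the answer's output ends in `1981` without its period.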
Votes: 0
Original page content provided by Stack Overflow; translation provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/65769689
