Q: Multiline Regex

Stack Overflow user

Asked 2016-03-13 17:32:56

3 answers, 71 views, 0 followers, score 1

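I am trying to process the text below. What I want is one match per class, starting at the line with the credits and ending at the season and year. So for the first class, the match should look like this: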

3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014

I also need to get the classes the student still needs. Notice, for example, that they are missing 3 credits in History. Here is my text:

3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
Student View
3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016
3 credits in Math
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Natural Science
BIOL 225L
TOPICS IN NUTRITION
A-
3
Spring 2015
3 credits Ethics/Applied Ethics/Religious Studies
REST 209L
WORLD RELIGIONS
A-
3
Spring 2015
3 credits in Social Science
ECON 104L
PRINC MACROECONOM
T
3
Fall 2014

python

regex

python-3.x

3 Answers

Stack Overflow user

Accepted answer

Answered 2016-03-13 17:37:38

(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)

You can use this with findall. See the demo:

https://regex101.com/r/gK9aI6/1

import re
p = re.compile(r'(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)')
test_str = "3 credits in Philosophical Perspectives\nPHIL 101L\nPHILOSOPHICAL PERSPECTIVES\nB\n3\nFall 2014\nStudent View\n3 credits in Fine Arts\nART 160L\nHIST WEST ART I\nB+\n3\nFall 2014\n3 credits in History\nStill Needed:\nClick here to see classes that satisfy this requirement.\n3 credits in Literature\nENG 201L\nINTRO LINGUISTIC\nIP\n(3)\nSpring 2016\n3 credits in Math\nStill Needed:\nClick here to see classes that satisfy this requirement.\n3 credits in Natural Science\nBIOL 225L\nTOPICS IN NUTRITION\nA-\n3\nSpring 2015\n3 credits Ethics/Applied Ethics/Religious Studies\nREST 209L\nWORLD RELIGIONS\nA-\n3\nSpring 2015\n3 credits in Social Science\nECON 104L\nPRINC MACROECONOM\nT\n3\nFall 2014"

# findall returns one string per "N credits ..." block
print(p.findall(test_str))
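
The question also asks for the requirements that are still needed. As a minimal sketch (not part of the original answer), the blocks returned by findall can be partitioned by checking for a "Still Needed:" line. The pattern is the one from this answer, with the character class written as [\s\S]; the function name partition_blocks and the shortened sample text are assumptions made for illustration.

```python
import re

# Pattern from the answer; [\s\S] matches any character, including newlines.
BLOCK_RE = re.compile(r'(?:^|(?<=\n))\d+\s+credits[\s\S]*?(?=\n\d+\s+credits|$)')

# Shortened sample (hypothetical subset of the text in the question).
sample = (
    "3 credits in Fine Arts\nART 160L\nHIST WEST ART I\nB+\n3\nFall 2014\n"
    "3 credits in History\nStill Needed:\n"
    "Click here to see classes that satisfy this requirement.\n"
    "3 credits in Literature\nENG 201L\nINTRO LINGUISTIC\nIP\n(3)\nSpring 2016"
)

def partition_blocks(text):
    """Split requirement headers into (with_course, still_needed)."""
    with_course, still_needed = [], []
    for block in BLOCK_RE.findall(text):
        lines = block.splitlines()
        header = lines[0]  # e.g. "3 credits in History"
        if "Still Needed:" in lines[1:]:
            still_needed.append(header)
        else:
            with_course.append(header)
    return with_course, still_needed

with_course, still_needed = partition_blocks(sample)
print("With a course:", with_course)
print("Still needed:", still_needed)
```

Because each block ends just before the next "N credits" line, a requirement without a course stays in its own block instead of being merged into the following one.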

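Score 0

Stack Overflow user

Answered 2016-03-13 17:58:54

You can combine a non-greedy "anything" sequence with the known structure of the last line of each group to parse the text into blocks: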

/((?:.\n?)*?(?:Fall|Summer|Spring|Winter)\s\d{4})/g

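(?:.\n?)*? - consumes one character at a time (each optionally followed by a newline), as few as possible
Then simply match the ending sequence: (?:Fall|Summer|Spring|Winter)\s\d{4}

See the demo, and note that each credit block is a single regex match.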
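
The pattern above is written in /.../g notation. A rough Python equivalent, as a sketch under the assumption that re.findall stands in for the g flag, could look like this; the shortened sample text is hypothetical.

```python
import re

# Same pattern as above; findall plays the role of the /g flag.
SEASON_BLOCK_RE = re.compile(r'((?:.\n?)*?(?:Fall|Summer|Spring|Winter)\s\d{4})')

# Shortened sample (hypothetical subset of the text in the question).
sample = (
    "3 credits in Philosophical Perspectives\nPHIL 101L\n"
    "PHILOSOPHICAL PERSPECTIVES\nB\n3\nFall 2014\n"
    "Student View\n"
    "3 credits in Fine Arts\nART 160L\nHIST WEST ART I\nB+\n3\nFall 2014"
)

blocks = SEASON_BLOCK_RE.findall(sample)
for block in blocks:
    print(block)
    print("-" * 20)
```

Since a block starts wherever the previous one ended, stray lines such as "Student View" become part of the following block and may need to be filtered out afterwards.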

Score 0

Stack Overflow user

Answered 2016-03-13 18:06:16

Try the following snippet:

import re

courses = r"....your...content"

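# Lazily match from a digit up to the next "Fall YYYY" or "Spring YYYY".
# DOTALL lets "." cross line breaks; IGNORECASE accepts "Fall" as well as "FALL".
rx = re.compile(r"\d+.*?(?:FALL|SPRING)\s*\d{4}", re.IGNORECASE | re.DOTALL)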
for course in rx.finditer(courses):
    print(course.group())
    print("----------------------------\n")

If courses contains the sample content, the output is as follows:

3 credits in Philosophical Perspectives
PHIL 101L
PHILOSOPHICAL PERSPECTIVES
B
3
Fall 2014
----------------------------

3 credits in Fine Arts
ART 160L
HIST WEST ART I
B+
3
Fall 2014
----------------------------

3 credits in History
Still Needed:
Click here to see classes that satisfy this requirement.
3 credits in Literature
ENG 201L
INTRO LINGUISTIC
IP
(3)
Spring 2016
----------------------------

... omitting rest....
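
Note that in the output above, the History requirement (which has no course yet) ends up in the same match as Literature, because the pattern runs on until the next season and year. As an alternative sketch, assuming every requirement starts with a line of the form "N credits ...", the text can be walked line by line instead. The names parse_requirements, HEADER_RE and TERM_RE are hypothetical and not part of the original answer.

```python
import re

# Assumed line shapes: "3 credits in History" and "Fall 2014".
HEADER_RE = re.compile(r'^\d+\s+credits\b')
TERM_RE = re.compile(r'^(?:Fall|Spring|Summer|Winter)\s+\d{4}$')

def parse_requirements(text):
    """Return one dict per requirement with its status and term (if any)."""
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if HEADER_RE.match(line):
            # A header line opens a new requirement record.
            current = {"requirement": line, "still_needed": False, "term": None}
            records.append(current)
        elif current is None:
            continue  # ignore anything before the first header
        elif line == "Still Needed:":
            current["still_needed"] = True
        elif TERM_RE.match(line):
            current["term"] = line
    return records

# Shortened sample (hypothetical subset of the text in the question).
sample = (
    "3 credits in Fine Arts\nART 160L\nHIST WEST ART I\nB+\n3\nFall 2014\n"
    "3 credits in History\nStill Needed:\n"
    "Click here to see classes that satisfy this requirement.\n"
    "3 credits in Literature\nENG 201L\nINTRO LINGUISTIC\nIP\n(3)\nSpring 2016"
)

records = parse_requirements(sample)
for record in records:
    print(record)
```

This keeps each requirement separate and records both the term of completed or in-progress classes and the requirements still missing.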

Score 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/35973408
