I have a text file whose contents follow a set of rules. Here is a fragment of the file:
<class 'NXOpen.Features.FeatureCollection'>
Type: <class 'NXOpen.Features.DatumCsys'> FeatureName: Datum Coordinate System(0)
Parents:
Children:
Name: , JournalIdentifier: SKETCH(1:1B)
Expressions:
Entities:
Name: , JournalIdentifier: HANDLE R-849
Name: , JournalIdentifier: HANDLE R-850
Name: , JournalIdentifier: DATUM_CSYS(0) YZ plane
Name: , JournalIdentifier: DATUM_CSYS(0) XZ plane
Name: , JournalIdentifier: DATUM_CSYS(0) XY plane
Name: , JournalIdentifier: DATUM_CSYS(0) X axis
Name: , JournalIdentifier: DATUM_CSYS(0) Y axis
Name: , JournalIdentifier: DATUM_CSYS(0) Z axis
Type: <class 'NXOpen.Features.DatumCsys'> FeatureName: Datum Coordinate System(1)inf
Parents:
Name: , JournalIdentifier: DATUM_CSYS(0)
Children:
Name: , JournalIdentifier: SKETCH(1)
Expressions:
Entities:
Name: , JournalIdentifier: HANDLE R-4283
Name: , JournalIdentifier: HANDLE R-4284
Name: , JournalIdentifier: SKETCH(1:1B) YZ plane
Name: , JournalIdentifier: SKETCH(1:1B) XZ plane
Name: , JournalIdentifier: SKETCH(1:1B) XY plane
Name: , JournalIdentifier: SKETCH(1:1B) X axis
Name: , JournalIdentifier: SKETCH(1:1B) Y axis
Name: , JournalIdentifier: SKETCH(1:1B) Z axis
I want to use re to extract all the text between two Type: markers. For example, I want to extract the following:
Type: <class 'NXOpen.Features.DatumCsys'> FeatureName: Datum Coordinate System(0)
Parents:
Children:
Name: , JournalIdentifier: SKETCH(1:1B)
Expressions:
Entities:
Name: , JournalIdentifier: HANDLE R-849
Name: , JournalIdentifier: HANDLE R-850
Name: , JournalIdentifier: DATUM_CSYS(0) YZ plane
Name: , JournalIdentifier: DATUM_CSYS(0) XZ plane
Name: , JournalIdentifier: DATUM_CSYS(0) XY plane
Name: , JournalIdentifier: DATUM_CSYS(0) X axis
Name: , JournalIdentifier: DATUM_CSYS(0) Y axis
Name: , JournalIdentifier: DATUM_CSYS(0) Z axis
and this:
Type: <class 'NXOpen.Features.DatumCsys'> FeatureName: Datum Coordinate System(1)inf
Parents:
Name: , JournalIdentifier: DATUM_CSYS(0)
Children:
Name: , JournalIdentifier: SKETCH(1)
Expressions:
Entities:
Name: , JournalIdentifier: HANDLE R-4283
Name: , JournalIdentifier: HANDLE R-4284
Name: , JournalIdentifier: SKETCH(1:1B) YZ plane
Name: , JournalIdentifier: SKETCH(1:1B) XZ plane
Name: , JournalIdentifier: SKETCH(1:1B) XY plane
Name: , JournalIdentifier: SKETCH(1:1B) X axis
Name: , JournalIdentifier: SKETCH(1:1B) Y axis
Name: , JournalIdentifier: SKETCH(1:1B) Z axis
using a regular expression. I tried this in Python:
re.findall('Type: [\w\s]+)Type:', string)
but it gave me an empty list. What is the correct re expression to achieve this?
Thanks.
Posted on 2021-10-31 10:15:29
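In your pattern Type: [\w\s]+)Type: there is a ) with no matching (.
The problem with matching Type: twice in the pattern is that the second Type: is consumed by the match, so it cannot also serve as the start of the next match.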
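A small sketch of that effect, using a made-up three-block sample rather than the real file. It assumes the intended pattern had a capture group, and it widens [\w\s]+ to .+? with re.S so that punctuation is allowed; both are guesses made for illustration.

```python
import re

# Made-up sample with three blocks (not the full file from the question).
text = (
    "Type: A\n"
    "Parents:\n"
    "Type: B\n"
    "Children:\n"
    "Type: C\n"
    "Entities:\n"
)

# The pattern exactly as posted has a ")" without "(", so it does not compile.
try:
    re.compile(r"Type: [\w\s]+)Type:")
    compiled = True
except re.error:
    compiled = False

# Assumed intent: capture everything up to the next "Type:".
# Here the second "Type:" is part of the match, so it is consumed.
consuming = re.findall(r"Type: (.+?)Type:", text, re.S)

# Same idea, but the second "Type:" sits in a lookahead, so it is not
# consumed and is still available as the start of the next match.
lookahead = re.findall(r"Type: (.+?)(?=Type:)", text, re.S)

print(compiled)
print(consuming)
print(lookahead)
```

With the consuming version, block B is skipped because its Type: was already used to end block A. The lookahead version finds A and B, but both versions still miss the last block, since no further Type: follows it.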
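Instead, you can use a pattern that matches a line starting with Type: followed by all the lines after it that do not start with Type: themselves.
^Type: .*(?:\n(?!Type: ).*)*
The pattern matches:
^ Start of a line (when re.M is set)
Type: .* Match Type: and the rest of the line
(?: Non-capturing group
\n(?!Type: ).* Match a newline, assert that what follows is not Type: , then match the whole line
)* Close the non-capturing group and optionally repeat it
Using re.findall, the line would be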
re.findall(r"^Type: .*(?:\n(?!Type: ).*)*", string, re.M)
Posted on 2021-10-31 11:35:15
Try
"(?ms)^Type:(?:(?!^Type:).)*"
where (?ms) turns on the re.MULTILINE and re.DOTALL modes, so ^ matches at the beginning of each line and . matches any character, including newlines.
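To compare the two patterns suggested in this thread, here is a minimal sketch that runs both on a shortened version of the fragment from the question. The variable names are made up for illustration, and the sample is trimmed, so treat it as a quick check rather than a full test against the real file.

```python
import re

# Shortened sample based on the fragment in the question.
text = """<class 'NXOpen.Features.FeatureCollection'>
Type: <class 'NXOpen.Features.DatumCsys'> FeatureName: Datum Coordinate System(0)
Parents:
Children:
Name: , JournalIdentifier: SKETCH(1:1B)
Type: <class 'NXOpen.Features.DatumCsys'> FeatureName: Datum Coordinate System(1)inf
Parents:
Name: , JournalIdentifier: DATUM_CSYS(0)
Children:
Name: , JournalIdentifier: SKETCH(1)"""

# Line-based pattern: a "Type: " line plus every following line
# that does not itself start with "Type: ".
line_based = re.findall(r"^Type: .*(?:\n(?!Type: ).*)*", text, re.M)

# Character-based pattern: from a line-initial "Type:" onwards, take one
# character at a time as long as a new line-initial "Type:" does not begin there.
char_based = re.findall(r"(?ms)^Type:(?:(?!^Type:).)*", text)

# The character-based matches keep the newline that precedes the next "Type:",
# so strip trailing newlines before comparing the two results.
blocks_a = [block.rstrip("\n") for block in line_based]
blocks_b = [block.rstrip("\n") for block in char_based]

for block in blocks_a:
    print(block)
    print("-----")
```

Both patterns find the same two blocks here. The only difference is that the second pattern includes the trailing newline of every block except the last one, which matters if the blocks are compared or written back out verbatim.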
https://stackoverflow.com/questions/69785817