
Long RegEx pattern not working as planned

Asked by a Stack Overflow user on 2021-05-14 15:58:02
Answers: 1 | Views: 39 | Followers: 0 | Votes: 1

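My regex pattern doesn't seem to work in Python. The data is a comma-separated column from a spreadsheet, and between the commas there are also pipes (|) that separate fields. I'm not concerned about the pipes. I need to use re.split() to split the string on the commas, but as you will notice in the example, users have typed commas into the first item, before the first |. That is why I'm using a regex to establish the pattern to look for. It isn't working properly, though, and as a beginner I could use another set of eyes. I built and ran the regex in Regex101, and its explanation looks correct, but it still doesn't return the number of matches I expect.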

My regex

".+\s\|\s\d\d\s\|\s\d\d\d\d\s\|\s\d\d\d\d\s\|\s.{2}\d\d\d\d\s\|\s\d+?\.\d+?,"gm

My sample test string

ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0,ICS: E-Rate discount (85%) | 30 | 5877 | 0000 | IT0000 | -10200.0,ICS: Basic Maintenance | 40 | 5877 | 0000 | IT0000 | 9000.0,ICMS: E-Rate discount (85%) | 40 | 5877 | 0000 | IT0000 | -7650.0,ICS: Basic Maintenance | 20 | 5877 | 0000 | IT0000 | 13500.0,ICS: E-Rate discount (85%) | 20 | 5877 | 0000 | IT0000 | -11475.0,ICCMS: Basic Maintenance | 70 | 5877 | 0000 | IT0000 | 12000.0,ICCMS: E-Rate discount (85%) | 70 | 5877 | 0000 | IT0000 | -10200.0,ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0

Number of matches expected: 9

Number of matches returned: 1 (0-443). Export of the match from Regex101:

".+\s\|\s\d\d\s\|\s\d\d\d\d\s\|\s\d\d\d\d\s\|\s.{2}\d\d\d\d\s\|\s\d+?\.\d+?," gm
. matches any character (except for line terminators)
+ matches the previous token between one and unlimited times, as many times as possible, giving back as needed (greedy)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\| matches the character | literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\| matches the character | literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\| matches the character | literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\| matches the character | literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
. matches any character (except for line terminators)
{2} matches the previous token exactly 2 times
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\d matches a digit (equivalent to [0-9])
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\| matches the character | literally (case sensitive)
\s matches any whitespace character (equivalent to [\r\n\t\f\v  ])
\d matches a digit (equivalent to [0-9])
+? matches the previous token between one and unlimited times, as few times as possible, expanding as needed (lazy)
\. matches the character . literally (case sensitive)
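\d matches a digit (equivalent to [0-9])
+? matches the previous token between one and unlimited times, as few times as possible, expanding as needed (lazy)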
, matches the character , literally (case sensitive)
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
0-443   ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0,ICS: E-Rate discount (85%) | 30 | 5877 ...
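
The single match can be reproduced in Python with a short sketch. It assumes the sample string above, rebuilt here from a list of records (the `records` name is only for illustration). One reading of the result: the leading `.+` is greedy, so it runs to the end of the line and backtracks only as far as the last place where the rest of the pattern fits. That place is the seventh record, because the eighth has a negative amount that `\d+?` cannot match and the ninth has one four-digit field fewer.

```python
import re

# The sample string from the question, rebuilt from its nine records.
records = [
    "ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0",
    "ICS: E-Rate discount (85%) | 30 | 5877 | 0000 | IT0000 | -10200.0",
    "ICS: Basic Maintenance | 40 | 5877 | 0000 | IT0000 | 9000.0",
    "ICMS: E-Rate discount (85%) | 40 | 5877 | 0000 | IT0000 | -7650.0",
    "ICS: Basic Maintenance | 20 | 5877 | 0000 | IT0000 | 13500.0",
    "ICS: E-Rate discount (85%) | 20 | 5877 | 0000 | IT0000 | -11475.0",
    "ICCMS: Basic Maintenance | 70 | 5877 | 0000 | IT0000 | 12000.0",
    "ICCMS: E-Rate discount (85%) | 70 | 5877 | 0000 | IT0000 | -10200.0",
    "ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0",
]
s = ",".join(records)

# The pattern from the question, unchanged.
pattern = r".+\s\|\s\d\d\s\|\s\d\d\d\d\s\|\s\d\d\d\d\s\|\s.{2}\d\d\d\d\s\|\s\d+?\.\d+?,"

# finditer plays the role of Regex101's "g" flag: it yields every match.
matches = list(re.finditer(pattern, s))

print(len(matches))       # 1
print(matches[0].span())  # (0, 443)
```

The one match swallows the first seven records, which is why splitting on it cannot produce nine pieces.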

1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-05-14 16:22:09

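Looking at the data, if you are not concerned about the pipes and you want 9 matches, you can use re.findall to match all the values instead of splitting, and shorten the pattern somewhat: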

\w+:.*?\b\d+(?:\.\d+)(?=,|$)
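  • \w+: matches 1+ word characters followed by a colon
  • .*? matches as few characters as possible
  • \b\d+(?:\.\d+) is a word boundary, then 1+ digits followed by a decimal part (as written the group is required; append ? to it to make the decimal part optional)
  • (?=,|$) is a lookahead asserting that a comma or the end of the string follows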


import re
from pprint import pprint

pattern = r"\w+:.*?\b\d+(?:\.\d+)(?=,|$)"
s = "ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0,ICS: E-Rate discount (85%) | 30 | 5877 | 0000 | IT0000 | -10200.0,ICS: Basic Maintenance | 40 | 5877 | 0000 | IT0000 | 9000.0,ICMS: E-Rate discount (85%) | 40 | 5877 | 0000 | IT0000 | -7650.0,ICS: Basic Maintenance | 20 | 5877 | 0000 | IT0000 | 13500.0,ICS: E-Rate discount (85%) | 20 | 5877 | 0000 | IT0000 | -11475.0,ICCMS: Basic Maintenance | 70 | 5877 | 0000 | IT0000 | 12000.0,ICCMS: E-Rate discount (85%) | 70 | 5877 | 0000 | IT0000 | -10200.0,ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0"

pprint(re.findall(pattern, s))

Output

['ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0',
 'ICS: E-Rate discount (85%) | 30 | 5877 | 0000 | IT0000 | -10200.0',
 'ICS: Basic Maintenance | 40 | 5877 | 0000 | IT0000 | 9000.0',
 'ICMS: E-Rate discount (85%) | 40 | 5877 | 0000 | IT0000 | -7650.0',
 'ICS: Basic Maintenance | 20 | 5877 | 0000 | IT0000 | 13500.0',
 'ICS: E-Rate discount (85%) | 20 | 5877 | 0000 | IT0000 | -11475.0',
 'ICCMS: Basic Maintenance | 70 | 5877 | 0000 | IT0000 | 12000.0',
 'ICCMS: E-Rate discount (85%) | 70 | 5877 | 0000 | IT0000 | -10200.0',
 'ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0']
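
For readers who want to see each part of the findall pattern in place, here is the same pattern written with re.VERBOSE and run on a shortened two-record sample. The `record` and `sample` names are illustrative only, and this is a sketch rather than part of the original answer.

```python
import re

# Same pattern as above; re.VERBOSE ignores the whitespace and allows comments.
record = re.compile(r"""
    \w+:        # a label such as ICS: (1+ word characters, then a colon)
    .*?         # as few characters as possible
    \b\d+       # a word boundary, then 1+ digits
    (?:\.\d+)   # a dot and 1+ digits (the decimal part)
    (?=,|$)     # lookahead: a comma or the end of the string comes next
""", re.VERBOSE)

# Shortened sample: the second record has commas inside its description.
sample = (
    "ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0,"
    "ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0"
)

found = record.findall(sample)
for item in found:
    print(item)
```

The commas in "Laptops, Desktops, Computers" do not end the record, because a match can end only where a decimal number is immediately followed by a comma or by the end of the string.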

If you must use re.split, you can use a capture group to keep the values you split on, and split on the comma.
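
A minimal sketch of how re.split treats a capture group, using a made-up toy string (`text` is not from the question): whatever the group captures is kept in the result, and the empty strings between adjacent matches are why a filter step is useful.

```python
import re

# Toy data standing in for the real records.
text = "a:1,b:2,c:3"

# Without a capture group, the matched separators are discarded.
no_group = re.split(r"\w+:\d+,", text)
print(no_group)    # ['', '', 'c:3']

# With a capture group, the captured text is kept in the result list.
with_group = re.split(r"(\w+:\d+),", text)
print(with_group)  # ['', 'a:1', '', 'b:2', 'c:3']

# filter(None, ...) drops the empty strings left between adjacent matches.
cleaned = list(filter(None, with_group))
print(cleaned)     # ['a:1', 'b:2', 'c:3']
```

The last item is never matched by the pattern, since no comma follows it. It survives as the remainder after the final split.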

A fuller pattern that also includes the pipes in the match:

import re
from pprint import pprint

pattern = r"(\w+:[^|]+\|\s\d\d\s\|(?:\s\d{4}\s\|){2}\s.{2}\d{4}\s\|\s-?\d+(?:\.\d+)?),"
s = "ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0,ICS: E-Rate discount (85%) | 30 | 5877 | 0000 | IT0000 | -10200.0,ICS: Basic Maintenance | 40 | 5877 | 0000 | IT0000 | 9000.0,ICMS: E-Rate discount (85%) | 40 | 5877 | 0000 | IT0000 | -7650.0,ICS: Basic Maintenance | 20 | 5877 | 0000 | IT0000 | 13500.0,ICS: E-Rate discount (85%) | 20 | 5877 | 0000 | IT0000 | -11475.0,ICCMS: Basic Maintenance | 70 | 5877 | 0000 | IT0000 | 12000.0,ICCMS: E-Rate discount (85%) | 70 | 5877 | 0000 | IT0000 | -10200.0,ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0"

pprint(list(filter(None, re.split(pattern, s))))

Output

['ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0',
 'ICS: E-Rate discount (85%) | 30 | 5877 | 0000 | IT0000 | -10200.0',
 'ICS: Basic Maintenance | 40 | 5877 | 0000 | IT0000 | 9000.0',
 'ICMS: E-Rate discount (85%) | 40 | 5877 | 0000 | IT0000 | -7650.0',
 'ICS: Basic Maintenance | 20 | 5877 | 0000 | IT0000 | 13500.0',
 'ICS: E-Rate discount (85%) | 20 | 5877 | 0000 | IT0000 | -11475.0',
 'ICCMS: Basic Maintenance | 70 | 5877 | 0000 | IT0000 | 12000.0',
 'ICCMS: E-Rate discount (85%) | 70 | 5877 | 0000 | IT0000 | -10200.0',
 'ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0']
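
As an alternative sketch that is not part of the answer above, re.split can also split on only those commas that are directly followed by a new label. This assumes every record starts with a `\w+:` label and that no description contains a comma immediately followed by such a label. The `sample` string is a shortened version of the data.

```python
import re

# Shortened sample: three records, the last with commas in its description.
sample = (
    "ICS: Basic Maintenance | 30 | 5877 | 0000 | IT0000 | 12000.0,"
    "ICS: E-Rate discount (85%) | 30 | 5877 | 0000 | IT0000 | -10200.0,"
    "ITSM: Laptops, Desktops, Computers | 30 | 4400 | IT0000 | 720400.0"
)

# Split on a comma only when a label such as "ICS:" follows it directly.
# The lookahead does not consume the label, so it stays with its record.
parts = re.split(r",(?=\w+:)", sample)

for part in parts:
    print(part)
```

The commas inside "Laptops, Desktops, Computers" are followed by a space rather than a label, so they are left alone. The approach breaks if a description ever contains text such as ",Note:".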


Votes: 2
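The original page content is provided by Stack Overflow; the Chinese translation was supported by Tencent Cloud Xiaowei's IT-domain engine.
Original link: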

https://stackoverflow.com/questions/67537287
