I am trying to write a regular expression for an input text from which I need to extract every WARN code together with its message. A warning may or may not span multiple lines, as shown below:

```
[C] L1250 WARN k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>
[C] L1250 WARN For abcd (analytical and transactional workloads). For 12s Systems and above, should be
disabled.
[C] L1250 INFO For abcd (analytical workloads), Hyperthreading should be enabled , 8s, 12s, 14d, 34t
d above.
[C] L1250 WARN Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently
fix it!
[C] L1300 OK CPU governors set as recommended
[C] L1250 WARN Intel's Hyperthreading on 8+ Socket system disabled.
```

Initially I started with the regex `( WARN ).*(\b|\B)`, which matches up to the end of the line (a word or non-word boundary), but it does not capture the lines that follow (the continuation of the WARN description).

Then I tried `WARN.+(\S\s*?)+(?=[C])`, but this does not capture the last warning, because no `[C]` marker follows it.

Posted on 2020-04-25 00:58:08
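You can get the matches without using `[\s\S]*` or the single-line option by matching all the following lines that do not start with `[C]`:

```
\bWARN\h+.*(?:\R(?!\[C]).*)*
```

Explanation

- `\bWARN\h+` Match WARN, preceded by a word boundary so it cannot be part of a larger word, followed by 1+ horizontal whitespace characters
- `.*` Match the rest of the line
- `(?:` Non-capturing group
  - `\R(?!\[C]).*` Match a Unicode newline sequence, assert that the next line does not start with `[C]`, then match the rest of that line
- `)*` Close the group and repeat it 0+ times

For example: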
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // WARN plus the rest of its line, then every following line that does not start with "[C]"
        String regex = "\\bWARN\\h+.*(?:\\R(?!\\[C]).*)*";
        String string = "[C] L1250 WARN k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>\n"
            + "[C] L1250 WARN For abcd (analytical and transactional workloads). For 12s Systems and above, should be\n"
            + " disabled.\n"
            + "[C] L1250 INFO For abcd (analytical workloads), Hyperthreading should be enabled , 8s, 12s, 14d, 34t\n"
            + " d above.\n"
            + "[C] L1250 WARN Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently\n"
            + " fix it!\n"
            + "[C] L1300 OK CPU governors set as recommended\n"
            + "[C] L1250 WARN Intel's Hyperthreading on 8+ Socket system disabled.";
        // MULTILINE is harmless here; the pattern uses no ^ or $ anchors
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher matcher = pattern.matcher(string);
        // Each match is one WARN entry, possibly spanning several lines
        while (matcher.find()) {
            System.out.println(matcher.group(0));
        }
    }
}
```

Output:
```
WARN k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>
WARN For abcd (analytical and transactional workloads). For 12s Systems and above, should be
 disabled.
WARN Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently
 fix it!
WARN Intel's Hyperthreading on 8+ Socket system disabled.
```

If `[C]` is not a reliable boundary, another option is to check that the next line does not contain one of WARN, INFO or OK surrounded by horizontal whitespace:

```
\bWARN\h+.*(?:\R(?!.*\h(?:WARN|INFO|OK)\h).*)*
```

In Java:

```java
String regex = "\\bWARN\\h+.*(?:\\R(?!.*\\h(?:WARN|INFO|OK)\\h).*)*";
```

Posted on 2020-04-24 20:40:38
Try this regex with the global and single-line options:

```
WARN.*?(?=\[C\]|$)
```

This finds everything from WARN up to the next `[C]` or the end of the input string.
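The lazy-match approach can be sketched in Java as below. This is a minimal sketch, assuming the "single-line" option corresponds to `Pattern.DOTALL` (so `.` also matches line breaks) and the "global" option to calling `find()` in a loop. The class name `WarnLazyLookahead`, the helper `extractWarnings`, and the `trim()` of each match are hypothetical additions for illustration, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of the original answer.
public class WarnLazyLookahead {

    // Sample input from the question (continuation lines have no leading spaces).
    static final String SAMPLE = String.join("\n",
        "[C] L1250 WARN k2 bw34 Flex - Sockets:<16>, ThreadsPerCore:<1>",
        "[C] L1250 WARN For abcd (analytical and transactional workloads). For 12s Systems and above, should be",
        "disabled.",
        "[C] L1250 INFO For abcd (analytical workloads), Hyperthreading should be enabled , 8s, 12s, 14d, 34t",
        "d above.",
        "[C] L1250 WARN Intel's Hyperthreading on 18+ Socket system disabled. Should be disabled urgently",
        "fix it!",
        "[C] L1300 OK CPU governors set as recommended",
        "[C] L1250 WARN Intel's Hyperthreading on 8+ Socket system disabled.");

    // DOTALL lets '.' cross line breaks; the lazy '.*?' stops at the first "[C]" or at the end of input.
    private static final Pattern WARN_BLOCK =
        Pattern.compile("WARN.*?(?=\\[C\\]|$)", Pattern.DOTALL);

    // Hypothetical helper: returns every WARN block with the trailing line break trimmed off.
    static List<String> extractWarnings(String input) {
        List<String> warnings = new ArrayList<>();
        Matcher matcher = WARN_BLOCK.matcher(input);
        while (matcher.find()) { // repeated find() plays the role of the "global" flag
            warnings.add(matcher.group().trim());
        }
        return warnings;
    }

    public static void main(String[] args) {
        for (String warning : extractWarnings(SAMPLE)) {
            System.out.println(warning);
            System.out.println("---");
        }
    }
}
```

Unlike the `\R(?!\[C])` pattern, each match here ends with the line break that precedes the next `[C]`, which is why the sketch trims every result. The approach also relies on `[C]` always starting a new entry.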
https://stackoverflow.com/questions/61405852