
Q: Remove multiple lines from a recursive pcregrep search

Stack Overflow user

Asked 2015-05-20 03:04:53

Answers: 1, Views: 345, Followers: 0, Votes: 0

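At the top of every file in my project there is a comment block, generated by VCC, that contains the file's revision history. We have moved away from VCC and no longer want to keep the revision history in the files, since it is obsolete.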

I currently have a pcregrep search that returns exactly the results I'm looking for:

pcregrep -rM '(\$Rev)(?s)(.+?(?=\*\*))' *

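I have tried piping the results into xargs sed to delete all of the returned lines from the files, but I get various errors, including "File name too long".

I want to delete the entire block.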

Tags: sed, pcregrep, regex

1 Answer

Stack Overflow user

Posted 2015-05-20 08:02:27

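Since you are talking about C++ files, you can't just search for comments. You have to parse them, because string literals may contain comment delimiters.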
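Here is a minimal illustration of the problem in Python. Python and the sample line are chosen only for illustration. The naive approach deletes everything from "//" to the end of the line, which cuts into a string literal that happens to contain "//":

```python
import re

# One line of C++ in which a string literal contains "//".
line = 'const char* url = "http://example.com"; // homepage\n'

# Naive approach: treat everything from "//" to end of line as a comment.
naive = re.sub(r'//.*', '', line)

# The first "//" found is the one inside the string literal, so the
# literal and the real code after it are cut off.
print(repr(naive))
```

The result keeps only const char* url = "http: on that line, which no longer compiles.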

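This has been done before, and there is no point in reinventing the wheel.

A simple grep won't solve this. You need a simple macro or a C# console application, which has better performance.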

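If you want to go that route, here is a regex for you.

Each match will match either group 1 (a comment block) or group 2 (a non-comment).

You build a new string by appending the result of each match.

Alternatively, use a callback function for the replacement.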

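Each time group 2 matches, just append it unchanged (or, with a callback, return it).

When group 1 matches, you have to run a second regex on the contents of group 1 to see whether the comment block contains revision information.

If it does, don't append its contents (or, with a callback, return "").

If it doesn't, just append it unchanged.

So it is a two-step process.

Pseudocode: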

// here read in the source string from the file.
string strSrc = ReadFile( name );
string strNew = "";
Matcher CmtMatch, RevMatch;

while ( GloballyFind( CommentRegex, strSrc, CmtMatch ) )
{
    if ( CmtMatch.matched(1) )
    {
        string strComment = CmtMatch.value(1);
        if ( FindFirst( RevisionRegex, strComment, RevMatch ) )
            continue;                    // revision block: drop it
        else
            strNew += strComment;        // ordinary comment: keep it
    }
    else
        strNew += CmtMatch.value(2);     // non-comment: keep it unchanged
}
// here write out the new string.
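For reference, here is one way the pseudocode above could look as runnable Python. This is a sketch under assumptions, not the answerer's code. It uses the simplified regex given further down, which Python's re module accepts unchanged. It also assumes a revision block can be recognised by the "$Rev" keyword that the question's pcregrep search looks for. The function name strip_revision_comments is hypothetical.

```python
import re

# Simplified comment/non-comment regex from this answer.
# Group 1 = a comment, group 2 = anything that is not a comment.
COMMENT_OR_CODE = re.compile(
    r'(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/'   # group 1: /* ... */ comment
    r'|//(?:[^\\]|\\\n?)*?\n)'            #   ... or // comment (with continuations)
    r'|("(?:\\[\S\s]|[^"\\])*"'           # group 2: double-quoted string literal
    r"|'(?:\\[\S\s]|[^'\\])*'"            #   ... or single-quoted literal
    r"""|[\S\s][^/"'\\]*)"""              #   ... or a run of other code
)

# Assumption: a revision-history comment contains the "$Rev" keyword,
# the same marker the question's pcregrep search looks for.
REVISION = re.compile(r'\$Rev')


def strip_revision_comments(src):
    """Hypothetical helper: rebuild src without revision-history comments."""
    parts = []
    for m in COMMENT_OR_CODE.finditer(src):
        comment = m.group(1)
        if comment is not None:
            if REVISION.search(comment):
                continue                 # revision block: drop it
            parts.append(comment)        # ordinary comment: keep it
        else:
            parts.append(m.group(2))     # non-comment: keep it unchanged
    return ''.join(parts)
```

Every character of the input is consumed by some match, because group 2 can always match a single character. Appending the kept pieces therefore rebuilds the file exactly, minus the dropped comments. A "$Rev" that appears inside a string literal is left alone.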

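If you are using a macro language, this can be done with a ReplaceAll() that takes a callback function. The logic goes in the callback.

It's not as hard as it looks, but if you want to do it right, this is how I would do it.

And then, hey, you have a nice utility you can reuse.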
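Here is a possible shape for that reusable utility, sketched in Python. It uses the callback style: re.sub with a function, which is Python's counterpart to a ReplaceAll() with a callback. It also walks the directory tree itself, which sidesteps the recursive pcregrep | xargs sed pipeline from the question. Everything here is a hypothetical sketch: the names clean_tree and drop_revision_comment, the list of file extensions, UTF-8 source files, and "$Rev" as the marker of a revision comment.

```python
import os
import re

# Simplified regex from this answer: group 1 = comment, group 2 = non-comment.
COMMENT_OR_CODE = re.compile(
    r'(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/'
    r'|//(?:[^\\]|\\\n?)*?\n)'
    r'|("(?:\\[\S\s]|[^"\\])*"'
    r"|'(?:\\[\S\s]|[^'\\])*'"
    r"""|[\S\s][^/"'\\]*)"""
)

# Assumption: revision-history comments contain the "$Rev" keyword.
REVISION = re.compile(r'\$Rev')

# Hypothetical list of extensions to process; adjust for your project.
SOURCE_EXTENSIONS = ('.c', '.cc', '.cpp', '.cxx', '.h', '.hpp')


def drop_revision_comment(m):
    """re.sub callback: return '' for a revision comment, else the match unchanged."""
    comment = m.group(1)
    if comment is not None and REVISION.search(comment):
        return ''
    return m.group(0)


def clean_tree(root):
    """Rewrite matching files under root in place; return how many changed."""
    changed = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(SOURCE_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            # newline='' keeps line endings as-is; surrogateescape round-trips
            # bytes that are not valid UTF-8 instead of raising.
            with open(path, encoding='utf-8', errors='surrogateescape', newline='') as f:
                src = f.read()
            new = COMMENT_OR_CODE.sub(drop_revision_comment, src)
            if new != src:
                with open(path, 'w', encoding='utf-8', errors='surrogateescape', newline='') as f:
                    f.write(new)
                changed += 1
    return changed
```

Call it as clean_tree("path/to/project"); it returns the number of files it rewrote. Because it edits files in place, run it on a clean working copy so the changes can be reviewed as a diff before committing.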

Here is the regex in raw, delimited and quoted form, followed by the formatted (expanded) version.

(Built with RegexFormat 6 (Unicode))

 # raw:  ((?:(?:^\h*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:\h*\n(?=\h*(?:\n|/\*|//)))?|//(?:[^\\]|\\\n?)*?(?:\n(?=\h*(?:\n|/\*|//))|(?=\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\\s]*)
 # delimited:  /((?:(?:^\h*)?(?:\/\*[^*]*\*+(?:[^\/*][^*]*\*+)*\/(?:\h*\n(?=\h*(?:\n|\/\*|\/\/)))?|\/\/(?:[^\\]|\\\n?)*?(?:\n(?=\h*(?:\n|\/\*|\/\/))|(?=\n))))+)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^\/"'\\\s]*)/     
 # Dbl-quoted: "((?:(?:^\\h*)?(?:/\\*[^*]*\\*+(?:[^/*][^*]*\\*+)*/(?:\\h*\\n(?=\\h*(?:\\n|/\\*|//)))?|//(?:[^\\\\]|\\\\\\n?)*?(?:\\n(?=\\h*(?:\\n|/\\*|//))|(?=\\n))))+)|(\"(?:\\\\[\\S\\s]|[^\"\\\\])*\"|'(?:\\\\[\\S\\s]|[^'\\\\])*'|[\\S\\s][^/\"'\\\\\\s]*)"     
 # Sing-quoted: '((?:(?:^\h*)?(?:/\*[^*]*\*+(?:[^/*][^*]*\*+)*/(?:\h*\n(?=\h*(?:\n|/\*|//)))?|//(?:[^\\\]|\\\\\n?)*?(?:\n(?=\h*(?:\n|/\*|//))|(?=\n))))+)|("(?:\\\[\S\s]|[^"\\\])*"|\'(?:\\\[\S\s]|[^\'\\\])*\'|[\S\s][^/"\'\\\\\s]*)'


    (                                # (1 start), Comments 
         (?:
              (?: ^ \h* )?                     # <- To preserve formatting
              (?:
                   /\*                              # Start /* .. */ comment
                   [^*]* \*+
                   (?: [^/*] [^*]* \*+ )*
                   /                                # End /* .. */ comment
                   (?:                              # <- To preserve formatting 
                        \h* \n                                      
                        (?=
                             \h*                  
                             (?: \n | /\* | // )
                        )
                   )?
                |  
                   //                               # Start // comment
                   (?: [^\\] | \\ \n? )*?           # Possible line-continuation
                   (?:                              # End // comment
                        \n                               
                        (?=                              # <- To preserve formatting
                             \h*                          
                             (?: \n | /\* | // )
                        )
                     |  (?= \n )
                   )
              )
         )+                               # Grab multiple comment blocks if need be
    )                                # (1 end)

 |                                 ## OR

    (                                # (2 start), Non - comments 
         "
         (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
         "
      |  '
         (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
         ' 
      |  [\S\s]                           # Any other char
         [^/"'\\\s]*                      # Chars that don't start a comment, string, escape,
                                          # or line continuation (escape + newline)
    )                                # (2 end)
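To try the full pattern in Python, it needs two adjustments, both assumptions on my part. First, Python's re has no \h, so [ \t] stands in for it; PCRE's \h also covers other Unicode horizontal spaces. Second, re.MULTILINE is turned on so that ^ can match at the start of every line, which the "preserve formatting" parts appear to rely on. The helper drop_revision_blocks and the "$Rev" test are hypothetical.

```python
import re

# Python port of the answer's full regex. Assumptions: PCRE's \h is replaced
# by [ \t], and re.MULTILINE is on so that ^ matches at every line start.
FULL = re.compile(r"""
    (                                           # group 1: one or more comment blocks
      (?:
        (?: ^ [ \t]* )?                         # leading indentation (formatting)
        (?:
          /\* [^*]* \*+ (?: [^/*] [^*]* \*+ )* /            # /* ... */ comment
          (?: [ \t]* \n (?= [ \t]* (?: \n | /\* | // ) ) )?  # eat newline if more follows
        |
          // (?: [^\\] | \\ \n? )*?                         # // comment + continuations
          (?: \n (?= [ \t]* (?: \n | /\* | // ) ) | (?= \n ) )
        )
      )+
    )
  |
    (                                           # group 2: non-comments
      " (?: \\ [\S\s] | [^"\\] )* "             # double-quoted string
    | ' (?: \\ [\S\s] | [^'\\] )* '             # single-quoted literal
    | [\S\s] [^/"'\\\s]*                        # any other char + a safe run
    )
""", re.VERBOSE | re.MULTILINE)

REVISION = re.compile(r'\$Rev')  # assumption: revision blocks contain "$Rev"


def drop_revision_blocks(src):
    """Hypothetical helper: remove comment blocks that carry revision info."""
    def callback(m):
        block = m.group(1)
        if block is not None and REVISION.search(block):
            return ''
        return m.group(0)
    return FULL.sub(callback, src)
```

Take a file that starts with "// $Rev: 12 $", then a second // line, then a blank line. Both comment lines come back as one group 1 match and are removed together, leaving the blank line in place. An indented block comment is removed along with its leading indentation.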

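If you want something simpler:

Here is the same regex without the multi-comment-block capture and formatting preservation. The same grouping and replacement principles apply.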

 # Raw:  (/\*[^*]*\*+(?:[^/*][^*]*\*+)*/|//(?:[^\\]|\\\n?)*?\n)|("(?:\\[\S\s]|[^"\\])*"|'(?:\\[\S\s]|[^'\\])*'|[\S\s][^/"'\\]*)


     (                                # (1 start), Comments 
          /\*                              # Start /* .. */ comment
          [^*]* \*+
          (?: [^/*] [^*]* \*+ )*
          /                                # End /* .. */ comment
       |  
          //                               # Start // comment
          (?: [^\\] | \\ \n? )*?           # Possible line-continuation
          \n                               # End // comment
     )                                # (1 end)
  |  
     (                                # (2 start), Non - comments 
          "
          (?: \\ [\S\s] | [^"\\] )*        # Double quoted text
          "
       |  '
          (?: \\ [\S\s] | [^'\\] )*        # Single quoted text
          ' 
       |  [\S\s]                           # Any other char
          [^/"'\\]*                        # Chars that don't start a comment, string, escape,
                                           # or line continuation (escape + newline)
     )                                # (2 end)
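A quick way to see the grouping in action is to run the simplified pattern over a short snippet and look at what lands in each group. In this Python sketch, with a made-up sample input, "//" inside a string literal stays in group 2. A // comment continued with a trailing backslash is captured as a single group 1 match that includes the continued line.

```python
import re

# The answer's simplified regex, split across lines for readability.
SIMPLE = re.compile(
    r'(/\*[^*]*\*+(?:[^/*][^*]*\*+)*/'     # group 1: /* ... */ comment
    r'|//(?:[^\\]|\\\n?)*?\n)'              #   ... or // comment up to newline
    r'|("(?:\\[\S\s]|[^"\\])*"'             # group 2: double-quoted string
    r"|'(?:\\[\S\s]|[^'\\])*'"              #   ... or single-quoted literal
    r"""|[\S\s][^/"'\\]*)"""                #   ... or other code
)

# A string literal containing "//", then a // comment continued with a backslash.
src = 'x = "a // b"; // $Rev: 5 $ \\\n   continued\ny = 1;\n'

matches = list(SIMPLE.finditer(src))
comments = [m.group(1) for m in matches if m.group(1) is not None]
code = [m.group(2) for m in matches if m.group(2) is not None]

print(comments)
print(code)
```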

Votes: 0

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/30334078
