
Regex: How to extract text from the last set of parentheses

Stack Overflow user
Asked 2017-02-09 21:27:48
Answers: 3 | Views: 3.1K | Followers: 0 | Votes: 3

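What is the correct regular expression to extract the string "(procedure)", or more generally the text inside the last set of parentheses, from the strings below?

An example input string is

Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)

Another example

Urinary tract infection prophylaxis (procedure)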

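Possible approaches are:

  • Go to the end of the text, look backwards for the first opening parenthesis, and take the substring from that position to the end of the text.
  • From the start of the text, find the last '(' character and take the substring from that position to the end.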
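
Both approaches come down to locating the last "(" and cutting from there. A minimal sketch in base R without a capture group follows; the helper name last_paren_text is made up for illustration, and it assumes the wanted text sits between the last "(" and the last ")" of each string.

```r
# Hypothetical helper (not from the original post): text between the last "("
# and the last ")" of each string, or NA when that pair is missing.
last_paren_text <- function(x) {
  # Position of the last "(" and the last ")" in each string (-1 if absent)
  open_pos  <- vapply(gregexpr("(", x, fixed = TRUE), max, integer(1))
  close_pos <- vapply(gregexpr(")", x, fixed = TRUE), max, integer(1))
  out <- substr(x, open_pos + 1L, close_pos - 1L)
  out[open_pos < 0L | close_pos < open_pos] <- NA_character_
  out
}

examples <- c(
  "Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)",
  "Urinary tract infection prophylaxis (procedure)",
  "no parentheses at all"
)
result <- last_paren_text(examples)
```

Under these assumptions result holds "procedure", "procedure" and NA. The sketch does not handle nested parentheses: it always starts at the innermost last "(".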

Other strings may look like this (with different "tags" to extract):

[1] "Xanthoma of eyelid (disorder)"                    "Ventricular tachyarrhythmia (disorder)"          
[3] "Abnormal urine odor (finding)"                    "Coloboma of iris (disorder)"                     
[5] "Macroencephaly (disorder)"                        "Right main coronary artery thrombosis (disorder)"

(I am looking for a general regular expression; a solution in R would be even better.)


3 Answers

Stack Overflow user

Accepted answer

Posted 2017-02-09 21:35:24

sub can do this with the right regular expression.

# Example strings from the question
Text <- c("Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)",
    "Urinary tract infection prophylaxis (procedure)",
    "Xanthoma of eyelid (disorder)",
    "Ventricular tachyarrhythmia (disorder)",
    "Abnormal urine odor (finding)",
    "Coloboma of iris (disorder)",
    "Macroencephaly (disorder)",
    "Right main coronary artery thrombosis (disorder)")
# Replace each whole string with the contents of its last set of parentheses
sub(".*\\((.*)\\).*", "\\1", Text)
#> [1] "procedure" "procedure" "disorder"  "disorder"  "finding"   "disorder" 
#> [7] "disorder"  "disorder"

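Addendum: a detailed explanation of the regex

The question asks for the contents of the last set of parentheses in a string. The expression is a little confusing because it uses parentheses in two different ways: one to stand for the literal parentheses in the string being processed, and one to set up a "capture group", which is how we specify which part of the match the expression should return. The expression is made up of five basic units: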

1. Initial .*   - matches everything up to the final open parenthesis. 
   Note that this is relying on "greedy matching"
2. \\(   ...    \\)   - matches the final set of parentheses. 
   Because ( by itself means something else,  we need to "escape" the 
   parentheses by preceding them with \.  That is we want the regular
   expression to say   \(  ...  \).  However, the way R interprets strings,
   if we just typed \( and \),  R would treat the \ as a string escape
   applied to the ( and reject it as an unrecognized escape.  So we escape
   the backslash.  R will interpret   \\(  ... \\)      as \( ... \) meaning
   the literal characters ( & ). 
3. ( ... )       Inside the pair in part 2
   This is making use of the special meaning of parentheses.  When we
   enclose an expression in parentheses, whatever value is inside them 
   will be stored in a variable for later use. That variable is called 
   \1,  which is what was used in the substitution pattern. Again, if 
   we just wrote \1, R would interpret it as if we were trying to escape
   the 1. Writing \\1 is interpreted as the character \ followed by 1, 
   i.e. \1.
4. Central .*    Inside the pair in part 3
   This is what we are looking for,  all characters inside the parentheses.
5. Final   .*
   This is in the expression to match any characters that may follow the 
   final set of parentheses. 

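The sub function replaces the matched pattern (in this case, every character in the string) with the replacement pattern \1, i.e. the contents of the variable holding whatever the first (and here only) capture group matched: the contents of the last set of parentheses.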
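
To see what the greedy leading .* contributes, here is a small sketch comparing it with a lazy variant. The demo strings and the lazy pattern are illustrative additions, not part of the original answer; lazy quantifiers are used with perl = TRUE here.

```r
# Illustrative strings (assumed, not from the post)
demo <- c("a (first) b (second) tail", "no parentheses here")

# Greedy leading .* runs to the end and backtracks to the LAST "("
greedy <- sub(".*\\((.*)\\).*", "\\1", demo)

# Lazy .*? stops at the FIRST "(" instead
lazy <- sub(".*?\\((.*?)\\).*", "\\1", demo, perl = TRUE)

greedy
lazy
```

With these inputs greedy is "second" and lazy is "first" for the first string. In both cases a string with no parentheses comes back unchanged, because sub only rewrites what the pattern matched.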

Votes: 3

Stack Overflow user

Posted 2017-02-09 21:34:40

If it is the last part of the string, this regex will do it:

/\(([^()]*)\)$/

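Explanation: look for an opening (, capture everything after it that is neither ( nor ), and require a closing ) at the very end of the string.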
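
The pattern above is written with /.../ delimiters; a rough R equivalent, as a sketch, doubles the backslashes and pulls the capture group out with regexec and regmatches. The variable names and the third test string are assumptions made for illustration.

```r
# Same pattern as above, with backslashes doubled for an R string
end_pattern <- "\\(([^()]*)\\)$"

tags <- c(
  "Abnormal urine odor (finding)",
  "Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)",
  "Coloboma of iris (disorder) trailing text"  # parentheses not at the end
)

# regexec returns the full match plus the capture group for each string
matches <- regmatches(tags, regexec(end_pattern, tags))

# Keep the capture group, or NA when the string does not end with ")"
last_tag <- vapply(matches,
                   function(m) if (length(m) == 2L) m[2] else NA_character_,
                   character(1))
last_tag
```

Unlike sub, this returns NA rather than the untouched input when the string does not end with a parenthesised group, which can make non-matches easier to spot.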

https://regex101.com/r/cEsQtf/1

Votes: 5

Stack Overflow user

Posted 2021-03-08 22:50:44

In fact, you can extract the text inside nested parentheses at the end of a string with:

x <- c("FELON IN POSSESSION OF AMMUNITION (ACTUAL POSSESSION) (79023)",
"FAIL TO DISPLAY REGISTRATION - POSSESSION REQUIRED (320.0605(1))")
sub(".*(\\(((?:[^()]++|(?1))*)\\))$", "\\2", x, perl=TRUE)

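See the online R demo and the regex demo.

Details

  • .* - any zero or more characters other than line break characters, as many as possible.
  • (\(((?:[^()]++|(?1))*)\)) - capturing group 1 (needed for the recursion):
    • \( - a ( char
    • ((?:[^()]++|(?1))*) - capturing group 2 (our value): zero or more occurrences of either one or more characters other than ( and ), or the whole group 1 pattern.
    • \) - a ) char
  • $ - end of string.

So, when there is a match, the whole string is replaced with the value of group 2. If there is no match, the string is left unchanged.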
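
As a sketch of how this behaves in practice, the snippet below wraps the same pattern in a small helper that returns NA instead of the unchanged string when nothing matches. The helper name extract_last_group and the extra "NO PARENTHESES" input are assumptions for illustration.

```r
# Recursive pattern from the answer above (needs perl = TRUE)
nested_pattern <- ".*(\\(((?:[^()]++|(?1))*)\\))$"

# Hypothetical helper: group 2 of the match, or NA when there is no match
extract_last_group <- function(x) {
  hit <- grepl(nested_pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[hit] <- sub(nested_pattern, "\\2", x[hit], perl = TRUE)
  out
}

x <- c("FELON IN POSSESSION OF AMMUNITION (ACTUAL POSSESSION) (79023)",
       "FAIL TO DISPLAY REGISTRATION - POSSESSION REQUIRED (320.0605(1))",
       "NO PARENTHESES")
nested <- extract_last_group(x)
nested
```

For these inputs nested is "79023", "320.0605(1)" and NA: the recursion lets the inner (1) count as part of the outer group instead of cutting the match short.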

Votes: 2
The original content of this page is provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/42147203
