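What is the correct regex to extract the string "(procedure)" from within the parentheses in the strings below?
Example input strings are below:
Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)
Another example:
Urinary tract infection prophylaxis (procedure)
A possible approach is:
Other strings are possible as well (extracting a different "tag"):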
[1] "Xanthoma of eyelid (disorder)" "Ventricular tachyarrhythmia (disorder)"
[3] "Abnormal urine odor (finding)" "Coloboma of iris (disorder)"
[5] "Macroencephaly (disorder)" "Right main coronary artery thrombosis (disorder)"
(Looking for a general regex; a solution in R would be even better.)
Posted on 2017-02-09 21:35:24
sub can accomplish this with the right regular expression.
Text = c("Positron emission tomography using flutemetamol (18F) with computed tomography of brain (procedure)",
"Urinary tract infection prophylaxis (procedure)",
"Xanthoma of eyelid (disorder)",
"Ventricular tachyarrhythmia (disorder)",
"Abnormal urine odor (finding)",
"Coloboma of iris (disorder)",
"Macroencephaly (disorder)",
"Right main coronary artery thrombosis (disorder)")
sub(".*\\((.*)\\).*", "\\1", Text)
[1] "procedure" "procedure" "disorder" "disorder" "finding" "disorder"
[7] "disorder" "disorder"

Addendum: a detailed explanation of the regex
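The question asks for the contents of the final set of parentheses in the string. The expression is a bit confusing because it uses parentheses in two different ways: one to represent the literal parentheses in the string being processed, and the other to set up a "capture group", which is how we specify which part of the match the expression should return. The expression is made up of five basic units: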
1. Initial .* - matches everything up to the final open parenthesis.
Note that this relies on "greedy matching": .* consumes as much as it
can, so the \\( that follows lands on the last open parenthesis.
2. \\( ... \\) - matches the final set of parentheses.
Because ( by itself means something else, we need to "escape" the
parentheses by preceding them with \. That is, we want the regular
expression to say \( ... \). However, R processes backslash escapes in
string literals before the regex engine ever sees them, and \( is not a
valid string escape, so typing just \( and \) is rejected with an
"unrecognized escape" error. So we escape the backslash itself.
R will interpret \\( ... \\) as \( ... \) meaning the literal
characters ( & ).
3. ( ... ) Inside the pair in part 2
This is making use of the special meaning of parentheses. When we
enclose an expression in parentheses, whatever value is inside them
will be stored in a variable for later use. That variable is called
\1, which is what was used in the substitution pattern. Again, if
we just wrote \1, R would interpret it as if we were trying to escape
the 1. Writing \\1 is interpreted as the character \ followed by 1,
i.e. \1.
4. Central .* Inside the pair in part 3
This is what we are looking for, all characters inside the parentheses.
5. Final .*
This is in the expression to match any characters that may follow the
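final set of parentheses.

The sub function will use this to replace the matched pattern (in this case, all of the characters in the string) with the replacement pattern \1, i.e. the contents of the variable holding what is in the first (and, in this case, only) capture group: the contents of the final parentheses.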
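
One practical consequence of how sub works: when the pattern does not match, sub returns the input unchanged, so a string with no parentheses comes back whole rather than empty. Below is a minimal sketch of one way to guard against that, assuming NA is an acceptable placeholder for "no tag". The helper name extract_last_paren and the example strings are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): returns the contents
# of the last parenthesised group, or NA when the string has none.
extract_last_paren <- function(x) {
  pattern <- ".*\\((.*)\\).*"
  has_match <- grepl(pattern, x)       # which strings contain "(...)" at all
  out <- sub(pattern, "\\1", x)        # same substitution as in the answer
  out[!has_match] <- NA_character_     # sub() left these strings untouched
  out
}

examples <- c("Abnormal urine odor (finding)",
              "No tag here",
              "flutemetamol (18F) of brain (procedure)")
tags <- extract_last_paren(examples)
print(tags)
```

The grepl call reuses the same pattern, so the two steps cannot disagree about what counts as a match.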
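Posted on 2017-02-09 21:34:40
If it is the last part of the string, then this regex will do it:
/\(([^()]*)\)$/
Explanation: look for an opening (, match everything after it that is not a ( or ), and then require a ) at the end of the string.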
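
In R, the same pattern is written without the surrounding slashes and with doubled backslashes. Here is a sketch using base R's regexec and regmatches, which hand back the capture group directly instead of rewriting the string. The variable names and the third example string are illustrative assumptions.

```r
x <- c("Urinary tract infection prophylaxis (procedure)",
       "Abnormal urine odor (finding)",
       "Trailing text (disorder) after the tag")

# regexec() reports the full match plus each capture group for every string;
# regmatches() turns those positions into substrings (character(0) if no match).
m <- regmatches(x, regexec("\\(([^()]*)\\)$", x))

# Keep group 1 where the string ends in "(...)", otherwise NA.
tags <- vapply(m,
               function(g) if (length(g) == 2) g[2] else NA_character_,
               character(1))
print(tags)
```

Because of the $ anchor, the third string yields NA: its parentheses are not at the end of the string.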
Posted on 2021-03-08 22:50:44
You can in fact extract the text inside nested parentheses at the end of a string as follows:
x <- c("FELON IN POSSESSION OF AMMUNITION (ACTUAL POSSESSION) (79023)",
"FAIL TO DISPLAY REGISTRATION - POSSESSION REQUIRED (320.0605(1))")
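sub(".*(\\(((?:[^()]++|(?1))*)\\))$", "\\2", x, perl=TRUE)

Details
.* - any zero or more characters other than line break characters, as many as possible
(\(((?:[^()]++|(?1))*)\)) - capturing group 1 (needed for the recursion):
  \( - a ( char
  ((?:[^()]++|(?1))*) - capturing group 2 (our value): zero or more occurrences of either one or more characters other than ( and ), or the whole group 1 pattern
  \) - a ) char
$ - end of string.
So, when there is a match, the whole string is replaced with the value of group 2. If there is no match, the string is left as it was.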
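
To see why the recursion matters, here is a small comparison sketch that runs the simpler greedy pattern from the earlier answer next to the recursive one on the same two strings. Both calls use perl=TRUE so they run on the same engine; the variable names greedy and recursive are illustrative.

```r
x <- c("FELON IN POSSESSION OF AMMUNITION (ACTUAL POSSESSION) (79023)",
       "FAIL TO DISPLAY REGISTRATION - POSSESSION REQUIRED (320.0605(1))")

# Greedy pattern: the leading .* runs to the LAST "(", so for nested
# parentheses it captures only the innermost tail.
greedy <- sub(".*\\((.*)\\).*", "\\1", x, perl = TRUE)

# Recursive pattern: (?1) re-enters group 1, so balanced inner
# parentheses are kept as part of the captured value.
recursive <- sub(".*(\\(((?:[^()]++|(?1))*)\\))$", "\\2", x, perl = TRUE)

print(greedy)     # "79023"  "1)"
print(recursive)  # "79023"  "320.0605(1)"
```

The two patterns agree on the first string and differ only where the final group contains nested parentheses.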
https://stackoverflow.com/questions/42147203