I have a data frame in R with a column named Title that holds a BibTeX entry, like this:
={Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems},\n
author={Goldreich, Oded and Micali, Silvio and Wigderson, Avi},\n
journal={Journal of the ACM (JACM)},\n
volume={38},\n
number={3},\n
pages={690--728},\n
year={1991},\n
publisher={ACM New York, NY, USA}\n}
I only need to extract the title of the BibTeX citation, i.e. the string after ={ and before the next }.
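In this example, the output should be:
Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems
I need to do this for every row in the data frame. Not all rows have the same number of BibTeX fields, so the regex must ignore everything after the first }.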
I'm currently trying
sub(".*\\={\\}\\s*(.+?)\\s*\\|.*$", "\\1", data$Title)
and getting TRE pattern compilation error 'Invalid contents of {}'.
How can I do this?
Posted on 2022-06-28 18:57:42
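One possible solution, using stringr::str_extract with lookarounds. The lookbehind (?<=\{) and the lookahead (?=\}) require a { just before and a } just after the match without including them in it, and [^}]+ cannot run past the first }: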
library(stringr)
# s is the vector of Title strings, e.g. s <- data$Title
str_extract(s, "(?<=\\{)[^}]+(?=\\})")
#> [1] "Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems"
Posted on 2022-06-28 18:53:37
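Note that { is a special regex metacharacter and needs to be escaped.
To match any string between curly braces, you need a pattern based on a negated character class (a negated bracket expression), such as \{([^{}]*)}.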
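To illustrate why the negated bracket expression matters for the "ignore everything after the first }" requirement, here is a small sketch on a hypothetical one-line entry (not taken from the question). It compares a greedy .* with [^{}]* using base R regexpr() and regmatches() with the default regex engine:

```r
# Hypothetical one-line entry with several brace-delimited fields
x <- "={Proofs that yield nothing},author={Goldreich, Oded},year={1991}"

# Greedy .* runs on to the LAST closing brace in the string
greedy <- regmatches(x, regexpr("\\{.*}", x))
# A negated bracket expression cannot cross a brace, so it stops at the first }
negated <- regmatches(x, regexpr("\\{[^{}]*}", x))

greedy   # "{Proofs that yield nothing},author={Goldreich, Oded},year={1991}"
negated  # "{Proofs that yield nothing}"
```

Only the negated version keeps the match inside the first pair of braces, no matter how many fields follow the title.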
You can use:
sub(".*?=\\{([^{}]*)}.*", "\\1", df$Title)
For example, with the entry from the question:
Title <- c("={Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems},\n author={Goldreich, Oded and Micali, Silvio and Wigderson, Avi},\n journal={Journal of the ACM (JACM)},\n volume={38},\n number={3},\n pages={690--728},\n year={1991},\n publisher={ACM New York, NY, USA}\n}")
sub(".*?=\\{([^{}]*)}.*", "\\1", Title)
Output:
[1] "Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems"
Pattern details
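.*? - any zero or more characters, as few as possible
=\{ - a ={ substring
([^{}]*) - Group 1 (\1): any zero or more characters other than curly braces
} - a } character (it is not special here and does not need escaping)
.* - the rest of the string.
https://stackoverflow.com/questions/72791750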
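One caveat when running this over a whole column: sub() returns a string unchanged when the pattern does not match, so a row without a ={...} field would keep its full BibTeX text. A minimal sketch, assuming a data frame named data with a Title column (the sample rows and the title_only column name are hypothetical), that applies the same pattern to every row and stores NA where no title is found:

```r
# Hypothetical sample data: entries with different numbers of fields
data <- data.frame(
  Title = c(
    "={Proofs that yield nothing but their validity or all languages in NP have zero-knowledge proof systems},\n author={Goldreich, Oded and Micali, Silvio and Wigderson, Avi},\n year={1991}\n}",
    "={A shorter entry},\n year={2000}\n}",
    "no braces here"
  ),
  stringsAsFactors = FALSE
)

pattern <- ".*?=\\{([^{}]*)}.*"
# sub() leaves non-matching strings untouched, so find the matching rows first
has_title <- grepl(pattern, data$Title)
# Keep Group 1 for matching rows, NA for the rest
data$title_only <- ifelse(has_title, sub(pattern, "\\1", data$Title), NA_character_)
data$title_only
```

As the output of the example above shows, the trailing .* also consumes the embedded newlines, so multi-line entries need no extra flags with the default engine.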