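I need a function that extracts any kind of bracket, i.e. (), [], {}, together with the information in between. I wrote one and it does what I want, but I get an annoying warning that I really don't understand. I would like the warning to go away, either by fixing the error in my code or by hiding the warning. I tried suppressWarnings(), but it did not work, probably because I am not using it correctly.
This function uses regmatches, which requires R version 2.14 or later.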
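For readers unfamiliar with `regmatches`, here is a minimal sketch of how it pairs with `gregexpr`. The sample strings and the names `x`, `m` and `found` are illustrative and not part of the function in this post.

```r
# regmatches() was added in R 2.14.0; stop early on older versions
stopifnot(getRversion() >= "2.14.0")

# illustrative input: one string with two bracket pairs, one with none
x <- c("a (b) c (d)", "no brackets here")

# gregexpr() returns the match positions, one list element per input string
m <- gregexpr("\\(.*?\\)", x, perl = TRUE)

# regmatches() turns those positions into the matched substrings,
# again as a list with one character vector per input string
found <- regmatches(x, m)
print(found)
```

Strings without a match yield an empty character vector, so the result always has the same length as the input.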
Below is the function and an example that reproduces the warning. Thanks for your help.
################
# THE FUNCTION #
################
bracketXtract <- function(text, bracket = "all", include.bracket = TRUE) {
    bracketExtract <- if (include.bracket == FALSE) {
        function(Text, bracket) {
            switch(bracket,
                square = lapply(Text, function(j) gsub("[\\[\\]]", "",
                    regmatches(j, gregexpr("\\[.*?\\]", j))[[1]],
                    perl = TRUE)),
                round = lapply(Text, function(j) gsub("[\\(\\)]", "",
                    regmatches(j, gregexpr("\\(.*?\\)", j))[[1]])),
                curly = lapply(Text, function(j) gsub("[\\{\\}]", "",
                    regmatches(j, gregexpr("\\{.*?\\}", j))[[1]])),
                all = {
                    P1 <- lapply(Text, function(j) gsub("[\\[\\]]", "",
                        regmatches(j, gregexpr("\\[.*?\\]", j))[[1]],
                        perl = TRUE))
                    P2 <- lapply(Text, function(j) gsub("[\\(\\)]", "",
                        regmatches(j, gregexpr("\\(.*?\\)", j))[[1]]))
                    P3 <- lapply(Text, function(j) gsub("[\\{\\}]", "",
                        regmatches(j, gregexpr("\\{.*?\\}", j))[[1]]))
                    apply(cbind(P1, P2, P3), 1, function(x) rbind(as.vector(unlist(x))))
                })
        }
    } else {
        function(Text, bracket) {
            switch(bracket,
                square = lapply(Text, function(j) regmatches(j,
                    gregexpr("\\[.*?\\]", j))[[1]]),
                round = lapply(Text, function(j) regmatches(j,
                    gregexpr("\\(.*?\\)", j))[[1]]),
                curly = lapply(Text, function(j) regmatches(j,
                    gregexpr("\\{.*?\\}", j))[[1]]),
                all = {
                    P1 <- lapply(Text, function(j) regmatches(j,
                        gregexpr("\\[.*?\\]", j))[[1]])
                    P2 <- lapply(Text, function(j) regmatches(j,
                        gregexpr("\\(.*?\\)", j))[[1]])
                    P3 <- lapply(Text, function(j) regmatches(j,
                        gregexpr("\\{.*?\\}", j))[[1]])
                    apply(cbind(P1, P2, P3), 1, function(x) rbind(as.vector(unlist(x))))
                })
        }
    }
    if (length(text) == 1) {
        unlist(lapply(text, function(x) bracketExtract(Text = text,
            bracket = bracket)))
    } else {
        sapply(text, function(x) bracketExtract(Text = text,
            bracket = bracket))
    }
}
##################
# TESTING IT OUT #
##################
j <- "What kind of cheese isn't your cheese? {wonder} Nacho cheese! [groan] (Laugh)"
bracketXtract(j, 'round')
bracketXtract(j, 'round', include.bracket = FALSE)
examp2<-data.frame(var1=1:4)
examp2$text<-as.character(c("I love chicken [unintelligible]!", "Me too! (laughter) It's so good.[interupting]",
"Yep it's awesome {reading}.", "Agreed."))
#=================================#
# HERE'S WHERE THE WARNINGS COME: #
#=================================#
examp2$text2<-bracketXtract(examp2$text, 'round')
examp2
examp2$text2<-bracketXtract(examp2$text, 'all')
examp2

Posted on 2011-12-24 08:25:33

Maybe this function is a bit simpler? Or at least more compact.
bracketXtract <-
function(txt, br = c("(", "[", "{", "all"), with=FALSE)
{
br <- match.arg(br)
left <- # what pattern are we looking for on the left?
if ("all" == br) "\\(|\\{|\\["
else sprintf("\\%s", br)
map <- # what's the corresponding pattern on the right?
c(`\\(`="\\)", `\\[`="\\]", `\\{`="\\}",
`\\(|\\{|\\[`="\\)|\\}|\\]")
fmt <- # create the appropriate regular expression
if (with) "(%s).*?(%s)"
else "(?<=%s).*?(?=%s)"
re <- sprintf(fmt, left, map[left])
regmatches(txt, gregexpr(re, txt, perl=TRUE)) # do it!
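}

No lapply is needed; the regular expression functions are vectorized in this way. This fails with nested parentheses; if that matters, regular expressions are probably not a good solution. Here it is in action: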
> txt <- c("I love chicken [unintelligible]!",
+ "Me too! (laughter) It's so good.[interupting]",
+ "Yep it's awesome {reading}.",
+ "Agreed.")
> bracketXtract(txt, "all")
[[1]]
[1] "unintelligible"
[[2]]
[1] "laughter" "interupting"
[[3]]
[1] "reading"
[[4]]
character(0)

This fits into a data.frame without any trouble.
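A minimal sketch of why the assignment works: `regmatches` returns a plain list with one element per input string, so its length equals the number of rows and it can be stored as a list column. The names `txt`, `hits` and `df` and the sample strings are illustrative assumptions, and the pattern is the "all" pattern without brackets written out by hand.

```r
# illustrative input with zero, one and two bracketed pieces
txt <- c("a [b]", "c (d) {e}", "f")

# lookbehind/lookahead keep the brackets out of the match; needs perl = TRUE
re <- "(?<=\\(|\\{|\\[).*?(?=\\)|\\}|\\])"
hits <- regmatches(txt, gregexpr(re, txt, perl = TRUE))

# one list element per input string, so it lines up with the rows
df <- data.frame(id = seq_along(txt))
df$hits <- hits

# number of extracted pieces per row
sapply(df$hits, length)
```

Rows without any bracket simply hold an empty character vector.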
> examp2 <- data.frame(var1=1:4)
> examp2$text <- c("I love chicken [unintelligible]!",
+ "Me too! (laughter) It's so good.[interupting]",
+ "Yep it's awesome {reading}.", "Agreed.")
> examp2$text2<-bracketXtract(examp2$text, 'all')
> examp2
  var1                                          text                 text2
1    1              I love chicken [unintelligible]!        unintelligible
2    2 Me too! (laughter) It's so good.[interupting] laughter, interupting
3    3                   Yep it's awesome {reading}.               reading
4    4                                       Agreed.

The warning you are seeing has to do with trying to put a matrix into a data frame. I think the answer is "don't do that".
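A rough sketch of where such a matrix can come from, assuming (as in the final `sapply` call of the question's function) that every iteration returns a list of the same length: `sapply` then simplifies the results into a matrix of mode list, whereas `lapply` always returns a plain list. The names `m`, `l` and `d` are illustrative only.

```r
# sapply() simplifies results of equal length into a matrix; when each
# result is itself a list, the matrix has mode "list"
m <- sapply(1:2, function(i) list("a", character(0)))
is.matrix(m)
typeof(m)

# lapply() never simplifies, so the result stays a plain list ...
l <- lapply(1:2, function(i) c("a", "b"))

# ... and a list with one element per row can be stored as a list column
d <- data.frame(x = 1:2)
d$y <- l
str(d)
```

Checking `is.matrix()` on a result before assigning it to a column is one cheap way to catch this case early.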
> df = data.frame(x=1:2)
> df$y = matrix(list(), 2, 2)
> df
x y
1 1 NULL
2 2 NULL
Warning message:
In format.data.frame(x, digits = digits, na.encode = FALSE) :
corrupt data frame: columns will be truncated or padded with NAs

Posted on 2011-12-24 08:58:20

My idea was to make 6 (implicitly vectorized) helper functions, but I will study Martin's code, since he is much better at this than I am:
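# The *.no helpers keep the bracket characters and delete what is between them;
# the *.in helpers delete the brackets together with their contents.
# Note that .* is greedy: with several bracket pairs in one string, everything
# from the first opening bracket to the last closing bracket is affected.
rm.curlybkt.no <- function(x) gsub("(\\{).*(\\})", "\\1\\2", x, perl=TRUE)
rm.rndbkt.no <- function(x) gsub("(\\().*(\\))", "\\1\\2", x, perl=TRUE)
rm.sqrbkt.no <- function(x) gsub("(\\[).*(\\])", "\\1\\2", x, perl=TRUE)
rm.rndbkt.in <- function(x) gsub("\\(.*\\)", "", x)
rm.curlybkt.in <- function(x) gsub("\\{.*\\}", "", x)
rm.sqrbkt.in <- function(x) gsub("\\[.*\\]", "", x)

Posted on 2011-12-24 22:18:23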
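Assuming the brackets are not nested and we have the following test data:
x <- c("a (bb) [ccc]{d}e", "x[a]y")

then, using strapply from gsubfn, we get this two-line solution, which first converts all round and square brackets to curly braces and then processes them: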
library(gsubfn)
xx <- chartr("[]()", "{}{}", x)
s <- strapply(xx, "{([^}]*)}", c)

The result of the above looks like this:
> s
[[1]]
[1] "bb" "ccc" "d"
[[2]]
[1] "a"

https://stackoverflow.com/questions/8621066