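I have text data (in R) and want to replace some characters with other characters taken from a data frame. I thought this would be an easy task: use strsplit on spaces to create a vector, match against it (%in%), and then paste everything back together. But then I thought about punctuation. There is no space between the last word of a sentence and the punctuation at the end.
I suspect there is a simpler way to achieve what I want than the convoluted mess my code is becoming. I would appreciate some direction on this problem.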
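To make the punctuation problem concrete, here is a minimal sketch of the naive split-and-match approach. It is not from the original post: the names `tokens`, `hits`, and `result` are illustrative, and `x` and `numDF` simply mirror the example data given in the question.

```r
# Example data mirroring the question
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
numDF <- data.frame(symbol = c("346", "99", "46"),
                    text = c("three hundred forty six", "ninety nine", "forty six"),
                    stringsAsFactors = FALSE)

# Split on single spaces; trailing punctuation stays attached to the word
tokens <- strsplit(x, " ", fixed = TRUE)[[1]]

# Exact matching finds "346" and "99" but misses "46." because of the period
hits <- tokens %in% numDF$symbol

# Replace the matched tokens and paste the sentence back together
tokens[hits] <- numDF$text[match(tokens[hits], numDF$symbol)]
result <- paste(tokens, collapse = " ")
result
```

The final "46." is left untouched, which is exactly the case the question is asking about.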
#Character String
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
#Replacement Values Dataframe
  symbol text
1 "346"  "three hundred forty six"
2 "99"   "ninety nine"
3 "46"   "forty six"
#replacement dataframe
numDF <-
  data.frame(symbol = c("346", "99", "46"),
             text = c("three hundred forty six", "ninety nine", "forty six"),
             stringsAsFactors = FALSE)

Desired outcome:
[1] "I like three hundred forty six ice cream cones. They're ninety nine percent good! I ate forty six."

EDIT: I originally titled this "conditional gsub" because that is what it looked like to me, even though no gsub is involved.
Posted on 2012-01-02 18:00:00
Perhaps this, inspired by Josh O'Brien's answer, does it:
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
numDF <- structure(c("346", "99", "46", "three hundred forty six", "ninety nine",
"forty six"), .Dim = c(3L, 2L), .Dimnames = list(c("1", "2",
"3"), c("symbol", "text")))
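# Alternation of all symbols: "346|99|46"
pat <- paste(numDF[,"symbol"], collapse="|")
# Replace the first remaining match on each pass until none is left
repeat {
  m <- regexpr(pat, x)
  if(m==-1) break
  sym <- regmatches(x,m)
  regmatches(x,m) <- numDF[match(sym, numDF[,"symbol"]), "text"]
}
x

Posted on 2012-01-02 17:26:54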
This solution uses gsubfn() from the package of the same name:
library(gsubfn)
(pat <- paste(numDF$symbol, collapse="|"))
# [1] "346|99|46"
gsubfn(pattern = pat,
       replacement = function(x) {
           numDF$text[match(x, numDF$symbol)]
       },
       x)
# [1] "I like three hundred forty six ice cream cones. They're ninety nine percent good! I ate forty six."

Posted on 2012-01-02 17:50:19
You can split on whitespace or on word boundaries (which will match between a word and punctuation):
> x
[1] "I like 346 ice cream cones. They're 99 percent good! I ate 46."
> strsplit(x, split='\\s|\\>|\\<')
[[1]]
 [1] "I"       "like"    "346"     "ice"     "cream"   "cones"   "."
 [8] ""        "They"    "'re"     "99"      "percent" "good"    "!"
[15] ""        "I"       "ate"     "46"      "."

Then you can do your replacements.
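As a hedged alternative to splitting, the same word-boundary idea can be applied directly inside the pattern. The sketch below is not part of the answer above: it loops over the rows of `numDF`, wraps each symbol in `\\b` anchors, and calls base `gsub()` with `perl = TRUE`. The names `out` and `pat_i` are illustrative, and the data mirrors the question.

```r
# Example data mirroring the question
x <- "I like 346 ice cream cones. They're 99 percent good! I ate 46."
numDF <- data.frame(symbol = c("346", "99", "46"),
                    text = c("three hundred forty six", "ninety nine", "forty six"),
                    stringsAsFactors = FALSE)

out <- x
for (i in seq_len(nrow(numDF))) {
  # \\b on both sides keeps "46" from matching inside "346"
  pat_i <- paste0("\\b", numDF$symbol[i], "\\b")
  out <- gsub(pat_i, numDF$text[i], out, perl = TRUE)
}
out
```

This assumes the symbols contain no regex metacharacters and that no replacement text itself contains another symbol; otherwise a later pass could rewrite an earlier replacement.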
https://stackoverflow.com/questions/8703398