I am trying to remove the "+" character from the string elements of a data frame, but I can't find a way to do it.
Here is the data frame:
txtdf <- structure(list(ID = 1:9, Var1 = structure(c(1L, 1L, 1L, 1L, 4L,
5L, 5L, 2L, 3L), .Label = c("government", "parliament", "parliment",
"poli+tician", "politician"), class = "factor")), .Names = c("ID",
"Var1"), class = "data.frame", row.names = c(NA, -9L))
# ID Var1
# 1 government
# 2 government
# 3 government
# 4 government
# 5 poli+tician
# 6 politician
# 7 politician
# 8 parliament
# 9 parliment
I tried two approaches, and neither gave the expected result:
Way 1
txtdf <- gsub("[:punct:]","", txtdf)
# [1] "goverme" "goverme" "goverme" "goverme" "oli+iia" "oliiia" "oliiia"
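# [8] "arliame" "arlime"
I don't understand what is happening here. I expected the "+" character to be removed from the 5th element only, but every element was altered as shown above.
Way 2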
txtdf<-gsub("*//+","",txtdf)
# [1] "government" "government" "government" "government" "poli+tician"
# [6] "politician" "politician" "parliament" "parliment"
Nothing changes here at all. I thought I was escaping the + character with the double slashes.
Posted on 2017-05-14 16:04:29
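Just replace it using fixed = TRUE (no regular expression needed). Note that you have to do the replacement column by column, specifying the column name of the data.frame: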
txtdf <- data.frame(job = c("government", "poli+tician", "parliament"))
txtdf
This gives:
job
1 government
2 poli+tician
3 parliament
Now replace the "+":
txtdf$job <- gsub("+", "", txtdf$job, fixed = TRUE)
txtdf
The result is:
job
1 government
2 politician
3 parliament
Posted on 2017-05-14 16:39:57
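You need to escape your plus sign. In a regular expression "+" has a special meaning (it is a quantifier), so it is not treated as a literal punctuation character. From the documentation, ?regex:
"+" The preceding item will be matched one or more times.
To match such special characters you need to escape them, so that they are taken literally and their special meaning is not applied. In R you need two backslashes (\\) to escape. So in your case it should look like this: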
gsub("\\+","",df$job)
Running the above removes all plus signs from your data, giving the desired result.
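For comparison, here is a minimal sketch of three interchangeable ways to match a literal plus sign, together with a likely explanation of the Way 1 output in the question. The vector `x` and the result names are made up for illustration. The sketch assumes that R read "[:punct:]" as a plain set of the characters `:`, `p`, `u`, `n`, `c` and `t`, which is consistent with the output shown in the question; the POSIX class itself has to be written with double brackets, `[[:punct:]]`.

```r
# Hypothetical example vector (not taken from the question)
x <- c("government", "poli+tician", "parliament")

# Three ways to remove a literal "+"
escaped <- gsub("\\+", "", x)              # backslash-escaped inside a regex
bracket <- gsub("[+]", "", x)              # "+" is literal inside a character class
literal <- gsub("+", "", x, fixed = TRUE)  # no regex interpretation at all

# Assumption: "[:punct:]" behaved like this explicit character set,
# which deletes the letters p, u, n, c, t (and ":") and leaves "+" alone
letters_removed <- gsub("[punct:]", "", x)

print(escaped)
print(letters_removed)
```

All three variants return the same cleaned vector. `fixed = TRUE` is usually the simplest choice when no pattern matching is needed.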
So, assuming your df is:
df <- data.frame(job = c("government", "poli+tician","politician", "parliament"))
then your output will be:
> gsub("\\+","",df$job)
[1] "government" "politician" "politician"
[4] "parliament"
https://stackoverflow.com/questions/43965949
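Both answers use a simplified data frame. As a sketch of how the same fix might be applied to the data from the question, where `Var1` is a factor, the example below rebuilds `txtdf` with `data.frame()` for brevity (an assumption, the question uses `structure()`) and introduces a helper variable `cleaned` that does not appear in the original thread.

```r
# Data from the question, rebuilt with data.frame() for brevity
txtdf <- data.frame(
  ID = 1:9,
  Var1 = factor(c("government", "government", "government", "government",
                  "poli+tician", "politician", "politician",
                  "parliament", "parliment"))
)

# Pass the column, not the whole data frame: gsub() coerces its input
# with as.character() and returns a character vector
cleaned <- gsub("+", "", txtdf$Var1, fixed = TRUE)

# Wrap in factor() only if the column should remain a factor
txtdf$Var1 <- factor(cleaned)

print(txtdf)
```

After this, row 5 reads "politician" and the unused "poli+tician" level is gone, because the factor is rebuilt from the cleaned strings.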