I have columns of long strings containing decimals and NAs.
df <- data.frame(
  A_gsr = c("2.752,2.752,2.752,2.752,2.752,2.752,2.752,2.911,2.911,3.555",
            "2.999,2.999,2.999,2.752,2.752,2.752,2.752"),
  B_gsr = c("1.34,1.34,1.34,1.55,1.55,1.55,1.55,1.55,1.55,1.55",
            "1.56,1.56,1.56,1.55,1.55,1.55,1.55,NA,NA,NA,NA,1.34,1.34,1.34"),
  C_gsr = c("NA,NA,NA,0.147,0.147,0.147,0.147,0.147,NA",
            "0.146,0.146,0.146,0.146,0.146,0.146,0.146,0.146,0.146,0.146")
)
I want to remove all the long runs of duplicates. Using gsub and a backreference, I get pretty close to what I want:
lapply(df[,1:3], function(x) gsub("((\\d\\.\\d+,)|(NA,))\\1+", "\\1", x))
$A_gsr
[1] "2.752,2.911,3.555" "2.999,2.752,2.752"
$B_gsr
[1] "1.34,1.55,1.55" "1.56,1.55,NA,1.34,1.34"
$C_gsr
[1] "NA,0.147,NA" "0.146,0.146"
However, that is not close enough: some runs of duplicates remain, all of them at the end of the strings. The expected result is:
$A_gsr
[1] "2.752,2.911,3.555" "2.999,2.752"
$B_gsr
[1] "1.34,1.55" "1.56,1.55,NA,1.34"
$C_gsr
[1] "NA,0.147,NA" "0.146"
Posted on 2021-03-25 13:36:53
You can use
lapply(df[,1:3], function(x) gsub("\\b(\\d+\\.\\d+|NA)(?:,\\1)+\\b", "\\1", x))
## => $A_gsr
## [1] "2.752,2.911,3.555" "2.999,2.752"
##
## $B_gsr
## [1] "1.34,1.55" "1.56,1.55,NA,1.34"
##
## $C_gsr
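## [1] "NA,0.147,NA" "0.146"
Details
\b - a word boundary
(\d+\.\d+|NA) - Group 1: one or more digits, a ., one or more digits, or the string NA
(?:,\1)+ - one or more repetitions of a comma followed by the value captured in Group 1
\b - a word boundary
Because the comma is matched before each repeated value rather than after it, the last value in a string no longer needs a trailing comma to count as part of a run. That is why the duplicates at the end of the strings are now collapsed as well.
https://stackoverflow.com/questions/66800487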
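As a sanity check on the pattern described above, here is a minimal sketch that wraps it in a helper and compares the result with a non-regex approach built on strsplit() and rle(). The names collapse_runs and collapse_runs_rle are hypothetical, introduced only for illustration, and the rle() variant is an alternative technique that is not part of the original answer.

```r
# Hypothetical helper: applies the answer's pattern to a character vector.
collapse_runs <- function(x) {
  gsub("\\b(\\d+\\.\\d+|NA)(?:,\\1)+\\b", "\\1", x)
}

# Hypothetical alternative (not from the answer): split on commas and keep
# one value per run with rle(). "NA" is an ordinary string here, not a
# missing value, so consecutive "NA" entries are collapsed like any other.
collapse_runs_rle <- function(x) {
  vapply(
    strsplit(x, ",", fixed = TRUE),
    function(parts) paste(rle(parts)$values, collapse = ","),
    character(1),
    USE.NAMES = FALSE
  )
}

x <- c("2.999,2.999,2.999,2.752,2.752,2.752,2.752",
       "NA,NA,NA,0.147,0.147,0.147,0.147,0.147,NA")

collapse_runs(x)
collapse_runs_rle(x)
```

Both calls should return the same vector for these inputs. To write the cleaned strings back into the data frame while keeping its shape, something like df[] <- lapply(df, collapse_runs) should work.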