I have the following words:
words <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.00", "hail1.75", "hail100", "hail125", "hail1.75)", "hail150", "hail175", "hail200", "hail225", "hail275", "hail450", "hail088", "hail75", "hail80", "hail88")
[1] "hail(0.75)" "hail0.75" "hail0.88" "hail075" "hail1.00" "hail1.75"
[7] "hail100" "hail125" "hail1.75)" "hail150" "hail175" "hail200"
[13] "hail225" "hail275" "hail450" "hail088" "hail75" "hail80"
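[19] "hail88"
As you can see, hail(0.75) recurs with various typos and formatting differences (such as hail075 and hail0.75).
How can I find all occurrences of hail(0.75), including the variants described above?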
I tried
grep("hail[0,7,5]", words, value = TRUE)
[1] "hail0.75" "hail0.88" "hail075" "hail088" "hail75"
to find the hail entries containing the digits 075.
However, this includes the unwanted hail088 and misses the wanted hail(0.75).
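For context on why the attempt behaves this way: in a regular expression, [0,7,5] is a bracket expression that matches exactly one character out of 0, a comma, 7, or 5. It does not match the sequence 075. A minimal sketch illustrating this, using a shortened vector `w` that is introduced here only for the example:

```r
# Shortened sample for illustration (a subset of `words` above)
w <- c("hail(0.75)", "hail0.75", "hail075", "hail088", "hail75", "hail80")

# "[0,7,5]" matches ONE character that is 0, a comma, 7, or 5,
# so only the first character after "hail" is tested.
hits <- grep("hail[0,7,5]", w, value = TRUE)
print(hits)

# "hail(0.75)" is missed because "(" follows "hail";
# "hail088" is kept because "0" follows "hail".
```

So the pattern checks only the character right after "hail", which explains both the false positive and the miss.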
Answered 2016-01-16 22:58:10
Another option is to strip all non-digit characters and use the result as an index:
idx <- gsub("[^[:digit:]]","",words)
words[idx=="075"]
[1] "hail(0.75)" "hail0.75" "hail075"
Answered 2016-01-16 22:47:42
Is this what you are looking for?
> x <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.00", "hail1.75", "hail100", "hail125", "hail1.75)", "hail150", "hail175", "hail200", "hail225", "hail275", "hail450", "hail088", "hail75", "hail80", "hail88")
> x
[1] "hail(0.75)" "hail0.75" "hail0.88" "hail075" "hail1.00"
[6] "hail1.75" "hail100" "hail125" "hail1.75)" "hail150"
[11] "hail175" "hail200" "hail225" "hail275" "hail450"
[16] "hail088" "hail75" "hail80" "hail88"
And the grep:
> x[grep("^hail[[:punct:]]*0[[:punct:]]*75.*", x)]
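[1] "hail(0.75)" "hail0.75" "hail075"
This works on the assumption that the 7 and the 5 are always adjacent. A quick explanation: ^ anchors the match at the start of the string, [[:punct:]] matches any single punctuation character, and * repeats the preceding element (here [[:punct:]]) zero or more times.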
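The pieces of that pattern can be spelled out as a small sketch. It uses grepl on a shortened sample vector and drops the trailing .*, which should not change which strings match because the pattern is not anchored at the end. The second pattern, with an optional leading zero, is a hypothetical variant and not part of the answer above. It assumes that hail75 should also count, which the question does not state.

```r
# Shortened sample for illustration (a subset of the vector above)
x <- c("hail(0.75)", "hail0.75", "hail0.88", "hail075", "hail1.75)",
       "hail175", "hail088", "hail75")

# ^hail          literal "hail" at the start of the string
# [[:punct:]]*   zero or more punctuation characters, e.g. "("
# 0              a literal zero
# [[:punct:]]*   zero or more punctuation characters, e.g. "."
# 75             the digits 7 and 5, adjacent
pattern <- "^hail[[:punct:]]*0[[:punct:]]*75"

matched <- x[grepl(pattern, x)]
print(matched)

# Hypothetical variant (an assumption, see above): "0?" makes the
# leading zero optional, which additionally picks up "hail75".
loose <- x[grepl("^hail[[:punct:]]*0?[[:punct:]]*75", x)]
print(loose)
```

Whether the looser variant is appropriate depends on whether hail75 really denotes 0.75 in the data.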
https://stackoverflow.com/questions/34828108