For digits, I have already done this:
digits <- c("0","1","2","3","4","5","6","7","8","9")
Posted on 2016-08-05 23:34:42
You can use [:punct:] to detect punctuation marks. It matches
[!"\#$%&'()*+,\-./:;<=>?@\[\\\]^_`{|}~]
either with gregexpr:
x = c("we are friends!, Good Friends!!")
gregexpr("[[:punct:]]", x)
R> gregexpr("[[:punct:]]", x)
[[1]]
[1] 15 16 30 31
attr(,"match.length")
[1] 1 1 1 1
attr(,"useBytes")
[1] TRUE
or with stringi:
# Gives 4
stringi::stri_count_regex(x, "[:punct:]")
Note that , is counted as a punctuation mark.
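If stringi is not available, the gregexpr result above can be turned into a count in base R. This is only a sketch: the variable names `matches` and `total` are made up for illustration, and `x` is the example string from this answer.

```r
# Base R only: extract every punctuation match, then count the matches per string.
x <- c("we are friends!, Good Friends!!")

# regmatches() returns a list with one character vector of matches per element of x
matches <- regmatches(x, gregexpr("[[:punct:]]", x))

# lengths() gives the number of matches for each element of x
total <- lengths(matches)
total  # 4
```

Using `lengths()` on the extracted matches, rather than on the raw gregexpr result, means a string with no punctuation is counted as 0 instead of 1, because gregexpr reports a single -1 when nothing matches.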
The question seems to be about getting individual counts for specific punctuation marks. @Joba gave a neat answer in the comments:
## Create a vector of punctuation marks you are interested in
punct = strsplit('[]?!"\'#$%&(){}+*/:;,._`|~[<=>@^-]\\', '')[[1]]
## Count how often each one occurs
counts = stringi::stri_count_fixed(x, punct)
## Label the vector with the marks
setNames(counts, punct)
Posted on 2016-08-05 23:35:16
You can use regular expressions.
stringi::stri_count_regex("amdfa, ad,a, ad,. ", "[:punct:]")
https://en.wikipedia.org/wiki/Regular_expression
may also be helpful.
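The same regular expression can also give a per-mark tally in base R, by extracting the matches and tabulating them. This is a rough sketch, not part of either answer: `found` and `counts_found` are illustrative names, and the input is the example string used earlier in the thread.

```r
# Tally each punctuation mark that actually occurs in the string (base R only).
x <- c("we are friends!, Good Friends!!")

# All punctuation characters found in the first (and only) element of x
found <- regmatches(x, gregexpr("[[:punct:]]", x))[[1]]

# table() counts how often each distinct mark appears
counts_found <- table(found)
counts_found
```

Unlike the stri_count_fixed approach, this reports only the marks that are present, so marks with a count of zero do not appear in the result.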
https://stackoverflow.com/questions/38792866