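I have a list of words in a dataset.
I use base R to find the 10 most frequent words:
sort(table(words), decreasing = TRUE)[1:10]
My question is: how can I get the 10 most frequent words using stringr functions?
str_sort(table(words), decreasing = TRUE)[1:10]
The code above does not give me the expected result. Any ideas?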
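Posted on 2021-10-17 20:50:07
As the name suggests, str_sort is for sorting character elements. This is documented in ?str_sort:
Order or sort a character vector.
The output of table is a named vector of numeric counts. If we are looking for a tidyverse version, get the counts with count and then use slice_max to keep the 10 highest frequencies.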
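library(dplyr)
tibble(words) %>%
  count(words) %>%                  # one row per word, frequency in column n
  slice_max(order_by = n, n = 10)   # with_ties = TRUE by default, so ties can return more than 10 rows
The main difference is that character elements sort differently from their numeric counterparts: before sorting, str_sort coerces numeric input to character class, which can change the order.
> str_sort(c(10, 4, 5))
[1] "10" "4"  "5"
> sort(c(10, 4, 5))
[1]  4  5 10
The stringr package is not meant for sorting numeric input. The title of the stringr manual itself says as much:
Simple, Consistent Wrappers for Common String Operations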
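One way to read that division of labor, as a minimal sketch: let stringr handle the string side and leave the numeric sorting of the counts to base R. The sample_words vector below is made up for illustration, and folding case before counting is an assumption about which words should be treated as the same, not something the question asked for.

```r
library(stringr)

# Hypothetical sample data, made up for this sketch
sample_words <- c("the", "The", "count", "test", "the", "test", "the")

# stringr does the string work: fold case so "The" and "the" match
normalized <- str_to_lower(sample_words)

# base R does the numeric work: tabulate, then sort the counts
counts <- sort(table(normalized), decreasing = TRUE)

# Keep the most frequent words (2 here; use 10 for the question's case)
top_words <- head(counts, 2)
print(top_words)
```

names(top_words) gives the words themselves, just as the names of sort(table(words), decreasing = TRUE)[1:10] do in the question.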
forcats has an option to do this:
library(forcats)
# fct_count() returns a tibble with columns f (the word) and n (its count)
fct_count(words, sort = TRUE)$n[1:10]
Posted on 2021-10-17 20:55:23
Update:
In this example we use the sort argument of the count function:
# Example:
mytext <- c("This", "is", "a", "test", "for", "count", "of", "the", "words",
            "The", "words", "have", "been", "written", "very", "randomly",
            "so", "that", "the", "test", "can", "be", "for", "checking",
            "the", "count")
library(tibble)
library(dplyr)
tibble(mytext) %>%
  group_by(mytext) %>%
  count(sort = TRUE) %>%   # frequency per word, most frequent first
  ungroup() %>%
  slice_max(n, n = 10)     # keeps ties by default
Output (20 rows rather than 10, because slice_max keeps all rows tied at the cutoff):
   mytext       n
   <chr>    <int>
 1 the          3
 2 count        2
 3 for          2
 4 test         2
 5 words        2
 6 a            1
 7 be           1
 8 been         1
 9 can          1
10 checking     1
11 have         1
12 is           1
13 of           1
14 randomly     1
15 so           1
16 that         1
17 The          1
18 This         1
19 very         1
20 written      1
We can also use str_count from the stringr package. Here is an example:
library(tidyverse)
tibble(words) %>%
  mutate(n = str_count(words)) %>%   # default pattern "" counts characters, so n is the word length
  slice_max(n = 10, order_by = n)
Output (the longest words, since n here is the character count of each word, not its frequency):
   words           n
   <chr>       <int>
 1 appropriate    11
 2 environment    11
 3 opportunity    11
 4 responsible    11
 5 department     10
 6 difference     10
 7 experience     10
 8 individual     10
 9 particular     10
10 photograph     10
11 television     10
12 understand     10
13 university     10
https://stackoverflow.com/questions/69608462