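I'm working through a novel and want to find where the characters' names appear throughout the book. Some characters go by more than one name. For example, the character "Sissy Jupe" is referred to as both "Sissy" and "Jupe". I'd like to merge the two rows of word counts into one, so that I can see the count for "Sissy Jupe".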
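I've looked at using sum, rbind, merge and other approaches suggested on the message boards, but none of them seem to work. There are lots of good examples, but I can't get any of them to work here.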
library(tidyverse)
library(gutenbergr)
library(tidytext)
ht <- gutenberg_download(786)
ht_chap <- ht %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE))))
tidy_ht <- ht_chap %>%
  unnest_tokens(word, text) %>%
  mutate(word = str_extract(word, "[a-z']+")) # keeps only letters and apostrophes; removes _ (e.g. from _italic_ markup)
ht_count <- tidy_ht %>%
  group_by(chapter) %>%
  count(word, sort = TRUE) %>%
  ungroup() %>%
  complete(chapter, word,
           fill = list(n = 0))
gradgrind <- filter(ht_count, word == "gradgrind")
bounderby <- filter(ht_count, word == "bounderby")
sissy <- filter(ht_count, word == "sissy")
## TEST
sissy_jupe <- ht_count %>%
  filter(word %in% c("sissy", "jupe"))
I want a single "word" entry named "sissy_jupe", with the matching n for each chapter. This is close, but not quite it:
# A tibble: 76 x 3
chapter word n
<int> <chr> <dbl>
1 0 jupe 0
2 0 sissy 1
3 1 jupe 0
4 1 sissy 0
5 2 jupe 5
6 2 sissy 9
7 3 jupe 3
8 3 sissy 1
9 4 jupe 1
10 4 sissy 0
# … with 66 more rows
Posted 2019-06-15 06:41:15
The code below should give you the output you want.
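library(tidyverse)
# `sissy_jupe` is the filtered tibble from the question
sissy_jupe %>%
  group_by(chapter) %>%
  mutate(n = sum(n),                              # total of both names in each chapter
         word = paste(word, collapse = "_")) %>%  # join the names with "_"
  distinct(chapter, .keep_all = TRUE)             # keep one row per chapter
Posted 2019-06-15 08:16:23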
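To try the summing idea from the first answer without downloading the book, here is a minimal sketch on a small hand-made tibble. Its rows are the chapter 0 to 4 counts for "jupe" and "sissy" copied from the question's output. It swaps `summarise()` in for the `mutate()` + `distinct()` pair, since `summarise()` returns one row per chapter directly, and it assumes dplyr 1.0 or later (for the `.groups` argument).

```r
library(dplyr)

# Toy data: the chapter 0-4 rows for "jupe" and "sissy" from the
# question's output, so this runs without downloading the book.
sissy_jupe <- tibble(
  chapter = rep(0:4, each = 2),
  word    = rep(c("jupe", "sissy"), times = 5),
  n       = c(0, 1, 0, 0, 5, 9, 3, 1, 1, 0)
)

# One row per chapter: relabel the word and add up both name counts.
combined <- sissy_jupe %>%
  group_by(chapter) %>%
  summarise(word = "sissy_jupe", n = sum(n), .groups = "drop")

print(combined)
```

The resulting totals for chapters 2 and 3 (14 and 4) agree with the `sissy_jupe` counts in the second answer's output.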
Welcome to Stack Overflow, Tom. Here's one idea:
Basically, (1) find "sissy" or "jupe" in the tidy tibble and replace it with "sissy_jupe", (2) create ht_count as you did, and (3) print the result:
library(tidyverse)
library(gutenbergr)
library(tidytext)
ht <- gutenberg_download(786)
ht_chap <- ht %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE))))
tidy_ht <- ht_chap %>%
  unnest_tokens(word, text) %>%
  mutate(word = str_extract(word, "[a-z']+")) # keeps only letters and apostrophes; removes _ (e.g. from _italic_ markup)
# NEW CODE START
tidy_ht <- tidy_ht %>%
  mutate(word = str_replace_all(word, "sissy|jupe", replacement = "sissy_jupe"))
# END NEW CODE
ht_count <- tidy_ht %>%
  group_by(chapter) %>%
  count(word, sort = TRUE) %>%
  ungroup() %>%
  complete(chapter, word,
           fill = list(n = 0))
# NEW CODE
sissy_jupe <- ht_count %>%
  filter(str_detect(word, "sissy_jupe"))
# END
...which produces:
# A tibble: 38 x 3
chapter word n
<int> <chr> <dbl>
1 0 sissy_jupe 1
2 1 sissy_jupe 0
3 2 sissy_jupe 14
4 3 sissy_jupe 4
5 4 sissy_jupe 1
6 5 sissy_jupe 5
7 6 sissy_jupe 20
8 7 sissy_jupe 7
9 8 sissy_jupe 2
10 9 sissy_jupe 38
# ... with 28 more rows
If any of our solutions helped you, please don't forget to upvote it or click the checkmark (feedback = better programmers).
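If more characters need the same treatment (the question also tracks gradgrind and bounderby), one option is a lookup table of aliases that is joined onto the tokens before counting. This is a sketch, not part of either answer: the `tokens` tibble is a tiny stand-in for `tidy_ht`, and the `aliases` table and its `character` column are hypothetical. Matching through a join compares whole tokens, whereas the regex `"sissy|jupe"` passed to `str_replace_all()` also rewrites any longer token that merely contains one of the names (for example, "jupe's" would become "sissy_jupe's").

```r
library(dplyr)
library(tidyr)

# Stand-in for tidy_ht: one row per token, tagged with its chapter.
tokens <- tibble(
  chapter = c(1, 1, 1, 1, 2, 2, 2),
  word    = c("sissy", "jupe", "gradgrind", "the", "jupe", "bounderby", "and")
)

# Hypothetical alias table: every spelling maps to one canonical name.
aliases <- tribble(
  ~word,       ~character,
  "sissy",     "sissy_jupe",
  "jupe",      "sissy_jupe",
  "gradgrind", "gradgrind",
  "bounderby", "bounderby"
)

# inner_join() keeps only tokens that exactly match an alias, count()
# tallies them per chapter and canonical name, and complete() adds
# explicit zero rows for characters absent from a chapter.
character_counts <- tokens %>%
  inner_join(aliases, by = "word") %>%
  count(chapter, character) %>%
  complete(chapter, character, fill = list(n = 0L))

print(character_counts)
```

With the real data, `tokens` would be replaced by `tidy_ht`, which already has `chapter` and `word` columns; the `complete()` step can be dropped if the zero rows are not needed.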
https://stackoverflow.com/questions/56605792