Q: How to combine multiple rows into one row using TidyText

Stack Overflow user

Asked 2019-06-15 06:27:43

2 answers · 232 views · 0 followers · 1 vote

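I'm working with a novel and want to find occurrences of character names throughout the book. Some characters go by more than one name. For example, the character "Sissy Jupe" appears as both "Sissy" and "Jupe". I want to combine the two rows of word counts into a single row so that I can see the count for "Sissy Jupe".

I've looked into sum, rbind, merge and other approaches found on message boards, but none of them seem to work. There are plenty of good examples, but none of them work for this case.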

library(tidyverse) 
library(gutenbergr)
library(tidytext)

ht <- gutenberg_download(786)

ht_chap <- ht %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE))))

tidy_ht <- ht_chap %>%
  unnest_tokens(word, text) %>%
  mutate(word = str_extract(word, "[a-z']+")) # keeps only letters and apostrophes; removes _

ht_count <- tidy_ht %>%
  group_by(chapter) %>%
  count(word, sort = TRUE) %>%
  ungroup %>%
  complete(chapter, word,
           fill = list(n = 0)) 

gradgrind <- filter(ht_count, word == "gradgrind")
bounderby <- filter(ht_count, word == "bounderby")
sissy <- filter(ht_count, word == "sissy")

## TEST
sissy_jupe <- ht_count %>% 
  filter(word %in% c("sissy", "jupe"))

I want a single "word" entry named "sissy_jupe" whose n is the per-chapter total of both names. The result below is close, but not quite it.

# A tibble: 76 x 3
   chapter word      n
     <int> <chr> <dbl>
 1       0 jupe      0
 2       0 sissy     1
 3       1 jupe      0
 4       1 sissy     0
 5       2 jupe      5
 6       2 sissy     9
 7       3 jupe      3
 8       3 sissy     1
 9       4 jupe      1
10       4 sissy     0
# … with 66 more rows

Tags: dplyr, tidytext

2 Answers

Stack Overflow user

Answered 2019-06-15 06:41:15

The code below should give the desired output.

library(tidyverse)
df %>% group_by(chapter) %>% 
  mutate(n = sum(n),
         word = paste(word, collapse="_")) %>% 
  distinct(chapter, .keep_all = TRUE)
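
In the snippet above, `df` presumably stands for the filtered `sissy_jupe` tibble from the question. Because `paste(word, collapse = "_")` joins the words in whatever order the rows appear, the label may come out as `jupe_sissy` rather than `sissy_jupe`. A minimal sketch of the same idea with `summarise()` and a fixed label follows. It runs on a small hand-built tibble whose counts for chapters 2 to 4 are copied from the question's printed output; `toy_counts` and `combined` are illustrative names, not part of the original code.

```r
library(dplyr)

# Toy stand-in for the filtered counts; the values for chapters 2-4 are
# copied from the output printed in the question.
toy_counts <- tibble(
  chapter = c(2L, 2L, 3L, 3L, 4L, 4L),
  word    = c("jupe", "sissy", "jupe", "sissy", "jupe", "sissy"),
  n       = c(5, 9, 3, 1, 1, 0)
)

# Collapse the two rows per chapter into one row with a fixed label
# and the summed count.
combined <- toy_counts %>%
  group_by(chapter) %>%
  summarise(word = "sissy_jupe", n = sum(n), .groups = "drop")

print(combined)
```

With `summarise()` there is exactly one row per chapter, so no `distinct()` step is needed afterwards.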

Votes: 1

Stack Overflow user

Answered 2019-06-15 08:16:23

Welcome to Stack Overflow, Tom. Here's an idea:

Basically: (1) find "sissy" or "jupe" in the tidy tibble and replace it with "sissy_jupe", (2) create ht_count as you did, (3) print the result:

library(tidyverse) 
library(gutenbergr)
library(tidytext)

ht <- gutenberg_download(786)

ht_chap <- ht %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, regex("^chapter [\\divxlc]",
                                                 ignore_case = TRUE))))

tidy_ht <- ht_chap %>%
  unnest_tokens(word, text) %>%
  mutate(word = str_extract(word, "[a-z']+")) # keeps only letters and apostrophes; removes _

# NEW CODE START
tidy_ht <- tidy_ht %>%
  mutate(word = str_replace_all(word, "sissy|jupe", replacement = "sissy_jupe"))
# END NEW CODE

ht_count <- tidy_ht %>%
  group_by(chapter) %>%
  count(word, sort = TRUE) %>%
  ungroup %>%
  complete(chapter, word,
           fill = list(n = 0))

# NEW CODE
sissy_jupe <- ht_count %>% 
  filter(str_detect(word, "sissy_jupe"))
# END

...which yields...

# A tibble: 38 x 3
   chapter word           n
     <int> <chr>      <dbl>
 1       0 sissy_jupe     1
 2       1 sissy_jupe     0
 3       2 sissy_jupe    14
 4       3 sissy_jupe     4
 5       4 sissy_jupe     1
 6       5 sissy_jupe     5
 7       6 sissy_jupe    20
 8       7 sissy_jupe     7
 9       8 sissy_jupe     2
10       9 sissy_jupe    38
# ... with 28 more rows
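
One caveat with this approach: `str_replace_all(word, "sissy|jupe", ...)` rewrites any token that merely contains either string, so a token such as `sissy's` would become `sissy_jupe's` and would still be matched by the later `str_detect()` filter. If only exact tokens should be merged, a lookup vector is one option. The sketch below uses made-up tokens; `aliases`, `toy_tokens` and `merged_counts` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Hypothetical alias table: names are tokens as they appear in the text,
# values are the canonical label they should be counted under.
aliases <- c(sissy = "sissy_jupe", jupe = "sissy_jupe")

# Made-up tokens standing in for tidy_ht.
toy_tokens <- tibble(
  chapter = c(1L, 1L, 1L, 2L, 2L),
  word    = c("sissy", "jupe", "sissy's", "sissy", "gradgrind")
)

# Exact-match lookup: tokens without an alias give NA and fall back to
# the original word via coalesce().
merged_counts <- toy_tokens %>%
  mutate(word = coalesce(unname(aliases[word]), word)) %>%
  count(chapter, word)

print(merged_counts)
```

Adding further entries to `aliases` would extend the same recoding to other characters who go by several names.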

If any of our solutions helped you, please don't forget to upvote them or click the check mark (feedback = better programmers).

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link: https://stackoverflow.com/questions/56605792
