How to conditionally mutate based on group sums?

Stack Overflow user
Asked 2019-09-05 13:08:23
Answers: 1, Views: 50, Followers: 0, Votes: 0

I am trying to select a word's group based on the sum of its TF-IDF values.

Here is my data, sof:

sof <- data.frame('Text'=c("I have an apple apple and a banana","I have an apple apple and a banana",
                       "I have an apple apple and a banana", "You drive a car with gloves",
                       "You drive a car with gloves", "I like your cat dog horse and shoes",
                       "I like your cat dog horse and shoes","I like your cat dog horse and shoes",
                       "I like your cat dog horse and shoes", "I have all PC xBox PS Switch games",
                       "I have all PC xBox PS Switch games","I have all PC xBox PS Switch games",
                       "I have all PC xBox PS Switch games","I have all PC xBox PS Switch games",
                       "I have all PC xBox PS Switch games"),
                  'Word'=c("apple","apple","banana","car","gloves","cat","dog","horse","shoes","PC",
                         "xBox","PS","Switch","games","all"), 
                  'tfidf'=c(0.127,0.127,0.309,0.203,0.203,0.169,0.341,0.0533,0.331,
                            0.275,0.143,0.231,0.275,0.143,0.231),
                  'Thema' = c("AN","AN","V","AU","AU","AR","G","ALG","ALG","WOH",
                              "AN","AU","WOH","AN","AU"), stringsAsFactors = FALSE)

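What I would like to do is:

  • Group by Text
  • Sum tfidf by Thema
  • Add a new variable sWords holding all the Word values found for each Text
  • Add a new variable sThema holding the Thema with the highest sum from step 2

I tried: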

sSof <- sof %>% group_by(Text) %>% 
    summarize(SumTFIDF = sum(unique(tfidf), na.rm = TRUE),
              sWords = paste(toString(unique(Word)), collapse = "; "),
              sThema = paste(toString(unique(Thema)), collapse = "; "))

But I get every possible Thema entry, and I need only one: the one whose words have the highest sum.

Result:

> sSof
# A tibble: 4 x 4
  Text                                SumTFIDF sWords                           sThema     
  <chr>                                  <dbl> <chr>                            <chr>      
1 I have all PC xBox PS Switch games     0.649 PC, xBox, PS, Switch, games, all WOH, AN, AU
2 I have an apple apple and a banana     0.436 apple, banana                    AN, V      
3 I like your cat dog horse and shoes    0.894 cat, dog, horse, shoes           AR, G, ALG 
4 You drive a car with gloves            0.203 car, gloves                      AU    

I am looking for something like this:

# A tibble: 4 x 4
  Text                                SumTFIDF sWords                           sThema     
  <chr>                                  <dbl> <chr>                            <chr>      
1 I have all PC xBox PS Switch games     0.649 PC, xBox, PS, Switch, games, all WOH
2 I have an apple apple and a banana     0.436 apple, banana                    AN      
3 I like your cat dog horse and shoes    0.894 cat, dog, horse, shoes           G 
4 You drive a car with gloves            0.203 car, gloves                      AU

Only one Thema should remain: the one whose words have the highest tfidf sum.

Any ideas?


1 Answer

Stack Overflow user

Answered 2019-09-05 14:32:38

Not sure if this is the most elegant solution, but you can split it into several steps and join them:

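library(dplyr)
library(stringr)  # for str_c()

sof %>%
  # Sum the distinct tfidf values within each Text/Thema pair
  group_by(Text, Thema) %>%
  summarise(sum_tfidf = sum(unique(tfidf))) %>%
  # Attach the per-Thema sums back onto the original rows
  right_join(sof, by = c("Text", "Thema")) %>%
  # Add one comma-separated string of all words per Text
  left_join(
    sof %>%
      group_by(Text) %>%
      summarise(sWords = str_c(Word, collapse = ", ")),
    by = "Text"
  ) %>%
  # Still grouped by Text here: keep the row with the largest sum in each group
  slice(which.max(sum_tfidf))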


# A tibble: 4 x 6
# Groups:   Text [4]
  Text                                Thema sum_tfidf Word    tfidf sWords                          
  <chr>                               <chr>     <dbl> <chr>   <dbl> <chr>                           
1 I have all PC xBox PS Switch games  WOH       0.275 PC     0.275  PC, xBox, PS, Switch, games, all
2 I have an apple apple and a banana  V         0.309 banana 0.309  apple, apple, banana            
3 I like your cat dog horse and shoes ALG       0.384 horse  0.0533 cat, dog, horse, shoes          
4 You drive a car with gloves         AU        0.203 car    0.203  car, gloves 
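
As a possible variation on the steps above, the sketch below returns the four-column shape shown in the question (Text, SumTFIDF, sWords, sThema). It is an illustration under assumptions, not the answerer's code: it sums every tfidf row per Thema (duplicates included) rather than only the unique values, it breaks ties by keeping the first row, and the names `texts`, `top_thema`, `per_text`, `thema_sum` and `result` are made up for this example. It also assumes dplyr 1.0.0 or later for `slice_max()` and the `.groups` argument.

```r
library(dplyr)

# Same values as `sof` in the question, rebuilt compactly so the block runs on its own.
texts <- c("I have an apple apple and a banana",
           "You drive a car with gloves",
           "I like your cat dog horse and shoes",
           "I have all PC xBox PS Switch games")
sof <- data.frame(
  Text  = rep(texts, times = c(3, 2, 4, 6)),
  Word  = c("apple", "apple", "banana", "car", "gloves", "cat", "dog", "horse",
            "shoes", "PC", "xBox", "PS", "Switch", "games", "all"),
  tfidf = c(0.127, 0.127, 0.309, 0.203, 0.203, 0.169, 0.341, 0.0533, 0.331,
            0.275, 0.143, 0.231, 0.275, 0.143, 0.231),
  Thema = c("AN", "AN", "V", "AU", "AU", "AR", "G", "ALG", "ALG", "WOH",
            "AN", "AU", "WOH", "AN", "AU"),
  stringsAsFactors = FALSE
)

# Total tfidf per Text/Thema pair, then keep the Thema with the largest total per Text.
# Assumption: duplicates are summed; use sum(unique(tfidf)) to mirror the question instead.
top_thema <- sof %>%
  group_by(Text, Thema) %>%
  summarise(thema_sum = sum(tfidf, na.rm = TRUE), .groups = "drop") %>%
  group_by(Text) %>%
  slice_max(thema_sum, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Text, sThema = Thema)

# Per-Text summary, computed the same way as in the question's attempt.
per_text <- sof %>%
  group_by(Text) %>%
  summarise(SumTFIDF = sum(unique(tfidf), na.rm = TRUE),
            sWords   = toString(unique(Word)),
            .groups  = "drop")

# One row per Text with a single sThema.
result <- left_join(per_text, top_thema, by = "Text")
print(result)
```

On the sample data this selects WOH, V, ALG and AU, the same Thema values as the output above, which differs from the AN and G listed in the question's desired output.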
Votes: 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/57806231
