I am trying to select the group of a word based on the sum of its TF-IDF values.
Here is my data, sof:
sof <- data.frame('Text'=c("I have an apple apple and a banana","I have an apple apple and a banana",
"I have an apple apple and a banana", "You drive a car with gloves",
"You drive a car with gloves", "I like your cat dog horse and shoes",
"I like your cat dog horse and shoes","I like your cat dog horse and shoes",
"I like your cat dog horse and shoes", "I have all PC xBox PS Switch games",
"I have all PC xBox PS Switch games","I have all PC xBox PS Switch games",
"I have all PC xBox PS Switch games","I have all PC xBox PS Switch games",
"I have all PC xBox PS Switch games"),
'Word'=c("apple","apple","banana","car","gloves","cat","dog","horse","shoes","PC",
"xBox","PS","Switch","games","all"),
'tfidf'=c(0.127,0.127,0.309,0.203,0.203,0.169,0.341,0.0533,0.331,
0.275,0.143,0.231,0.275,0.143,0.231),
'Thema' = c("AN","AN","V","AU","AU","AR","G","ALG","ALG","WOH",
"AN","AU","WOH","AN","AU"), stringsAsFactors = FALSE)

What I want to do is:
1. Group by Text.
2. Sum the tfidf per Thema.
3. sWords: holds all the Word values found in the Text.
4. sThema: holds the Thema with the highest sum from step 2.

I tried:
library(dplyr)

sSof <- sof %>% group_by(Text) %>%
  summarize(SumTFIDF = sum(unique(tfidf), na.rm = TRUE),
            sWords = paste(toString(unique(Word)), collapse = "; "),
            sThema = paste(toString(unique(Thema)), collapse = "; "))

But I get every possible entry of Thema, and I need only one: the Thema whose words have the highest tfidf sum.

Result:
> sSof
# A tibble: 4 x 4
  Text                                SumTFIDF sWords                           sThema
  <chr>                                  <dbl> <chr>                            <chr>
1 I have all PC xBox PS Switch games     0.649 PC, xBox, PS, Switch, games, all WOH, AN, AU
2 I have an apple apple and a banana     0.436 apple, banana                    AN, V
3 I like your cat dog horse and shoes    0.894 cat, dog, horse, shoes           AR, G, ALG
4 You drive a car with gloves            0.203 car, gloves                      AU

I'm looking for something like this:
# A tibble: 4 x 4
  Text                                SumTFIDF sWords                           sThema
  <chr>                                  <dbl> <chr>                            <chr>
1 I have all PC xBox PS Switch games     0.649 PC, xBox, PS, Switch, games, all WOH
2 I have an apple apple and a banana     0.436 apple, banana                    AN
3 I like your cat dog horse and shoes    0.894 cat, dog, horse, shoes           G
4 You drive a car with gloves            0.203 car, gloves                      AU

Only one Thema should remain: the one whose words have the highest summed tfidf.
Any ideas?
Posted on 2019-09-05 14:32:38
Not sure if this is the most elegant solution, but you could split it into multiple steps and join them.
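library(dplyr)
library(stringr)  # for str_c()

sof %>%
  # tfidf sum per Thema within each Text
  group_by(Text, Thema) %>%
  summarise(sum_tfidf = sum(unique(tfidf))) %>%
  # attach the per-Thema sums back to the original rows
  right_join(sof) %>%
  # add all words of each Text as a single string
  left_join(
    sof %>%
      group_by(Text) %>%
      summarise(sWords = str_c(Word, collapse = ", "))
  ) %>%
  # keep, per Text, the row whose Thema has the highest sum
  slice(which.max(sum_tfidf))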
# A tibble: 4 x 6
# Groups:   Text [4]
  Text                                Thema sum_tfidf Word    tfidf sWords
  <chr>                               <chr>     <dbl> <chr>   <dbl> <chr>
1 I have all PC xBox PS Switch games  WOH       0.275 PC      0.275 PC, xBox, PS, Switch, games, all
2 I have an apple apple and a banana  V         0.309 banana  0.309 apple, apple, banana
3 I like your cat dog horse and shoes ALG       0.384 horse  0.0533 cat, dog, horse, shoes
4 You drive a car with gloves         AU        0.203 car     0.203 car, gloves

https://stackoverflow.com/questions/57806231
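
For comparison, here is a minimal sketch of a more compact variant that returns the four requested columns in a single pipeline. It is only an illustration, not the answerer's method: sof_demo (a two-text stand-in for sof), thema_sum and result are hypothetical names, and it assumes dplyr is installed and keeps the same sum(unique(tfidf)) rule used above.

```r
library(dplyr)

# Hypothetical stand-in for `sof`: same columns, only two of the texts.
sof_demo <- data.frame(
  Text  = c(rep("I like your cat dog horse and shoes", 4),
            rep("You drive a car with gloves", 2)),
  Word  = c("cat", "dog", "horse", "shoes", "car", "gloves"),
  tfidf = c(0.169, 0.341, 0.0533, 0.331, 0.203, 0.203),
  Thema = c("AR", "G", "ALG", "ALG", "AU", "AU"),
  stringsAsFactors = FALSE
)

result <- sof_demo %>%
  # give every row the tfidf sum of its Thema within its Text
  group_by(Text, Thema) %>%
  mutate(thema_sum = sum(unique(tfidf), na.rm = TRUE)) %>%
  # regroup by Text only and collapse to one row per Text
  group_by(Text) %>%
  summarise(
    SumTFIDF = sum(unique(tfidf), na.rm = TRUE),
    sWords   = toString(unique(Word)),
    sThema   = Thema[which.max(thema_sum)]  # Thema with the highest sum
  )

print(result)
```

Under this rule the cat/dog/horse text resolves to ALG (0.0533 + 0.331 = 0.3843, which is above G at 0.341). That agrees with the joined result above rather than with the G shown in the question's wished-for table.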