Multiple qdap transformations in text mining / sentiment (polarity) analysis in R

Stack Overflow user
Asked 2015-12-01 14:49:23
1 answer, 265 views, 0 followers, 3 votes

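I have a data.frame with a week number, week, and text reviews, text. I would like to treat week as the grouping variable and run some basic text analysis on it (e.g. qdap::polarity). Some of the review texts contain multiple sentences; however, I only care about each week's polarity "on the whole".

How can I chain together multiple text transformations before running qdap::polarity, while adhering to its warning messages? I am able to chain transformations with tm::tm_map and tm::tm_reduce. Is there something comparable in qdap? What is the proper way to pre-treat/transform this text before running qdap::polarity and/or qdap::sentSplit?

More details are in the code / reproducible example below: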

library(qdap)
library(tm)

df <- data.frame(week = c(1, 1, 1, 2, 2, 3, 4),
                 text = c("This is some text. It was bad. Not good.",
                          "Another review that was bad!",
                          "Great job, very helpful; more stuff here, but can't quite get it.",
                          "Short, poor, not good Dr. Jay, but just so-so. And some more text here.",
                          "Awesome job! This was a great review. Very helpful and thorough.",
                          "Not so great.",
                          "The 1st time Mr. Smith helped me was not good."),
                 stringsAsFactors = FALSE)

docs <- as.Corpus(df$text, df$week)

funs <- list(stripWhitespace,
             tolower,
             replace_ordinal,
             replace_number,
             replace_abbreviation)

# Is there a qdap function that does something similar to the next line?
# Or is there a way to pass this VCorpus / Corpus directly to qdap::polarity?
docs <- tm_map(docs, FUN = tm_reduce, tmFuns = funs)


# At the end of the day, I would like to get this type of output, but adhere to
# the warning message about running sentSplit. How should I pre-treat / cleanse
# these sentences, but keep the "week" grouping?
pol <- polarity(df$text, df$week)

## Not run:
# check_text(df$text)

1 Answer

Stack Overflow user

Accepted answer

Answered 2015-12-02 03:28:58

You can run sentSplit as the warning suggests, like this:

df_split <- sentSplit(df, "text")
with(df_split, polarity(text, week))

##   week total.sentences total.words ave.polarity sd.polarity stan.mean.polarity
## 1    1               5          26       -0.138       0.710             -0.195
## 2    2               6          26        0.342       0.402              0.852
## 3    3               1           3       -0.577          NA                 NA
## 4    4               2          10        0.000       0.000                NaN
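
On the chaining part of the question: the following is a minimal base-R sketch, not something taken from qdap's documentation, of how several cleaning functions could be composed into one step and applied to the text column before sentSplit. The helper names `compose_all` and `squish` are hypothetical, and the two functions in `funs` are base-R stand-ins so the example runs without extra packages. With qdap attached, functions from the question such as replace_abbreviation, replace_ordinal and replace_number could presumably be placed in the same list. That may matter here: in the output above, week 4 is counted as two sentences although its only review is a single sentence, which suggests "Mr." was treated as a sentence boundary.

```r
# Sketch only: compose several text-cleaning functions into one pipeline.
# `compose_all` and `squish` are hypothetical helpers, not part of qdap or tm.
compose_all <- function(funs) {
  # Apply each function in `funs`, in order, to the running result.
  function(x) Reduce(function(acc, f) f(acc), funs, x)
}

# Collapse runs of whitespace and trim both ends.
squish <- function(x) gsub("\\s+", " ", trimws(x))

# Base-R stand-ins; qdap replacement functions could be listed here instead.
funs <- list(squish, tolower)
clean <- compose_all(funs)

df <- data.frame(week = c(1, 2),
                 text = c("  This   is SOME text.  ", "Not  so GREAT."),
                 stringsAsFactors = FALSE)

# Only the text column is rewritten; `week` is untouched, so the grouping
# is still available for a later sentSplit(df, "text") and polarity() call.
df$text <- clean(df$text)
df$text
```

Because the cleaning happens on the data.frame column rather than on a tm corpus, no conversion back from a corpus is needed before calling sentSplit.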

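Note that I have a breakout sentiment package, sentimentr, available on GitHub that improves on the qdap version in speed, functionality, and documentation. It does the sentence splitting internally in the sentiment_by function. The script below installs the package and uses it: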

if (!require("pacman")) install.packages("pacman")
p_load_gh("trinker/sentimentr")

with(df, sentiment_by(text, week))

##    week word_count        sd ave_sentiment
## 1:    2         25 0.7562542    0.21086408
## 2:    1         26 1.1291541    0.05781106
## 3:    4         10        NA    0.00000000
## 4:    3          3        NA   -0.57735027
Votes: 1
Original content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/34023200
