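I was running a keyness analysis and everything worked fine; then, for no reason I can identify, it started giving me an error. I'm using data_corpus_inaugural, the corpus object of US presidential inaugural addresses that ships with the quanteda package.
My code: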
> corpus_pres <- corpus_subset(data_corpus_inaugural,
+ President %in% c("Obama", "Trump"))
> dtm_pres <- dfm(corpus_pres, groups = "President",
+ remove = stopwords("english"), remove_punct = TRUE)
Error: groups must have length ndoc(x)
In addition: Warning messages:
1: 'dfm.corpus()' is deprecated. Use 'tokens()' first.
2: '...' should not be used for tokens() arguments; use 'tokens()' first.
3: 'groups' is deprecated; use dfm_group() instead
Answered 2021-06-23 22:33:46
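In quanteda v3, "dfm() constructs a document-feature matrix from a tokens object" - https://tutorials.quanteda.io/basic-operations/dfm/dfm/
So tokenize the corpus first, do the stopword removal and grouping on the tokens, and only then call dfm(). Note that groups is given the unquoted docvar name (President); the quoted string "President" is most likely read as a length-one vector, which would explain the "groups must have length ndoc(x)" error.
Try this: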
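library(quanteda)
library(magrittr)  # provides %>%
# corpus_pres is the subset created in the question
toks_pres <- tokens(corpus_pres, remove_punct = TRUE) %>%
  tokens_remove(pattern = stopwords("en")) %>%
  tokens_group(groups = President)
pres_dfm <- dfm(toks_pres)
Answered 2021-10-01 21:28:19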
I ran into the same problem when analyzing Twitter accounts, and this code worked for me. It lets you search for a term across accounts.
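# group the corpus by account (users is my data frame of tweets)
twcorpus <- corpus(users) %>%
  corpus_group(groups = interaction(user_username))
# x-ray plot of where the term occurs in each account;
# textplot_xray() is in quanteda.textplots, and kwic() is run on tokens
# because calling it on a corpus is deprecated in quanteda v3
textplot_xray(kwic(tokens(twcorpus), "helsin*"), scale = "relative")
https://stackoverflow.com/questions/67267702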