I am creating a matrix of 1s and 0s: an entry is 1 if a word is part of the string, and 0 otherwise.
For example, the expected matrix looks like this:
                           white hanging heart holder black suitcase
white hanging heart holder     1       1     1      1     0        0
black suitcase                 0       0     0      0     1        1
What I have are two vectors:
Itemsvector = c("white hanging heart holder","black suitcase", ...)
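Wordsvector = c("white","hanging","heart","holder","black", "suitcase",...)
I have been experimenting with the %in% operator:
strsplit(Itemsvector[1], split = ' ')[[1]] %in% Wordsvector
and also with grepl:
grepl(Wordsvector[1], Itemsvector)
This does give me TRUE/FALSE values, but I am lost on how to map that set of values onto the whole matrix grid.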
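A minimal sketch of how the `%in%` test above could be applied to every item at once, assuming the two example vectors without the `...` and items whose words are separated by single spaces. The names `item_words` and `m` are illustrative only. The idea is to reverse the direction of the test (`Wordsvector %in% words_of_item`), so each item yields one 0/1 value per entry of `Wordsvector`, which is exactly one row of the grid:

```r
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# Split every item into its words (a list with one character vector per item)
item_words <- strsplit(Itemsvector, split = " ")

# For each item, mark which entries of Wordsvector occur among its words;
# sapply() returns one column per item, so t() puts the items on the rows
m <- t(sapply(item_words, function(w) as.integer(Wordsvector %in% w)))

# Label rows with the items and columns with the words
dimnames(m) <- list(Itemsvector, Wordsvector)
m
```

Because `%in%` compares whole strings, "white" is not matched by "whites" here; the trade-off is that splitting on a single space assumes the items contain no punctuation or repeated spaces.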
Posted on 2019-05-12 12:10:09
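You can try a double sapply. Since you already have Wordsvector to search for, there is no need to split Itemsvector. We can use grepl to check whether a particular word is present in each element of Itemsvector, and as an extra precaution we add word boundaries so that "white" does not match "whites".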
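The word-boundary precaution can be checked in isolation. This small sketch (the variable names are illustrative) compares a plain pattern with one wrapped in `\\b`, built the same way with `paste0()`:

```r
# Without boundaries, "white" is found inside "whites"
plain <- grepl("white", "whites")

# \\b anchors the pattern at word edges, so only the whole word "white" matches
bounded <- grepl(paste0("\\b", "white", "\\b"),
                 c("whites", "white hanging heart holder"))

plain    # TRUE
bounded  # FALSE TRUE
```

Note that each word is used as a regular expression, so a word containing regex metacharacters such as `.` or `(` would need escaping first.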
# Rows are items, columns are words; the leading + converts TRUE/FALSE to 1/0
+(t(sapply(Itemsvector, function(x) sapply(Wordsvector, function(y)
grepl(paste0("\\b",y, "\\b"), x)))))
#                            white hanging heart holder black suitcase
# white hanging heart holder     1       1     1      1     0        0
# black suitcase                 0       0     0      0     1        1
Data
Itemsvector = c("white hanging heart holder","black suitcase")
Wordsvector = c("white","hanging","heart","holder","black", "suitcase")
Posted on 2019-05-12 13:39:47
We can do this more easily with table: split "Itemsvector" into a list of vectors, convert it to a data.frame with stack, and cross-tabulate it with table:
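The same pipeline can be traced one step at a time. This is a sketch using the example vectors; the intermediate names `words_by_item`, `long`, `tab` and `tab_ordered` are illustrative. The last step is an addition: `table()` sorts the word columns alphabetically, so indexing by `Wordsvector` restores the column order of the expected matrix:

```r
Itemsvector <- c("white hanging heart holder", "black suitcase")
Wordsvector <- c("white", "hanging", "heart", "holder", "black", "suitcase")

# 1. Split each item into words and name each list element after its item
words_by_item <- setNames(strsplit(Itemsvector, " "), Itemsvector)

# 2. stack() turns the named list into a long data.frame with columns
#    `values` (the word) and `ind` (the item it came from)
long <- stack(words_by_item)

# 3. Cross-tabulate item against word; [2:1] puts `ind` first so items are rows
tab <- table(long[2:1])

# 4. Reorder the (alphabetically sorted) word columns to follow Wordsvector
tab_ordered <- tab[, Wordsvector]
tab_ordered
```

Since `table()` counts occurrences, a word repeated within one item would give 2 rather than 1; `+(tab > 0)` turns the counts back into 0/1 if that matters.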
table(stack(setNames(strsplit(Itemsvector, " "), Itemsvector))[2:1])
#                             values
#ind                           black hanging heart holder suitcase white
#  white hanging heart holder     0       1     1      1        0     1
#  black suitcase                 1       0     0      0        1     0
Or use mtabulate:
library(qdapTools)
mtabulate(setNames(strsplit(Itemsvector, " "), Itemsvector))
https://stackoverflow.com/questions/56099040