I want to find the number of common elements for every pair of rows in a data frame.
name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L
Desired output:
name common name
x1 6 x1
x1 2 x2
x1 - x3
x1 - x4
x2 - x1
x2 5 x2
x2 - x3
x2 - x4
x3 - x1
x3 - x2
x3 6 x3
x3 - x4
x4 - x1
x4 - x2
x4 - x3
x4 7 x4

Posted on 2019-08-26 14:02:29
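I believe the code below does what the question asks. Note, however, that I find it convoluted, with two merge calls; perhaps someone else will find a simpler solution.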
# Build every (name, Var2) pair of row names and attach the members of name.
fun <- function(DF){
  ex <- expand.grid(Var2 = DF[['name']], name = DF[['name']])[2:1]
  merge(DF, ex)
}
# One row per pair; the columns are name, members, Var2.
tmp <- merge(df1, fun(df1))
# members2: the members of the second name (Var2) in each pair.
o <- order(tmp[[3]])
tmp$members2 <- tmp$members[o]
# Count the elements shared by the two comma-separated member lists.
tmp$common <- apply(tmp[c(2, 4)], 1, function(x){
  y1 <- unlist(strsplit(as.character(x[1]), ","))
  y2 <- unlist(strsplit(as.character(x[2]), ","))
  length(intersect(y1, y2))
})
res <- tmp[c(1, 5, 3)]
names(res)[3] <- "name2"
head(res)
# name common name2
#1 x1 6 x1
#2 x1 1 x2
#3 x1 1 x3
#4 x1 1 x4
#5 x2 1 x1
#6 x2 5 x2
Finally, clean up.
rm(tmp)
Data:
df1 <- read.table(text = "
name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L
", header = TRUE)

Posted on 2019-08-26 14:01:21
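1) dplyr/tidyr: For each row, use separate_rows to create a separate row for each member, then join the result to itself by members. Then compute the counts and complete the result so that every pair of names appears.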
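The idea behind this approach (one row per name/member pair, then a self-join on members) can also be sketched in base R. This is an illustration rather than the answer's code: merge stands in for inner_join, table stands in for count plus complete, and the names parts, long, joined and counts are made up for the example.

```r
# Input data, rebuilt inline so that this sketch is self-contained.
DF <- data.frame(
  name = c("x1", "x2", "x3", "x4"),
  members = c("A,B,N,K,Y,G", "J,L,M,N,T", "G,H,S,J,D,F", "J,K,H,F,H,D,L"),
  stringsAsFactors = FALSE
)

# One row per (name, member) pair; unique() drops repeats such as the
# second H in x4.
parts <- strsplit(DF$members, ",")
long <- unique(data.frame(
  name = rep(DF$name, lengths(parts)),
  members = unlist(parts),
  stringsAsFactors = FALSE
))

# Self-join on members: each shared member yields one (name.x, name.y) row.
joined <- merge(long, long, by = "members")

# Tabulate the pairs; table() also keeps pairs that share nothing (count 0).
counts <- as.data.frame(table(name.x = joined$name.x, name.y = joined$name.y))
counts
```

The dplyr/tidyr pipeline expresses the same steps more compactly.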
library(dplyr)
library(tidyr)
DF %>%
  separate_rows(members) %>%
  distinct %>%
  inner_join(., ., by = "members") %>%
  count(name.x, name.y) %>%
  complete(name.x, name.y)
giving:
# A tibble: 16 x 3
name.x name.y n
<chr> <chr> <int>
1 x1 x1 6
2 x1 x2 1
3 x1 x3 1
4 x1 x4 1
5 x2 x1 1
6 x2 x2 5
7 x2 x3 1
8 x2 x4 2
9 x3 x1 1
10 x3 x2 1
11 x3 x3 6
12 x3 x4 4
13 x4 x1 1
14 x4 x2 2
15 x4 x3 4
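16 x4 x4 6

2) Base: Create a function that counts the elements in the intersection of two members components. Then use outer to apply it to every pair and convert the result to a data.frame.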
Scan <- function(x) scan(text = x, what = "", sep = ",", quiet = TRUE)
countSame <- function(x, y) length(intersect(Scan(x), Scan(y)))
x <- setNames(DF$members, DF$name)
as.data.frame.table(outer(x, x, Vectorize(countSame)))
giving:
Var1 Var2 Freq
1 x1 x1 6
2 x2 x1 1
3 x3 x1 1
4 x4 x1 1
5 x1 x2 1
6 x2 x2 5
7 x3 x2 1
8 x4 x2 2
9 x1 x3 1
10 x2 x3 1
11 x3 x3 6
12 x4 x3 4
13 x1 x4 1
14 x2 x4 2
15 x3 x4 4
16 x4 x4 6
Although the question asks for data.frame form, you may prefer a 2d table, which can be produced by omitting as.data.frame.table from the last line of code.
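As a minimal sketch of that 2d form, the snippet below rebuilds DF inline so that it runs on its own, then calls outer without the as.data.frame.table wrapper. The variable name m is only illustrative.

```r
# Input data, rebuilt inline so that this sketch is self-contained.
DF <- data.frame(
  name = c("x1", "x2", "x3", "x4"),
  members = c("A,B,N,K,Y,G", "J,L,M,N,T", "G,H,S,J,D,F", "J,K,H,F,H,D,L"),
  stringsAsFactors = FALSE
)

# Split a comma-separated string into its elements.
Scan <- function(x) scan(text = x, what = "", sep = ",", quiet = TRUE)
# Number of distinct elements shared by two comma-separated strings.
countSame <- function(x, y) length(intersect(Scan(x), Scan(y)))

x <- setNames(DF$members, DF$name)
# Without as.data.frame.table, outer returns a matrix whose row and
# column names are the names of x.
m <- outer(x, x, Vectorize(countSame))
print(m)
```

Because intersect returns distinct values, the diagonal entry for x4 counts H only once, which is why it is 6 even though x4 lists seven members.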
x1 x2 x3 x4
x1 6 1 1 1
x2 1 5 1 2
x3 1 1 6 4
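x4 1 2 4 6

A variation of (2) that is only two lines long can be formed by splitting members with strsplit and then using outer to compute the lengths of the pairwise intersections. Finally, convert the result to a data frame. (The 2d table can again be formed by omitting as.data.frame.table.)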
x <- with(DF, setNames(strsplit(members, ","), name))
as.data.frame.table(outer(x, x, Vectorize(function(x, y) length(intersect(x, y)))))
giving:
Var1 Var2 Freq
1 x1 x1 6
2 x2 x1 1
3 x3 x1 1
4 x4 x1 1
5 x1 x2 1
6 x2 x2 5
7 x3 x2 1
8 x4 x2 2
9 x1 x3 1
10 x2 x3 1
11 x3 x3 6
12 x4 x3 4
13 x1 x4 1
14 x2 x4 2
15 x3 x4 4
16 x4 x4 6

Note: the input DF in reproducible form is:
Lines <- "name members
x1 A,B,N,K,Y,G
x2 J,L,M,N,T
x3 G,H,S,J,D,F
x4 J,K,H,F,H,D,L"
DF <- read.table(text = Lines, header = TRUE, as.is = TRUE)

https://stackoverflow.com/questions/57658909