Original data:
df <- structure(list(ID_client = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L), .Label = c("1_", "2_", "3_", "4_"), class = "factor"), Connected = c(1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L), Year = c(2010L, 2010L, 2010L, 2010L, 2015L, 2015L, 2015L, 2015L)), class = "data.frame", row.names = c(NA, -8L))
Original data, printed:
`ID_client Connected Year
1_ 1 2010
2_ 1 2010
3_ 1 2010
4_ 0 2010
1_ 1 2015
2_ 0 2015
3_ 1 2015
4_ 0 2015`
My goal is to create the following data:
`Year ID_client 1_ 2_ 3_ 4_
2010 1_ 0 1 1 0
2010 2_ 1 0 1 0
2010 3_ 1 1 0 0
2010 4_ 0 0 0 0
2015 1_ 0 0 1 0
2015 2_ 0 0 0 0
2015 3_ 1 0 0 0
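2015 4_ 0 0 0 0`
In other words, the matrix shows that in 2010 clients 1_, 2_ and 3_ are connected to each other, while the remaining one is not. Importantly, I do not consider a client to be connected to itself.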
I have tried the following code:
df %>%
  group_by(Year, Connected) %>%
  mutate(temp = rev(ID_client)) %>%
  pivot_wider(names_from = ID_client,
              values_from = Connected,
              values_fill = list(Connected = 0)) %>%
  arrange(Year, temp)
This code does not reproduce what I need. Instead, this is the result:
`Year ID_client 1_ 2_ 3_ 4_
2010 1_ 0 0 1 0
2010 2_ 0 1 0 0
2010 3_ 1 0 0 0
2010 4_ 0 0 0 0
2015 1_ 0 0 1 0
2015 2_ 0 0 0 0
2015 3_ 1 0 0 0
2015 4_ 0 0 0 0`
Posted on 2019-09-30 09:12:08
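We can group_by Year and create a new column that holds, for each row, the ID_client values in that group with Connected == 1, excluding the row's own ID_client. We then complete the missing levels and reshape the data to wide format.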
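To make the setdiff step concrete, here is a minimal base R sketch of what that new column would hold for the 2010 group alone. The names id, connected, connected_ids and partners are illustrative only and do not appear in the answer's code.

```r
# Rows for 2010 only, copied from the question's data
id        <- c("1_", "2_", "3_", "4_")
connected <- c(1L, 1L, 1L, 0L)

# Clients that are connected in this year
connected_ids <- id[connected == 1]

# For each client: the connected clients other than the client itself
partners <- lapply(id, function(x) setdiff(connected_ids, x))
names(partners) <- id

str(partners)
```

Note that client 4_ still gets the partners 1_, 2_ and 3_ here. In the pipeline, each unnested row carries the Connected value of its own client, so those rows hold 0 and end up as zeros in the wide table.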
library(tidyverse)
df %>%
  group_by(Year) %>%
  mutate(temp = map(ID_client, ~setdiff(ID_client[Connected == 1], .x))) %>%
  unnest(cols = temp) %>%
  complete(temp = unique(ID_client), fill = list(Connected = 0)) %>%
  mutate(ID_client = coalesce(as.character(ID_client), temp)) %>%
  pivot_wider(names_from = temp,
              values_from = Connected,
              values_fill = list(Connected = 0)) %>%
  arrange(Year, ID_client)
# Year ID_client `1_` `2_` `3_` `4_`
# <int> <chr> <dbl> <dbl> <dbl> <dbl>
#1 2010 1_ 0 1 1 0
#2 2010 2_ 1 0 1 0
#3 2010 3_ 1 1 0 0
#4 2010 4_ 0 0 0 0
#5 2015 1_ 0 0 1 0
#6 2015 2_ 0 0 0 0
#7 2015 3_ 1 0 0 0
#8 2015 4_ 0 0 0 0
Posted on 2019-09-30 10:46:34
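You can use a self-join, i.e. an inner join of the data with itself. Join on the pieces of information that mark a combination of clients: the values in Year and Connected. Because your desired output has zeros on its diagonal, filter out the cases where the two IDs are the same.
As you can see, I haven't yet moved to tidyr's pivot_wider, but this should be adaptable. In spread, specify that unused factor levels should not be dropped, so that you don't lose ID 4.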
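One possible pivot_wider adaptation is sketched here; it is an assumption about how the change could be made, not the answer author's code. It keeps the same self-join and filter, then uses complete() to restore the client pairs removed by the filter before widening. complete() expands factor columns to all of their levels, which stands in for spread's drop argument. The name res is illustrative, and df is rebuilt so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt here so the sketch is self-contained
df <- data.frame(
  ID_client = factor(rep(c("1_", "2_", "3_", "4_"), times = 2)),
  Connected = c(1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L),
  Year      = rep(c(2010L, 2015L), each = 4)
)

# Self-join on Year and Connected; recent dplyr versions may warn that
# this is a many-to-many join, which is intended here
res <- inner_join(df, df, by = c("Year", "Connected")) %>%
  # Keep connected pairs only and drop the diagonal (a client with itself)
  filter(Connected == 1, ID_client.x != ID_client.y) %>%
  # Restore every Year / client / client combination, filling with 0;
  # factor columns are expanded to all levels, so 4_ is kept
  complete(Year, ID_client.x, ID_client.y, fill = list(Connected = 0L)) %>%
  pivot_wider(names_from = ID_client.y, values_from = Connected)

print(res)
```

Because complete() also sorts the expanded combinations, the wide columns should come out in level order (1_, 2_, 3_, 4_) without a separate arrange step.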
library(dplyr)
library(tidyr)
inner_join(df, df, by = c("Year", "Connected")) %>%
  filter(Connected == 1, ID_client.x != ID_client.y) %>%
  spread(key = ID_client.y, value = Connected, fill = 0, drop = FALSE) %>%
  arrange(Year)
#> ID_client.x Year 1_ 2_ 3_ 4_
#> 1 1_ 2010 0 1 1 0
#> 2 2_ 2010 1 0 1 0
#> 3 3_ 2010 1 1 0 0
#> 4 4_ 2010 0 0 0 0
#> 5 1_ 2015 0 0 1 0
#> 6 2_ 2015 0 0 0 0
#> 7 3_ 2015 1 0 0 0
#> 8 4_ 2015 0 0 0 0
https://stackoverflow.com/questions/58160385