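I have four basic variables: ID, time, exposure, and outcome.
I want a scatter plot of my exposure against my outcome, but the time point of interest for the exposure differs from the one for the outcome, so some IDs have no assessment at the outcome time point. What I want to do is create a subset of the data with one row per ID, holding the exposure at time_1 and the outcome at time_3; if an ID doesn't have an assessment at time_3, I want it included with the value NA. The issue is that when a time point was not assessed, the corresponding row for that ID doesn't exist in the data in the first place. Here is an example of the data: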
ID <- c(1,1,2,2,2,3,3,3,4,4)
exposure <-c(1.2, 1.3, 1.4, 1.5, 2.1, 2.2, 3.2, 4.2, 5.2, 6.2)
outcome <-c(0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 2.1, 3.1)
Time <- c("time_1","time_2","time_1","time_2","time_3","time_1","time_2","time_3","time_1","time_2")
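data <- data.frame(ID, exposure, outcome, Time)

Why do I want to do this? Because a scatter plot is cross-sectional, if I simply plot by time for each ID the plot will be empty: no single row holds both the exposure at time_1 and the outcome at time_3. That is why I need to create a subset of the data and build the pairs myself.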
So far I have tried the following code:
library(dplyr)
library(tidyr)
# widen the data so you can see the empty cells, which are the reason the plot comes out empty
df <- data |> pivot_wider(names_from = Time, values_from = c(exposure, outcome))
# subset the data to only my desired time points (in my actual data this helps me
# see which IDs are missing an assessed time point)
df1 <- data %>%
  group_by(ID) %>%
  filter(Time == "time_1" | Time == "time_3") %>%
  ungroup()
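# and eventually subset the data by time point, to then merge the pieces together
df2 <- filter(data, Time == "time_1")
df3 <- filter(data, Time == "time_3")

But with this last piece of code, you can see that the two data sets differ in size. Besides, it is clinically important for me to show that, for example, ID=1 has NA for the outcome at time_3, so I don't want to subset to only the IDs that have both values available.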
So the data set I would like to end up with needs to have the following structure:
ID  exposure_time_1  outcome_time_3
-----------------------------------
1   1.2              NA
2   1.4              0.4
3   2.2              0.1
4   5.2              NA

Is there a solution for this?
Answered 2022-11-30 05:56:37
You almost had it. Just select the columns after your pivot_wider.
df %>%
select(ID, exposure_time_1, outcome_time_3) %>%
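  filter(!is.na(exposure_time_1) | !is.na(outcome_time_3))

It isn't needed for your sample data set, but I added the filter to make sure that at least one of the last two columns is non-missing. Though perhaps what you actually want is filter(!is.na(outcome_time_3)).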
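For completeness, here is a self-contained sketch of the whole pipeline on the sample data from the question. It assumes the last Time label was meant to be "time_2", and that pivot_wider is called with names_from (the question's name_from is not an argument of the function). The names paired and paired_join are made up for illustration. The second half sketches a left_join alternative, which is closer to the "subset by time point, then merge" idea in the question; it is a swapped-in technique, not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Sample data from the question (assumption: the last label is "time_2")
data <- data.frame(
  ID       = c(1, 1, 2, 2, 2, 3, 3, 3, 4, 4),
  exposure = c(1.2, 1.3, 1.4, 1.5, 2.1, 2.2, 3.2, 4.2, 5.2, 6.2),
  outcome  = c(0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 2.1, 3.1),
  Time     = c("time_1", "time_2", "time_1", "time_2", "time_3",
               "time_1", "time_2", "time_3", "time_1", "time_2")
)

# Approach 1: widen to one row per ID, then keep only the pair of interest.
# Time points that were never assessed become NA in the wide table.
paired <- data %>%
  pivot_wider(names_from = Time, values_from = c(exposure, outcome)) %>%
  select(ID, exposure_time_1, outcome_time_3)

# Approach 2 (alternative sketch): take the time_1 exposures and left-join
# the time_3 outcomes, so IDs without a time_3 row are kept with NA.
paired_join <- data %>%
  filter(Time == "time_1") %>%
  select(ID, exposure_time_1 = exposure) %>%
  left_join(
    data %>%
      filter(Time == "time_3") %>%
      select(ID, outcome_time_3 = outcome),
    by = "ID"
  )
```

Both versions keep all four IDs, with NA in outcome_time_3 for IDs 1 and 4. Passing paired to base plot() draws only the complete pairs, since points with a missing coordinate are simply omitted, while the NA rows remain available in the data frame for reporting.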
https://stackoverflow.com/questions/74620329