I am trying to reshape() some time-varying data in R, and I am working with the following dataset:
dframe <- structure(list(participant_id = structure(c(48L, 43L, 51L, 28L, 35L, 65L), .Label = c("PRA", "RA", "ASD", "LAD", "ASDGZV ", "RAGSD", "GREA", "SDFDSA", "DSFG", "FHJ", "RQGA", "AESFD", "RGAV", "FGHDF", "HSGD", "FDGH", "ASDF", "AGSD", "SADF", "SADF", "SF", "XV", "ASDCV", "ASDF", "ASDG", "SDF", "XCVZ", "ZXCV", "ASGV", "SAFDV", "ASDF", "SDFV", "SAFD", "SAFD", "AGS", "FDSGVX", "WAFDS", "DSAZC", "SADCZX", "SADFCX", "DSAFC", "FDSGV", "ADSCXZ", "SDFACZ", "SADFCZ", "AFSDZX", "EAWFDSZ", "FDVCZX", "SADZC", "FSADCZ", "AESFDZC", "WAFDSZC", "SDFC", "FSADC", "DSZXC", "SDAFC", "AFSDZC", "WFADS", "FSDVC", "GSDHBXC", "EFWADSCXZ", "EWAFDSC", "AFDSCZ", "AWEFDC", "AGSFV"), class = "factor"), baseline_pupilsize = c(6, 6, 7, 6, 6, 6), baseline_coe = c(11.19, 13.6, 3.96, 7.64, 6.12, 6.92), baseline_rcb = c(16.74, 25, 25, 18.37, 25, 25), final_pop = c(NA, NA, 7.1, 8, 6, NA), final_coe = c(NA, NA, 5.9263624, 4.89, 11.98, NA), final_rcb = c(NA, NA, 25L, NA, NA, NA)), .Names = c("participant_id", "baseline_pop", "baseline_coe", "baseline_rcb", "final_pop", "final_coe", "final_rcb"), row.names = c(NA, 6L), class = "data.frame")
These are time-varying data from a longitudinal study, and a subset of a larger dataset that I imported from a source file. I want to extract the pop, coe and rcb values for the baseline and final study visits (my full dataset has several more visits, which I have left out for the purposes of this question).
I can do the following:
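reshape(dframe, idvar = 'participant_id', v.names = c('pop', 'coe', 'rcb'), varying = 2:length(dframe), direction = 'long')
However, this ends up with the values for pop labelled as coe. The reshape2 documentation tells me that I should refer to the varying values explicitly to avoid "guessing". So I try: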
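reshape(dframe, idvar = 'participant_id', v.names = c('pop', 'coe', 'rcb'), varying = c('baseline_pop', 'baseline_coe', 'baseline_rcb', 'final_pop', 'final_coe', 'final_rcb'), direction = 'long')
This produces exactly the same output, despite the varying argument being named explicitly. What am I doing wrong? Presumably pop ends up with the values of coe because of alphabetical ordering, but I don't understand why that would happen now that I have stated the varying argument explicitly.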
Edit: the expected output is as follows:
participant_id time pop coe rcb
FDVCZX 1 6 11.19 16.74
ADSCXZ 1 6 13.6 25
AESFDZC 1 7 3.96 25
ZXCV 1 6 7.64 18.37
AGS 1 6 6.12 25
AGSFV 1 6 6.92 25
FDVCZX 2 NA NA NA
ADSCXZ 2 NA NA NA
AESFDZC 2 7.1 5.926362 25
ZXCV 2 8 4.89 NA
AGS 2 6 11.98 NA
AGSFV 2 NA NA NA
However, as you will see, the pop values end up in the coe column, and vice versa.
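For reference, here is a minimal base R sketch of one likely explanation and a workaround. It rests on an assumption drawn from reading the stats::reshape source, not on anything confirmed in this thread: when varying is a plain vector and v.names is supplied, reshape() appears to group the columns with split(), which orders the groups by sorted factor level (coe, pop, rcb), while v.names keeps the order given (pop, coe, rcb). Passing varying as a list, with one vector per long-format variable in the same order as v.names, avoids relying on that grouping. The data frame wide is a hypothetical three-row copy of the question's data.

```r
# Hypothetical three-row copy of the question's data (first three participants).
wide <- data.frame(
  participant_id = c("FDVCZX", "ADSCXZ", "AESFDZC"),
  baseline_pop = c(6, 6, 7),
  baseline_coe = c(11.19, 13.6, 3.96),
  baseline_rcb = c(16.74, 25, 25),
  final_pop = c(NA, NA, 7.1),
  final_coe = c(NA, NA, 5.9263624),
  final_rcb = c(NA, NA, 25),
  stringsAsFactors = FALSE
)

v_names <- c("pop", "coe", "rcb")
varying_vec <- names(wide)[-1]

# split() returns its groups in sorted level order, so they come back as
# coe, pop, rcb even though v_names is pop, coe, rcb.
# Assumption: reshape() performs an equivalent grouping internally when
# 'varying' is a plain vector, which would explain the swapped columns.
groups <- split(varying_vec, rep(v_names, 2))
print(names(groups))

# Workaround: give 'varying' as a list, one element per long-format variable,
# in the same order as v_names, so no regrouping is needed.
long <- reshape(
  wide,
  idvar = "participant_id",
  varying = list(
    c("baseline_pop", "final_pop"),
    c("baseline_coe", "final_coe"),
    c("baseline_rcb", "final_rcb")
  ),
  v.names = v_names,
  timevar = "time",
  times = 1:2,
  direction = "long"
)
print(long)
```

If the assumption holds, supplying v.names in alphabetical order (coe, pop, rcb) with the vector form of varying should also line the columns up, but the list form states the pairing explicitly and does not depend on sort order.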
Answered 2016-02-01 15:51:29
We can use melt from data.table, which can take multiple measure columns.
library(data.table)
melt(setDT(dframe), measure=patterns('pop', 'coe', 'rcb'),
value.name = c('pop', 'coe', 'rcb'), variable.name='time')
# participant_id time pop coe rcb
# 1: FDVCZX 1 6.0 11.190000 16.74
# 2: ADSCXZ 1 6.0 13.600000 25.00
# 3: AESFDZC 1 7.0 3.960000 25.00
# 4: ZXCV 1 6.0 7.640000 18.37
# 5: AGS 1 6.0 6.120000 25.00
# 6: AGSFV 1 6.0 6.920000 25.00
# 7: FDVCZX 2 NA NA NA
# 8: ADSCXZ 2 NA NA NA
# 9: AESFDZC 2 7.1 5.926362 25.00
#10: ZXCV 2 8.0 4.890000 NA
#11: AGS 2 6.0 11.980000 NA
#12: AGSFV 2 NA NA NA
https://stackoverflow.com/questions/35134360