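How do I prepare data in "transaction" form so that, for each transaction ID, the time effect/order of the items is taken into account? I found that with the "split" function, the items end up sorted alphabetically.
For example: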
ID Items Sequence
1 D 1
1 A 2
1 C 3
2 A 1
2 B 2
Expected output in transaction form:
ID Items
1 D A C # notice that A comes after D, as dictated by the Sequence variable
2 A B
Regards.
Posted on 2016-04-27 15:52:56
Using lapply and rbind:
DF = read.table(text="ID Items Sequence
1 D 1
1 A 2
1 C 3
2 A 1
2 B 2",header=TRUE,stringsAsFactors=FALSE,na.strings="")
DF
# ID Items Sequence
#1 1 D 1
#2 1 A 2
#3 1 C 3
#4 2 A 1
#5 2 B 2
For each ID, subset the data frame, sort it by Sequence, combine the items, and return the output for that ID:
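DF_new = do.call(rbind,lapply(unique(DF$ID),function(x) {
  # keep only the rows for this ID
  subset_DF = DF[DF$ID==x,]
  # sort the rows (not the columns) by Sequence
  subset_DF = subset_DF[order(subset_DF$Sequence),]
  subset_DF = subset_DF[,c("ID","Items")]
  # join the ordered items into a single space-separated string
  subset_DF$Items = paste0(subset_DF$Items,collapse=" ")
  # every row is now identical, so keep just one
  subset_DF = unique(subset_DF)
  rownames(subset_DF)= NULL
  return(subset_DF)
}))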
DF_new
# ID Items
#1 1 D A C
#2 2 A B
https://stackoverflow.com/questions/36882018
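The same result can probably be reached without an explicit loop over the IDs. The following is only a sketch in base R, not part of the original answer: it assumes the same DF as above, sorts the rows once by ID and Sequence, and then lets aggregate() collapse the items per ID. The names DF_sorted and DF_alt are made up for this illustration.

```r
DF <- read.table(text = "ID Items Sequence
1 D 1
1 A 2
1 C 3
2 A 1
2 B 2", header = TRUE, stringsAsFactors = FALSE)

# Sort rows by ID, then by Sequence within each ID.
DF_sorted <- DF[order(DF$ID, DF$Sequence), ]

# Collapse the items of each ID into one space-separated string.
# Rows keep their sorted order within each group, so the Sequence order survives.
DF_alt <- aggregate(Items ~ ID, data = DF_sorted, FUN = paste, collapse = " ")
DF_alt
```

Because the ordering happens before the grouping, the Sequence column decides the order of the items rather than their alphabetical order.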