I have the following data:
df <- structure(list(a = c(1, 43, 22, 12, 35, 113, 54, 94), b = c("a",
"b", "c", "d", "e", "f", "g", "h")), .Names = c("a", "b"), row.names = c(NA,
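-8L), class = c("tbl_df", "tbl", "data.frame"))
From this data I want to select consecutive subsequences of a given length. For example, for a subsequence length of 2, I want to select rows 1-2, 2-3, 3-4, and so on, up to the last row of the data frame. Each subsequence should then be given a label.
The new df for a subsequence length of 2, together with its sequence labels, would look like this: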
a b seq_label
1 a 1 # First subsequence, row 1-2
43 b 1 #
43 b 2 # Second subsequence, row 2-3
22 c 2 #
22 c 3 # Third subsequence, row 3-4
12 d 3 #
12 d 4
35 e 4
35 e 5
113 f 5
113 f 6
54 g 6
54 g 7
94 h 7
Similarly, for a subsequence length of 3:
a b seq_label
1 a 1 # First subsequence, row 1-3
43 b 1 #
22 c 1 #
43 b 2 # Second subsequence, row 2-4
22 c 2 #
12 d 2 #
22 c 3 # Third subsequence, row 3-5
12 d 3 #
35 e 3 #
12 d 4
35 e 4
113 f 4
35 e 5
113 f 5
54 g 5
113 f 6
54 g 6
94 h 6
...
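Thanks to @drjones for the suggested answer; based on it, I came up with this solution:
library(purrr)  # provides map_dfr(); n is the subsequence length
map_dfr(1:(nrow(df) - n + 1), function (i) {cbind(df[i:(i + n - 1), ], "seq_label" = i)})
Posted on 2018-10-08 08:41:23
We can use outer to create the indices: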
n <- 2
i <- 1:(nrow(df) - (n - 1))
# c() flattens the transposed index matrix into a plain vector (1 2 2 3 3 4 ...)
cbind(df[c(t(outer(i, 1:n - 1, `+`))), ],
      seq_label = rep(i, each = n))
# a b seq_label
# 1 1 a 1
# 2 43 b 1
# 3 43 b 2
# 4 22 c 2
# 5 22 c 3
# 6 12 d 3
# 7 12 d 4
# 8 35 e 4
# 9 35 e 5
# 10 113 f 5
# 11 113 f 6
# 12 54 g 6
# 13 54 g 7
# 14 94 h 7
...or kronecker:
cbind(df[kronecker(X = i, Y = 1:n - 1, FUN = `+`), ],
seq_label = rep(i, each = n))
...or embed:
i <- 1:nrow(df)
cbind(df[as.vector(t(embed(i, n)[ , n:1])), ],
seq_label = rep(head(i, -(n - 1)), each = n))
Posted on 2018-10-08 08:10:13
I'm not sure how large your dataset is, but if a loop is acceptable:
get_seq <- function(df, n) {
  res <- c()
  for (i in 1:(nrow(df) - n + 1)) {
    # bind rows i..(i + n - 1) together with their window label i
    res <- rbind(res, cbind(df[i:(i + n - 1), ], "seq_label" = i))
  }
  res
}
get_seq(df, 2)
a b seq_label
1 a 1
43 b 1
43 b 2
22 c 2
22 c 3
12 d 3
12 d 4
35 e 4
35 e 5
113 f 5
113 f 6
54 g 6
54 g 7
94 h 7
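The loop above grows res with rbind on every iteration, which may become slow for larger inputs. A minimal sketch of the same idea that collects the windows in a list and binds them once is given here; get_seq_list and df_demo are hypothetical names used only for illustration, and df_demo is a plain data.frame copy of the question's data so the sketch runs without tibble.

```r
# Hypothetical variant of get_seq(): build each window in a list,
# then bind all pieces with a single rbind() call.
get_seq_list <- function(df, n) {
  starts <- seq_len(nrow(df) - n + 1)          # first row of each window
  pieces <- lapply(starts, function(i) {
    cbind(df[i:(i + n - 1), ], seq_label = i)  # rows i..(i + n - 1), labelled i
  })
  do.call(rbind, pieces)
}

# Plain data.frame copy of the question's data (assumption: same values).
df_demo <- data.frame(a = c(1, 43, 22, 12, 35, 113, 54, 94),
                      b = c("a", "b", "c", "d", "e", "f", "g", "h"))
res <- get_seq_list(df_demo, 2)
nrow(res)  # 14 rows: 7 windows of length 2
```

Like the original loop, this sketch assumes n is no larger than nrow(df).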
get_seq(df,3)
a b seq_label
1 a 1
43 b 1
22 c 1
43 b 2
22 c 2
12 d 2
22 c 3
12 d 3
35 e 3
12 d 4
35 e 4
113 f 4
35 e 5
113 f 5
54 g 5
113 f 6
54 g 6
94 h 6
Posted on 2018-10-08 08:37:31
We can use rollapply from the zoo package to create the row indices.
library(zoo)
get_sequenced_df <- function(df, n) {
new_df <- df[c(t(rollapply(1:nrow(df), n, c))), ]
transform(new_df, seq_label = rep(seq(nrow(new_df)/n), each = n))
}
get_sequenced_df(df, 2)
# a b seq_label
#1 1 a 1
#2 43 b 1
#3 43 b 2
#4 22 c 2
#5 22 c 3
#6 12 d 3
#7 12 d 4
#8 35 e 4
#9 35 e 5
#10 113 f 5
#11 113 f 6
#12 54 g 6
#13 54 g 7
#14 94 h 7
To see how the row indices are generated:
n <- 2
c(t(rollapply(1:nrow(df), n, c)))
#[1] 1 2 2 3 3 4 4 5 5 6 6 7 7 8
n <- 3
c(t(rollapply(1:nrow(df), n, c)))
#[1] 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7 6 7 8
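The same index vector can also be built with base R arithmetic alone: repeat each window's start row n times and add the offsets 0 to n - 1. This is a sketch rather than part of the answer; window_index is a hypothetical helper name.

```r
# Hypothetical helper: row indices for all length-n windows over n_rows rows.
window_index <- function(n_rows, n) {
  k <- n_rows - n + 1                        # number of windows
  stopifnot(n >= 1, k >= 1)
  starts  <- rep(seq_len(k), each = n)       # 1 1 1 2 2 2 ... (for n = 3)
  offsets <- rep(seq_len(n) - 1, times = k)  # 0 1 2 0 1 2 ...
  starts + offsets
}

window_index(8, 3)
# [1] 1 2 3 2 3 4 3 4 5 4 5 6 5 6 7 6 7 8
```

Indexing with df[window_index(nrow(df), n), ] should then select the same rows as the rollapply version.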
get_sequenced_df(df, 3)
# a b seq_label
#1 1 a 1
#2 43 b 1
#3 22 c 1
#4 43 b 2
#5 22 c 2
#6 12 d 2
#7 22 c 3
#8 12 d 3
#9 35 e 3
#10 12 d 4
#11 35 e 4
#12 113 f 4
#13 35 e 5
#14 113 f 5
#15 54 g 5
#16 113 f 6
#17 54 g 6
#18 94 h 6
https://stackoverflow.com/questions/52697672