
Q: Recode all values to zero if sequence <= 3, keeping certain information

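Stack Overflow user

Asked 2018-08-27 07:21:22

Answers 2 · Views 135 · Followers 0 · Votes 2

I asked a similar question before, but I need more output and decided to post a new question.

I have a data.table object like this: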

library(data.table)
cells <- c(100, 1,1980,1,0,1,1,0,1,0,
       150, 1,1980,1,1,1,0,0,0,1,
       99 , 1,1980,1,1,1,1,0,0,0,
       899, 1,1980,0,1,0,1,1,1,1,
       789, 1,1982,1,1,1,0,1,1,1 )
colname <- c("number","sex", "birthy", "2004","2005", "2006", "2007", "2008", "2009","2010")
rowname <- c("1","2","3","4","5")
y <- matrix(cells, nrow=5, ncol=10, byrow=TRUE, dimnames =   list(rowname,colname))
y <- data.table(y, keep.rownames = TRUE)

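A value of 1 in column 2004 means that the person was continuously insured during 2004. People who were insured for at least three years in a row can be part of the study. I need a subset of this data.table containing all observations for which the following condition is true: 2004+2005+2006 = 3 or 2005+2006+2007 = 3 or 2006+2007+...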

#using melt and rle function to restructure the data
tmp <- melt(y, id = "rn", measure.vars = patterns("^20"),
        variable.factor = FALSE, variable.name = "year")[, rle(value), by = rn]

#subset data based on condition, keeping only the first relevant sequence
tmp2 <- tmp[(values == 1 & lengths >= 3), .(rn,lengths)][, .SD[1,], by=rn]
##selecting only rows with value=1 and min 3 in a row
##keeping only the variable rn
tmp3 <- tmp[values == 1, which(max(lengths) >= 3), by = rn]$rn

##using the row-number to select observations from data.table
##merging length of sequence
dt <- merge(y[as.integer(tmp3)],tmp2, by="rn")

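Is there a way to recode all 1s to 0 if they are not part of a sequence? For example, for rn==4 the variable "2005" needs to be zero.

I also need a new variable "begy" that contains the year in which the sequence starts. For example, for rn==5, begy==2004. Any suggestions would be appreciated.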

select

data.table

sequence

2 Answers

Stack Overflow user

Answered 2018-08-27 08:27:56

New solution:

# define a custom function in order to only keep the sequences
# with 3 (or more) consecutive years
rle3 <- function(x) {
  r <- rle(x)
  r$values[r$lengths < 3 & r$values == 1] <- 0
  inverse.rle(r)
}

# replace all '1'-s that do not belong to a sequence of at least 3 to '0'
# create 'begy'-variable
melt(y, id = 1:4, measure.vars = patterns("^20"),
     variable.factor = FALSE, variable.name = "year"
     )[, value := rle3(value), by = rn
       ][, begy := year[value == 1][1], rn
         ][, dcast(.SD[!is.na(begy)], ... ~ year, value.var = "value")]

which gives:

   rn number sex birthy begy 2004 2005 2006 2007 2008 2009 2010
1:  2    150   1   1980 2004    1    1    1    0    0    0    0
2:  3     99   1   1980 2004    1    1    1    1    0    0    0
3:  4    899   1   1980 2007    0    0    0    1    1    1    1
4:  5    789   1   1982 2004    1    1    1    0    1    1    1
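To see what the rle3() helper does on its own, here is a minimal sketch applied to a single vector. The vector x holds the 2004 to 2010 values for rn == 4 from the question; the names x, r and cleaned are only illustrative.

```r
# Same helper as in the answer: zero out runs of 1s shorter than 3
rle3 <- function(x) {
  r <- rle(x)
  r$values[r$lengths < 3 & r$values == 1] <- 0
  inverse.rle(r)
}

# Years 2004-2010 for rn == 4 in the question's data
x <- c(0, 1, 0, 1, 1, 1, 1)

r <- rle(x)
r$lengths  # 1 1 1 4
r$values   # 0 1 0 1

# The isolated 1 in 2005 is a run of length 1, so it is set to 0;
# the run of four 1s starting in 2007 is kept
cleaned <- rle3(x)
cleaned    # 0 0 0 1 1 1 1
```

This matches the requirement in the question that "2005" becomes zero for rn==4.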

Old solution:

# define a custom function in order to only keep the sequences
# with 3 (or more) consecutive years
rle3 <- function(x) {
  r <- rle(x)
  r$values[r$lengths < 3 & r$values == 1] <- 0
  inverse.rle(r)
}

# create a reference 'data.table' with only the row to keep
# and the start year of the (first) sequence (row 5 has 2 sequences of 3)
x <- melt(y, id = "rn", measure.vars = patterns("^20"),
          variable.factor = FALSE, variable.name = "year"
          )[, value := rle3(value), by = rn
            ][value == 1, .SD[1], rn]

# join 'x' with 'y' to add 'begy' and filter out the row with no sequences of 3
y[x, on = "rn", begy := year][!is.na(begy)]

which gives:

   rn number sex birthy 2004 2005 2006 2007 2008 2009 2010 begy
1:  2    150   1   1980    1    1    1    0    0    0    1 2004
2:  3     99   1   1980    1    1    1    1    0    0    0 2004
3:  4    899   1   1980    0    1    0    1    1    1    1 2007
4:  5    789   1   1982    1    1    1    0    1    1    1 2004
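Note that in this old output the year columns are unchanged (for example, "2005" is still 1 for rn 4), because the join only adds begy. The last step is an update join, which can be sketched on toy data as follows. The tables wide and ref and their contents are hypothetical and are not the question's data.

```r
library(data.table)

# Hypothetical toy tables, not the question's data
wide <- data.table(rn = c("1", "2", "3"), number = c(100, 150, 99))
ref  <- data.table(rn = c("2", "3"), year = c("2004", "2005"))

# Update join: for rows of 'wide' that match 'ref' on rn,
# copy ref's 'year' into a new column 'begy' by reference
wide[ref, on = "rn", begy := year]

# Rows without a match keep NA in 'begy' and can then be filtered out
res <- wide[!is.na(begy)]
res
```

Because `:=` modifies wide in place, the filter has to be a separate step that returns a new table.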

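Votes 3

Stack Overflow user

Answered 2018-08-30 08:32:09

The OP has requested

1. to recode all 1s to 0s if they are not part of a sequence of three or more consecutive years,
2. to add a new column containing the year in which the sequence starts.

Note that the second requirement is ambiguous because there may be more than one sequence of 3 or more consecutive years, e.g., in row 5. Here, we take the start year of the first (oldest) sequence.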
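The ambiguity can be made visible with a small base R sketch that lists the start year of every qualifying sequence in row 5. The variable names (years, run_start, keep, all_starts, first_start) are illustrative assumptions, and this is not the code used in the answer.

```r
years <- 2004:2010
x <- c(1, 1, 1, 0, 1, 1, 1)  # rn == 5 in the question's data

r <- rle(x)

# Position at which each run starts
run_start <- cumsum(c(1, head(r$lengths, -1)))

# Keep only runs of 1s that are at least 3 long
keep <- r$values == 1 & r$lengths >= 3

all_starts <- years[run_start[keep]]
all_starts                    # 2004 2008

# Taking the first (oldest) sequence resolves the ambiguity
first_start <- all_starts[1]  # 2004
```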

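The solution below

1. reshapes from wide to long format,
2. computes the length of the streaks of consecutive years,
3. recodes 1s to 0s if they are not part of a sequence of three or more consecutive years,
4. picks the start year of the first sequence,
5. drops rows without a qualifying sequence (no begy found), and
6. finally, reshapes back to wide format.

No rolling window or custom function is required.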

library(data.table)
melt(y, , patterns("^\\d"))[
  order(rn), N := .N, by = .(rleid(value), rn)][
    value == 1 & N < 3, value := 0][
      , begy := first(variable[value == 1]), by = rn][
        , dcast(.SD[!is.na(begy), -"N"], ... ~ variable)]

   rn number sex birthy begy 2004 2005 2006 2007 2008 2009 2010
1:  2    150   1   1980 2004    1    1    1    0    0    0    0
2:  3     99   1   1980 2004    1    1    1    1    0    0    0
3:  4    899   1   1980 2007    0    0    0    1    1    1    1
4:  5    789   1   1982 2004    1    1    1    0    1    1    1
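The streak-length step can be looked at in isolation. The sketch below uses the 2004 to 2010 values for rn == 4 from the question; the table dt and the helper column run are assumptions added for illustration (the answer groups by rleid(value) directly without storing it).

```r
library(data.table)

# Years 2004-2010 for rn == 4 in the question's data
dt <- data.table(year = 2004:2010, value = c(0, 1, 0, 1, 1, 1, 1))

# rleid() gives each run of identical values its own id;
# .N per run id is then the length of that streak
dt[, run := rleid(value)][, N := .N, by = run]
dt$run    # 1 2 3 4 4 4 4
dt$N      # 1 1 1 4 4 4 4

# Recode 1s that sit in streaks shorter than 3, as in the answer
dt[value == 1 & N < 3, value := 0]
dt$value  # 0 0 0 1 1 1 1
```

In the full solution the grouping also includes rn, so that streaks are never counted across two different persons.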

Votes 0

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/52034420
