I have a dataset like this:
alpha number fr color
1 a 20 0.8 rot
2 a 21 2.0 rot
3 a 2 0.8 rot
4 a 34 0.8 rot
5 f 42 0.5 grün
...
Now I want to split this dataset into more observations based on a condition such as number < 20, so that the new dataset looks like this:
alpha number fr color
1 a 19 0.8 rot
2 a 1 0.8 rot
3 a 10 2.0 rot
4 a 11 2.0 rot
5 a 2 0.8 rot
6 a 19 0.8 rot
7 a 15 0.8 rot
8 f 7 0.5 grün
9 f 7 0.5 grün
10 f 7 0.5 grün
11 f 7 0.5 grün
12 f 7 0.5 grün
13 f 7 0.5 grün
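...
or something similar: each observation keeps being split (repeated) for as long as the condition is not met.
How the split is done does not matter, but the rows produced from one original observation must keep the same values for the other variables. How can I do this?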
Posted on 2019-09-24 03:11:35
df1 <- structure(list(alpha = c("a", "a", "a", "a", "f"),
                      number = c(20L, 21L, 2L, 34L, 42L),
                      fr = c(0.8, 2, 0.8, 0.8, 0.5),
                      color = c("rot", "rot", "rot", "rot", "grun")),
                 row.names = c(NA, -5L), class = "data.frame")

# Repeat x t times; return NA when t is 0 so the result is never empty
rep.rev <- function(x,t){
  if(t != 0){
    rep(x,t)
  } else {
    NA_integer_
  }
}

library(dplyr)
library(tidyr)

set.seed(22)

df1 %>%
  # draw a random divisor between 2 and 18 for each row
  mutate(divisor = floor(runif(n(), min = 2, max = 19)),
         quotient = number%/%divisor,
         remainder = ifelse(number%%divisor==0, NA, number%%divisor)) %>%
  rowwise %>%
  # each row gets `quotient` copies of the divisor plus the remainder
  mutate(number = list(c(rep.rev(divisor, quotient),remainder))) %>%
  unnest(cols = number) %>%  # tidyr >= 1.0 expects the cols argument
  select(alpha, number, fr, color) %>%
  filter(!is.na(number))
#> # A tibble: 14 x 4
#> alpha number fr color
#> <chr> <dbl> <dbl> <chr>
#> 1 a 7 0.8 rot
#> 2 a 7 0.8 rot
#> 3 a 6 0.8 rot
#> 4 a 10 2 rot
#> 5 a 10 2 rot
#> 6 a 1 2 rot
#> 7 a 2 0.8 rot
#> 8 a 10 0.8 rot
#> 9 a 10 0.8 rot
#> 10 a 10 0.8 rot
#> 11 a 4 0.8 rot
#> 12 f 16 0.5 grun
#> 13 f 16 0.5 grun
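#> 14 f 10 0.5 grun
Posted on 2019-09-24 05:47:52
There are many ways to split a number, but the approach below halves each number: an even number (m = 2n) is split into n and n, and an odd number (m = 2n + 1) is split into (n + 1) and n.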
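The halving rule can also be sketched for a single number as a small recursive function in base R. This is only an illustration under stated assumptions: `split_half` and `split_df` are hypothetical names that do not appear in the original answer, the sample data is copied from the question, and the threshold is assumed to be at least 2 (otherwise the recursion would never stop).

```r
# Hypothetical helper (not from the original answer): recursively halve m
# until every piece is below `threshold`. Assumes threshold >= 2.
split_half <- function(m, threshold) {
  if (m < threshold) {
    return(m)
  }
  upper <- m %/% 2 + m %% 2  # n + 1 for odd m, n for even m
  lower <- m %/% 2           # n
  c(split_half(upper, threshold), split_half(lower, threshold))
}

# Sample data copied from the question.
df <- data.frame(alpha  = c("a", "a", "a", "a", "f"),
                 number = c(20, 21, 2, 34, 42),
                 fr     = c(0.8, 2.0, 0.8, 0.8, 0.5),
                 color  = c("rot", "rot", "rot", "rot", "grün"))

# One vector of pieces per original row.
pieces <- lapply(df$number, split_half, threshold = 20)

# Repeat each original row once per piece, then overwrite `number`.
split_df <- df[rep(seq_len(nrow(df)), lengths(pieces)), ]
split_df$number <- unlist(pieces)
rownames(split_df) <- NULL
```

For example, `split_half(42, 20)` first yields 21 and 21, and each 21 is halved again into 11 and 10. Because every split only redistributes a value, `sum(split_df$number)` equals `sum(df$number)`, and the other columns are carried over unchanged from the originating row.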
> library(dplyr)
> df <- data.frame(alpha=c("a","a","a","a","f"),
+ number=c(20,21,2,34,42),
+ fr=c(0.8,2.0,0.8,0.8,0.5),
+ color=c("rot","rot","rot","rot","grün"))
The function doSplit() takes a data frame df and an integer threshold as arguments.
> doSplit <- function(df, threshold){
+ # performs one halving pass: every row with number >= threshold becomes two rows
+
+ colNames <- colnames(df)
+ df <- df %>% mutate(orig_id=row_number())  # numeric id, so arrange() sorts 10 after 9
+ dfBelow <- df %>% filter(number<threshold)
+ dfAbove1 <- df %>% filter(number>=threshold) %>% mutate(number=(number%/%2)+(number%%2))
+ dfAbove2 <- df %>% filter(number>=threshold) %>% mutate(number=number%/%2)
+ combData <- rbind(dfBelow, dfAbove1, dfAbove2)
+ combData <- combData %>% arrange(orig_id) %>% select(all_of(colNames))
+ return(combData)
+ }
Here we set the threshold to 20. The while loop calls doSplit() repeatedly as long as any row still has number >= 20.
> myThreshold <- 20
> splitDf <- df
> while(splitDf %>% pull(number) %>% max() >= myThreshold){
+ splitDf <- doSplit(splitDf, myThreshold)
+ }
Here is the data frame after splitting:
> splitDf
alpha number fr color
1 a 10 0.8 rot
2 a 10 0.8 rot
3 a 11 2.0 rot
4 a 10 2.0 rot
5 a 2 0.8 rot
6 a 17 0.8 rot
7 a 17 0.8 rot
8 f 11 0.5 grün
9 f 10 0.5 grün
10 f 11 0.5 grün
11 f 10 0.5 grün
https://stackoverflow.com/questions/58067170