Q: Splitting a data frame in R

Stack Overflow user

Asked 2019-09-24 01:09:35

Answers: 2 | Views: 157 | Followers: 0 | Votes: 1

I have a data set like this:

alpha number  fr color
1     a   20 0.8   rot
2     a   21 2.0   rot
3     a    2 0.8   rot
4     a   34 0.8   rot
5     f   42 0.5  grün .......
......................

Now I want to split this data set into more observations, depending on a condition such as number < 20, so that the new data set looks like this:

alpha number  fr color
1     a   19 0.8   rot
2     a   1  0.8   rot
3     a   10 2.0   rot
4     a   11 2.0   rot
5     a    2 0.8   rot
6     a   19 0.8   rot
7     a   15 0.8   rot
8     f   7  0.5  grün 
9     f   7  0.5  grün 
10     f   7  0.5  grün 
11    f   7  0.5  grün 
12     f   7  0.5  grün 
13    f   7  0.5  grün 
 .......

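Or, similarly, each observation is repeated for as long as the condition is not met.

How the number is split does not matter, but the rows produced by a split must keep the same values in the other variables. How can I do this?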

dataframe

split

2 Answers

Stack Overflow user

Answered 2019-09-24 03:11:35

df1 <- structure(list(alpha = c("a", "a", "a", "a", "f"), 
                      number = c(20L, 21L, 2L, 34L, 42L), 
                      fr = c(0.8, 2, 0.8, 0.8, 0.5), 
                      color = c("rot", "rot", "rot", "rot", "grun")), 
                 row.names = c(NA, -5L), class = "data.frame")

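# Repeat x t times; return NA when t is 0 so that c() still gets a value
# (the NA is filtered out at the end of the pipeline).
rep.rev <- function(x, t){
  if(t != 0){
    rep(x, t)
  } else {
    NA_integer_
  }
}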

library(dplyr)
library(tidyr)

set.seed(22)

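df1 %>% 
  # draw a random divisor (2 to 18) per row, then write number as
  # `quotient` copies of the divisor plus a remainder
  mutate(divisor = floor(runif(n(), min = 2, max = 19)),
         quotient = number %/% divisor,
         remainder = ifelse(number %% divisor == 0, NA, number %% divisor)) %>% 
  rowwise() %>% 
  mutate(number = list(c(rep.rev(divisor, quotient), remainder))) %>% 
  ungroup() %>% 
  unnest(cols = number) %>%   # one output row per piece
  select(alpha, number, fr, color) %>% 
  filter(!is.na(number))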

#> # A tibble: 14 x 4
#>    alpha number    fr color
#>    <chr>  <dbl> <dbl> <chr>
#>  1 a          7   0.8 rot  
#>  2 a          7   0.8 rot  
#>  3 a          6   0.8 rot  
#>  4 a         10   2   rot  
#>  5 a         10   2   rot  
#>  6 a          1   2   rot  
#>  7 a          2   0.8 rot  
#>  8 a         10   0.8 rot  
#>  9 a         10   0.8 rot  
#> 10 a         10   0.8 rot  
#> 11 a          4   0.8 rot  
#> 12 f         16   0.5 grun 
#> 13 f         16   0.5 grun 
#> 14 f         10   0.5 grun
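
The pipeline above draws a random divisor for each row, so the pieces change with the seed. If a reproducible split is preferred, the same quotient-and-remainder idea can be sketched in base R with a fixed chunk size. This is only an illustration under stated assumptions: `split_number()` and its `chunk` argument are names made up here, not part of the original answer, and the chunk size of 19 is chosen so that every piece satisfies number < 20.

```r
# Sketch only: split_number() and chunk are hypothetical names.
# Write x as repeated copies of `chunk` plus a final remainder (if any),
# so every piece is <= chunk and the pieces sum to x.
split_number <- function(x, chunk = 19) {
  pieces <- c(rep(chunk, x %/% chunk), x %% chunk)
  pieces[pieces > 0]
}

df1 <- data.frame(alpha  = c("a", "a", "a", "a", "f"),
                  number = c(20, 21, 2, 34, 42),
                  fr     = c(0.8, 2, 0.8, 0.8, 0.5),
                  color  = c("rot", "rot", "rot", "rot", "grun"))

pieces <- lapply(df1$number, split_number)

# Repeat each original row once per piece, then overwrite `number`.
out <- df1[rep(seq_len(nrow(df1)), lengths(pieces)), ]
out$number <- unlist(pieces)
rownames(out) <- NULL
out
```

With this rule, 20 becomes 19 and 1, and 34 becomes 19 and 15, while the other columns are copied unchanged into every piece.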

Votes: 0

Stack Overflow user

Answered 2019-09-24 05:47:52

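There are many ways to split a number, but the approach below halves each one: an even number (m = 2n) is split into n and n, and an odd number (m = 2n + 1) into (n + 1) and n.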
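
The halving rule on its own can be sketched as a small function. `halve()` is a hypothetical helper used only to illustrate the arithmetic; the answer's own code applies the same two expressions to a whole column instead.

```r
# Hypothetical helper (not part of the original answer): split one number
# into its two halves, larger half first.
halve <- function(m) {
  c(m %/% 2 + m %% 2, m %/% 2)
}

halve(20)  # 10 10
halve(21)  # 11 10
```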

> library(dplyr)
> df <- data.frame(alpha=c("a","a","a","a","f"),
+                  number=c(20,21,2,34,42),
+                  fr=c(0.8,2.0,0.8,0.8,0.5),
+                  color=c("rot","rot","rot","rot","grün"))

The function doSplit() takes a data frame df and an integer threshold as arguments.

> doSplit <- function(df, threshold){
+   # one pass: halves every row where number >= threshold;
+   # call repeatedly until all rows have number < threshold
+   
+   colNames <- colnames(df)
+   # numeric row id, so arrange() keeps the original row order even with 10+ rows
+   df <- df %>% mutate(orig_id=row_number())
+   dfBelow <- df %>% filter(number<threshold)
+   dfAbove1 <- df %>% filter(number>=threshold) %>% mutate(number=(number%/%2)+(number%%2))
+   dfAbove2 <- df %>% filter(number>=threshold) %>% mutate(number=number%/%2)
+   combData <- rbind(dfBelow, dfAbove1, dfAbove2)
+   combData <- combData %>% arrange(orig_id) %>% select(all_of(colNames))
+   return(combData)  
+ }

Here we set the threshold to 20. As long as any row has number >= 20, the while loop keeps calling doSplit().

> myThreshold <- 20
> splitDf <- df
> while(splitDf %>% pull(number) %>% max() >= myThreshold){
+     splitDf <- doSplit(splitDf, myThreshold)
+ }

Here is the data frame after splitting:

> splitDf
   alpha number  fr color
1      a     10 0.8   rot
2      a     10 0.8   rot
3      a     11 2.0   rot
4      a     10 2.0   rot
5      a      2 0.8   rot
6      a     17 0.8   rot
7      a     17 0.8   rot
8      f     11 0.5  grün
9      f     10 0.5  grün
10     f     11 0.5  grün
11     f     10 0.5  grün
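
A quick way to sanity-check a result like this is to confirm that every piece is below the threshold and that the pieces add up to the original total. The sketch below copies the `number` columns by hand from the tables in this post instead of rerunning doSplit(); the variable names are illustrative.

```r
# Values copied from the original data and from the split result above.
original_number <- c(20, 21, 2, 34, 42)
split_number_col <- c(10, 10, 11, 10, 2, 17, 17, 11, 10, 11, 10)
threshold <- 20

# Every piece must satisfy the condition number < threshold.
all_below <- all(split_number_col < threshold)

# Splitting must not change the overall total.
same_total <- sum(split_number_col) == sum(original_number)

all_below   # TRUE
same_total  # TRUE
```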

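Votes: 0

The original content of this page is provided by Stack Overflow. Translation support is provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/58067170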
