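My question is as follows:
I have a tibble that I want to modify according to 3 different cases:
No values in the group are NA (leave the group unchanged)
Some values in the group are NA (replace the NAs with an arbitrary value, 0.5 in this example)
All values in the group are NA (leave the NAs as NA)
Example (using group_by on ind):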
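library(dplyr)  # also re-exports tibble()
a1 = c(0.3,0.1,NA,0.7,0.2)
a2 = rep(NA,5)
a3 = c(0.1,0.3,0.5,0.7,0.8)
tibble(ind = c(rep("A",5),rep("B",5),rep("C",5)),
       value = c(a1,a2,a3))
The segment for group A should produce c(0.3,0.1,0.5,0.7,0.2)
The segment for group B should produce rep(NA,5)
The segment for group C should remain unchanged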
I have tried using if, ifelse and case_when statements, but I think I am missing something very obvious. All help is appreciated.
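The three cases reduce to a single rule: fill the NAs only in groups that are not entirely NA. For reference, here is a minimal base R sketch of that rule on the example vectors. The helper name fill_partial_na and its fill argument are hypothetical and not part of the original post, and a2 is built with NA_real_ so that all three vectors are doubles.

```r
# Hypothetical helper: replace NAs with `fill` only when the group is partly NA.
fill_partial_na <- function(x, fill = 0.5) {
  if (all(is.na(x))) return(x)  # all-NA group: leave the NAs as NA
  x[is.na(x)] <- fill           # otherwise fill the gaps (no-op when there are none)
  x
}

a1 <- c(0.3, 0.1, NA, 0.7, 0.2)
a2 <- rep(NA_real_, 5)
a3 <- c(0.1, 0.3, 0.5, 0.7, 0.8)
ind <- rep(c("A", "B", "C"), each = 5)
value <- c(a1, a2, a3)

# ave() applies the helper within each level of `ind` and returns
# the results in the original row order.
result <- ave(value, ind, FUN = fill_partial_na)
print(result)
```

Because the rule is decided once per group, the same helper can be dropped into a grouped mutate() unchanged.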
Posted on 2020-07-27 20:58:20
Edit:
Although I know there is a more concise way, here is a hacky way:
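library(dplyr)
# df is assumed here to be a wide tibble with columns ind, a1, a2 and a3
df %>%
  group_by(ind) %>%
  # an all-NA column is logical; convert it so both sides of case_when() are doubles
  mutate_if(is.logical, as.numeric) %>%
  # fill an NA with 0.5 only when the group is not entirely NA
  mutate(a1 = case_when(is.na(a1) & sum(is.na(a1)) < length(a1) ~ 0.5, TRUE ~ a1),
         a2 = case_when(is.na(a2) & sum(is.na(a2)) < length(a2) ~ 0.5, TRUE ~ a2),
         a3 = case_when(is.na(a3) & sum(is.na(a3)) < length(a3) ~ 0.5, TRUE ~ a3))
Edit 2: here is a more concise way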
# Fill NAs with 0.5 unless every value in x is NA
point_five <- function(x){
  case_when(is.na(x) & sum(is.na(x)) < length(x) ~ 0.5, TRUE ~ x)
}
df %>%
  group_by(ind) %>%
  mutate_if(is.logical, as.numeric) %>%
  mutate(across(.cols = c(a1:a3), ~ point_five(.)))
This gives us:
# A tibble: 5 x 4
# Groups:   ind [1]
  ind      a1    a2    a3
  <chr> <dbl> <dbl> <dbl>
1 A       0.3    NA   0.1
2 A       0.1    NA   0.3
3 A       0.5    NA   0.5
4 A       0.7    NA   0.7
5 A       0.2    NA   0.8
If we have df2, which contains two groups of ind, group_by gives us:
   ind      a1    a2    a3
   <chr> <dbl> <dbl> <dbl>
 1 A       0.3    NA   0.1
 2 A       0.5    NA   0.3
 3 A       0.5    NA   0.5
 4 A       0.7    NA   0.7
 5 A       0.2    NA   0.8
 6 B       0.5    NA   0.1
 7 B       0.5    NA   0.3
 8 B       0.5    NA   0.5
 9 B       0.5    NA   0.7
10 B       0.2    NA   0.8
df2
structure(list(ind = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B"), a1 = c(0.3, 0.5, NA, 0.7, 0.2, NA, 0.5, NA, NA, 0.2
), a2 = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), a3 = c(0.1,
0.3, 0.5, 0.7, 0.8, 0.1, 0.3, 0.5, 0.7, 0.8)), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Posted on 2020-07-27 21:48:02
An approach using case_when
df %>%
group_by(ind) %>%
mutate(value2 = case_when(
!all(is.na(value)) & is.na(value) ~ 0.5,
TRUE ~ value
))
#-----
# A tibble: 15 x 3
# Groups:   ind [3]
   ind   value value2
   <chr> <dbl>  <dbl>
 1 A       0.3    0.3
 2 A       0.1    0.1
 3 A      NA      0.5
 4 A       0.7    0.7
 5 A       0.2    0.2
 6 B      NA     NA
 7 B      NA     NA
 8 B      NA     NA
 9 B      NA     NA
10 B      NA     NA
11 C       0.1    0.1
12 C       0.3    0.3
13 C       0.5    0.5
14 C       0.7    0.7
15 C       0.8    0.8
Sample data
a1 = c(0.3,0.1,NA,0.7,0.2)
a2 = rep(NA,5)
a3 = c(0.1,0.3,0.5,0.7,0.8)
df <- tibble(ind = c(rep("A",5),rep("B",5),rep("C",5)),
             value = c(a1,a2,a3))
https://stackoverflow.com/questions/63123376
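As a possible variant of the case_when approach, the per-group decision can also be written with a plain if inside a grouped mutate(), since each group is evaluated separately and all(is.na(value)) is then a single TRUE or FALSE. This is only a sketch under the assumption that value is a double column, as in the sample data. The result name res is hypothetical; coalesce() is the dplyr function that replaces NA with the next supplied value.

```r
library(dplyr)

# Same sample data as above, rebuilt so the block runs on its own
df <- tibble(ind = rep(c("A", "B", "C"), each = 5),
             value = c(0.3, 0.1, NA, 0.7, 0.2,
                       rep(NA, 5),
                       0.1, 0.3, 0.5, 0.7, 0.8))

res <- df %>%
  group_by(ind) %>%
  # all-NA group: keep as is; otherwise replace the NAs with 0.5
  mutate(value2 = if (all(is.na(value))) value else coalesce(value, 0.5)) %>%
  ungroup()

print(res, n = 15)
```

Both branches return a double vector of the group's length, so the column type stays consistent across groups.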