我有一些数据:
数据
structure(list(WBC_BASELINE = c(2.9, NA, NA, 6.9, NA, NA, NA,
NA, NA, NA, 7.4, 12.8, NA, NA, NA, NA, NA, 4.2, NA, NA), WBC_FIRST = c(2.4,
14.8, 11, 7.3, 4.5, NA, NA, 6.1, 7.7, 16.2, 5.3, 10.3, 14.5,
NA, NA, 12.8, 3.7, 4.7, 16.6, 9.3), neuts_BASELINE = c(2, NA,
NA, 5.4, NA, NA, NA, NA, NA, NA, 4.96, 8.9, NA, NA, NA, NA, NA,
NA, NA, NA), neuts_FIRST = c(1.5, 13, 5.8, 4.5, 1.6, NA, NA,
1.7, 4.3, 9.3, 3.4, 5.8, 10.1, NA, NA, 9.7, 2.3, 3.5, 5, 8.2)), row.names = c(NA,
20L), class = "data.frame")在数据集中,我有一些血液测试结果(在本例中,白细胞和中性粒细胞在两个时间点采集-基线和第一个时间点)。我希望选择基准值(如果存在),否则选择第一个值。
我可以分别为白细胞和中性粒细胞做这件事,但我想为20种不同的血液测试做这件事,而不是每次都硬编码……
硬编码方式:
data %>% mutate(WBC_first_value=ifelse(!is.na(WBC_BASELINE), WBC_BASELINE, WBC_FIRST)) %>%
mutate(neuts_first_value=ifelse(!is.na(neuts_BASELINE), neuts_BASELINE, neuts_FIRST)) 请注意,每次验血后总是紧跟着_BASELINE和_FIRST
如果有任何帮助,我将不胜感激!
发布于 2021-10-20 23:49:37
我们可以结合使用pivot_longer和pivot_wider,通过一些数据处理来自动化这个过程:
library(dplyr)
library(tidyr)
data %>%
mutate(rn = row_number()) %>%
pivot_longer(cols = -rn, names_to = c('grp', '.value'),
names_sep = "_") %>%
group_by(grp) %>%
transmute(rn, new = coalesce(BASELINE, FIRST)) %>%
pivot_wider(names_from = grp, values_from = new) %>%
select(-rn) %>%
bind_cols(data, .)输出:
WBC_BASELINE WBC_FIRST neuts_BASELINE neuts_FIRST WBC neuts
1 2.9 2.4 2.00 1.5 2.9 2.00
2 NA 14.8 NA 13.0 14.8 13.00
3 NA 11.0 NA 5.8 11.0 5.80
4 6.9 7.3 5.40 4.5 6.9 5.40
5 NA 4.5 NA 1.6 4.5 1.60
6 NA NA NA NA NA NA
7 NA NA NA NA NA NA
8 NA 6.1 NA 1.7 6.1 1.70
9 NA 7.7 NA 4.3 7.7 4.30
10 NA 16.2 NA 9.3 16.2 9.30
11 7.4 5.3 4.96 3.4 7.4 4.96
12 12.8 10.3 8.90 5.8 12.8 8.90
13 NA 14.5 NA 10.1 14.5 10.10
14 NA NA NA NA NA NA
15 NA NA NA NA NA NA
16 NA 12.8 NA 9.7 12.8 9.70
17 NA 3.7 NA 2.3 3.7 2.30
18 4.2 4.7 NA 3.5 4.2 3.50
19 NA 16.6 NA 5.0 16.6 5.00
20 NA 9.3 NA 8.2 9.3 8.20发布于 2021-10-20 23:26:44
你可以用一个循环来做这件事!
vars <- c("WBC", "neuts")
for(v in vars){
df[,paste0(v, "_new")] <- ifelse(!is.na(df[,paste0(v, "_BASELINE")]), df[,paste0(v, "_BASELINE")], df[,paste0(v, "_FIRST")])
}或使用sapply
sapply(vars, function(v) ifelse(!is.na(df[,paste0(v, "_BASELINE")]),df[,paste0(v, "_BASELINE")], df[,paste0(v, "_FIRST")]))也可以通过编程方式定义var:
vars <- unique(gsub(pattern = "^([A-Za-z]+)_[A-Za-z]+", "\\1", names(df)))https://stackoverflow.com/questions/69653980
复制相似问题