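I'm writing a custom function to process GL string data (if you're curious about the data format, see this link: GL String).
The data looks like this:
library(tidyverse)
(test <- tribble(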
~case, ~A, ~B, ~DRB1, ~DRB3,
1, "HLA-A*30:02:01+HLA-A*32:01:01", "HLA-B*15:17:01", "HLA-DRB1*13:02:01+HLA-DRB1*13:03:01", "HLA-DRB3*02:02:01+HLA-DRB3*03:01:01",
2, "HLA-A*23:01:01+HLA-A*33:03:01", "HLA-B*35:03:01+HLA-B*55:01:01", "HLA-DRB1*08:01:01+HLA-DRB1*11:01:01|HLA-DRB1*08:77+HLA-DRB1*11:277", "HLA-DRB3*02:02:01",
3, "HLA-A*02:01:01", "HLA-B*50:01:01+HLA-B*51:01:01", "HLA-DRB1*03:01:01+HLA-DRB1*04:05:01", NA,
4, "HLA-A*02:01:01+HLA-A*32:01:01", NA, "HLA-DRB1*11:04:01+HLA-DRB1*15:02:01", "HLA-DRB3*01:62:01+HLA-DRB3*02:02:01|HLA-DRB3*01:91+HLA-DRB3*02:133"
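))
The function I wrote splits each string into two columns, using the | symbol as the delimiter. It can also optionally keep or discard any additional data after the first |. Here is the function: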
GLstring_genotype_ambiguity <- function(.data, columns, keep_ambiguities = TRUE) {
  # Copy GL string to a new ambiguity column
  .data %>% mutate(across({{ columns }}, ~ as.character(.), .names = "{col}_ambiguity")) %>%
    # Extract the first genotype and put in the original column
    mutate(across({{ columns }}, ~ str_extract(., "[^|]+"))) %>%
    # Remove the first genotype from the ambiguity column
    mutate(across(ends_with("ambiguity"), ~ str_replace(., "[^|]+", ""))) %>%
    mutate(across(ends_with("ambiguity"), ~ str_replace(., "[\\|]+", ""))) %>%
    mutate(across(ends_with("ambiguity"), ~ na_if(., ""))) %>%
    # Either keep or remove the ambiguity column
    { if (keep_ambiguities) . else select(., -contains("ambiguity")) }
}
The function works as expected when I name the columns directly in the columns argument that is passed to across:
test %>% select(A) %>% GLstring_genotype_ambiguity(A)
test %>% select(DRB3) %>% GLstring_genotype_ambiguity(DRB3, keep_ambiguities = FALSE)
test %>% GLstring_genotype_ambiguity(A:DRB3)
test %>% GLstring_genotype_ambiguity(c(A, B, DRB1))
However, it does not work when I use selection helpers:
test %>% GLstring_genotype_ambiguity(starts_with("D"))
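test %>% GLstring_genotype_ambiguity(everything())
In these cases the first genotype is extracted correctly, but the remaining ambiguities do not appear in the columns ending in _ambiguity. Clearly I am misunderstanding how selection helpers work.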
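One way to narrow this down is to check which columns a helper matches at each step. The sketch below uses a made-up one-column tibble (the data and the names df, step1, step2, and matched are illustrative, not taken from the post). A tidyselect helper such as starts_with() is evaluated against the data frame as it exists at each across() call, so after the copy step the same helper also matches the new _ambiguity column.

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical one-column example, not the original data
df <- tibble(DRB3 = "X*01+X*02|X*03+X*04")

# Step 1: copy the column; at this point starts_with("D") matches only DRB3
step1 <- df %>%
  mutate(across(starts_with("D"), ~ as.character(.), .names = "{col}_ambiguity"))

# The same helper, evaluated again, now matches the copy as well
matched <- names(select(step1, starts_with("D")))
print(matched)

# Step 2: the extraction is therefore applied to both columns,
# so the ambiguity column loses everything after the first "|"
step2 <- step1 %>%
  mutate(across(starts_with("D"), ~ str_extract(., "[^|]+")))
print(step2)
```

With both columns reduced to the first genotype, the later steps that strip the first genotype from the _ambiguity column leave an empty string, which na_if() then turns into NA.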
Answered 2022-10-11 13:10:17
Paul Stafford Allen's suggestion was to identify the column names inside the function and use them explicitly, like this:
GLstring_genotype_ambiguity <- function(.data, columns, keep_ambiguities = TRUE) {
  # Resolve the selection to column names once, before any columns are added
  cols2do <- names(select(.data, {{ columns }}))
  .data %>%
    # Copy GL string to a new ambiguity column
    mutate(across({{ cols2do }},
                  ~ as.character(.),
                  .names = "{col}_ambiguity")) %>%
    # Extract the first genotype and put it in the original column
    mutate(across({{ cols2do }}, ~ str_extract(., "[^|]+"))) %>%
    # Remove the first genotype from the ambiguity column
    mutate(across(ends_with("ambiguity"), ~ str_replace(., "[^|]+", ""))) %>%
    mutate(across(ends_with("ambiguity"), ~ str_replace(., "[\\|]+", ""))) %>%
    mutate(across(ends_with("ambiguity"), ~ na_if(., ""))) %>%
    # Either keep or remove the ambiguity column
    { if (keep_ambiguities) . else select(., -contains("ambiguity")) }
}
https://stackoverflow.com/questions/73987585
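The same idea can be sketched more compactly with all_of(), the tidyselect helper for selecting by a character vector of names. The helper name split_first_genotype and the demo data below are hypothetical. This is an illustration under those assumptions, not the answerer's code, and it drops the ambiguity columns by keeping only the original column names rather than by matching on "ambiguity".

```r
library(dplyr)
library(stringr)
library(tibble)

# Hypothetical helper name; a compact variant of the approach above
split_first_genotype <- function(.data, columns, keep_ambiguities = TRUE) {
  # Resolve the selection once, while only the original columns exist
  cols <- names(select(.data, {{ columns }}))

  out <- .data %>%
    # Everything after the first "|" goes to <col>_ambiguity (NA if nothing is left)
    mutate(across(all_of(cols),
                  ~ na_if(str_remove(as.character(.), "^[^|]*\\|?"), ""),
                  .names = "{.col}_ambiguity")) %>%
    # Everything before the first "|" stays in the original column
    mutate(across(all_of(cols), ~ str_extract(., "[^|]+")))

  # Keep the new columns, or fall back to the original set of columns
  if (keep_ambiguities) out else select(out, all_of(names(.data)))
}

# Made-up demo data, shaped like the GL strings in the question
demo <- tibble(
  case = 1:2,
  DRB1 = c("X*01+X*02|X*03+X*04", "X*05+X*06"),
  DRB3 = c("Y*01", NA)
)

result <- split_first_genotype(demo, starts_with("D"))
print(result)
```

Because cols is a plain character vector fixed before any column is added, later across() calls cannot pick up the _ambiguity columns, whichever helper the caller passes in.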