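I would like to know whether there is any way to wrap the code below to make it shorter; I was thinking of using a loop or a similar function. The code uses AgeatDeath and Disability to generate a new variable (cat). For example, if AgeatDeath is between 75.6 and 77.1 and Disability equals 'No Intelectual and Developmental Disabilities', the code creates cat with the value 75.6-77.1. Thanks, Nader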
IDD <- IDD %>%
  mutate(
    cat = case_when(
      AgeatDeath >= 75.6 &
        AgeatDeath < 77.1 &
        Disability == 'No Intelectual and Developmental Disabilities' ~ '75.6-77.1',
      AgeatDeath >= 74.3 &
        AgeatDeath < 75.6 &
        Disability == 'No Intelectual and Developmental Disabilities' ~ '74.3-75.6',
      AgeatDeath >= 72.5 &
        AgeatDeath < 74.3 &
        Disability == 'No Intelectual and Developmental Disabilities' ~ '72.5-74.3',
      AgeatDeath >= 66.5 &
        AgeatDeath < 72.5 &
        Disability == 'No Intelectual and Developmental Disabilities' ~ '66.6-72.5',
      AgeatDeath >= 64.1 &
        AgeatDeath < 71.9 &
        Disability == 'Intellectual disability' ~ '64.1-71.9',
      AgeatDeath >= 62.3 &
        AgeatDeath < 64.1 &
        Disability == 'Intellectual disability' ~ '62.3-64.1',
      AgeatDeath >= 59.4 &
        AgeatDeath < 62.3 &
        Disability == 'Intellectual disability' ~ '59.4-62.3',
      AgeatDeath >= 50.4 &
        AgeatDeath < 59.4 &
        Disability == 'Intellectual disability' ~ '50.4-59.4',
      AgeatDeath >= 56.47 &
        AgeatDeath < 59.1 &
        Disability == 'Down syndrome' ~ '56.47-59',
      AgeatDeath >= 55.59 &
        AgeatDeath < 56.47 &
        Disability == 'Down syndrome' ~ '55.59-56.47',
      AgeatDeath >= 54.42 &
        AgeatDeath < 55.59 &
        Disability == 'Down syndrome' ~ '54.42-55.59',
      AgeatDeath >= 50.92 &
        AgeatDeath < 54.42 &
        Disability == 'Down syndrome' ~ '50.92-54.42',
      AgeatDeath >= 53.3 &
        AgeatDeath < 58.2 &
        Disability == 'Cerebral palsy' ~ '53.3-58.2',
      AgeatDeath >= 50.6 &
        AgeatDeath < 53.3 &
        Disability == 'Cerebral palsy' ~ '50.6-53.3',
      AgeatDeath >= 48.9 &
        AgeatDeath < 50.6 &
        Disability == 'Cerebral palsy' ~ '48.9-50.6',
      AgeatDeath >= 41.38 &
        AgeatDeath < 48.9 &
        Disability == 'Cerebral palsy' ~ '41.4-48.9',
      AgeatDeath >= 44.2 &
        AgeatDeath < 51.1 &
        Disability == 'Other rare developmental disabilities' ~ '44.2-51',
      AgeatDeath >= 41.6 &
        AgeatDeath < 44.2 &
        Disability == 'Other rare developmental disabilities' ~ '41.6-44.2',
      AgeatDeath >= 30.6 &
        AgeatDeath < 38.4 &
        Disability == 'Other rare developmental disabilities' ~ '30.6-38.4',
      AgeatDeath >= 38.4 &
        AgeatDeath < 41.6 &
        Disability == 'Other rare developmental disabilities' ~ '38.4-41.6'
    )
  )
Posted on 2020-11-20 09:50:40
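Some subsetting and the function cut() can go a long way here. What I will demonstrate does not involve dplyr.
First create an empty new variable. The rest of the code fills it in within a few lines.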
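IDD$cat <- NA_character_
Next, create a list that pairs each value of Disability with its cut points. We will loop over this list.
L <- list(
  `No Intelectual and Developmental Disabilities` = c(66.6, 72.5, 74.3, 75.6, 77.1),
  `Intellectual disability` = c(50.4, 59.4, 62.3, 64.1, 71.9)
)
You can fill in the rest. Now we use a loop to subset by each value of Disability, split the ages into categories with cut(), and rename those categories.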
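for (d in names(L)) {
  rows <- which(IDD$Disability == d)  # which() skips rows where Disability is NA
  brks <- L[[d]]
  IDD$cat[rows] <- as.character(
    cut(IDD$AgeatDeath[rows],  # cut only the rows being assigned
        breaks = brks,
        # one label per interval: each break paired with the next one
        labels = paste(head(brks, -1), brks[-1], sep = "-"),
        include.lowest = TRUE,
        right = FALSE))
}
cut() splits AgeatDeath at the break points we supplied in L, and we build the labels from those same break points. right = FALSE makes each category include its lower bound and exclude its upper bound, and include.lowest = TRUE ensures that any value sitting exactly on the topmost bound is still included in the highest category. Ages outside the break points stay NA. We use as.character() so that the result is a character vector rather than a factor.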
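To see how right = FALSE and include.lowest = TRUE interact, here is a minimal sketch on made-up ages, using the break points for 'Intellectual disability' from L. The vector ages and the names brks and res are illustrative only and are not part of the original data.

```r
# Break points for 'Intellectual disability', as given in L
brks <- c(50.4, 59.4, 62.3, 64.1, 71.9)

# Made-up ages: the lowest bound, an interior boundary,
# the topmost bound, and a value outside the range
ages <- c(50.4, 62.3, 71.9, 80)

res <- as.character(
  cut(ages,
      breaks = brks,
      labels = paste(head(brks, -1), brks[-1], sep = "-"),
      include.lowest = TRUE,
      right = FALSE))

print(res)
# "50.4-59.4" "62.3-64.1" "64.1-71.9" NA
```

The boundary value 62.3 falls into the interval that starts at 62.3, the topmost bound 71.9 is kept in the last interval, and 80 is left as NA because it lies outside all break points.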
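Posted on 2020-11-20 10:01:43
Whichever approach you take, you still need to store the thresholds and conditions somewhere. Right now they are hard-coded in your script, but they could be moved into a table.
Consider setting up a table like this: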
order | min_age | max_age | disability
------+---------+---------+------------------------------------------------
1     | 75.6    | 77.1    | 'No Intelectual and Developmental Disabilities'
2     | 74.3    | 75.6    | 'No Intelectual and Developmental Disabilities'
etc.
...
You can then use that table to set up the conditions, following the parse_exprs approach from this question:
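# loading of condition table
# other setup
# etc.
library(dplyr)
library(rlang)  # provides parse_exprs()
# ensure conditions are in the preferred order
twc = table_w_conditions %>%
  arrange(order)
# make text strings of conditions; paste0() keeps stray spaces out of the labels
# (assumes twc$disability already carries its quotes, as in the table above)
conditions = paste0("AgeatDeath >= ", twc$min_age,
                    " & AgeatDeath < ", twc$max_age,
                    " & Disability == ", twc$disability,
                    " ~ '", twc$min_age, "-", twc$max_age, "'")
# mutate treating text strings as code
IDD <- IDD %>%
  mutate(
    cat = case_when(!!!parse_exprs(conditions))
  )
If you take this approach, I recommend checking that conditions holds the correct condition strings before you use it.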
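One possible way to run that check, sketched in base R: build a small condition table, generate the strings, print them, and confirm that each one parses. The data frame below holds only the two rows shown in the table earlier, and the name parses_ok is a hypothetical helper introduced for this illustration.

```r
# Hypothetical condition table with the two rows shown earlier;
# the disability values carry their own quotes, as in that table
table_w_conditions <- data.frame(
  order = c(1, 2),
  min_age = c(75.6, 74.3),
  max_age = c(77.1, 75.6),
  disability = rep("'No Intelectual and Developmental Disabilities'", 2),
  stringsAsFactors = FALSE
)

# Sort by the order column (base R equivalent of arrange(order))
twc <- table_w_conditions[order(table_w_conditions$order), ]

conditions <- paste0("AgeatDeath >= ", twc$min_age,
                     " & AgeatDeath < ", twc$max_age,
                     " & Disability == ", twc$disability,
                     " ~ '", twc$min_age, "-", twc$max_age, "'")

# Inspect the generated text before handing it to case_when()
print(conditions)

# parses_ok is TRUE where the string is syntactically valid R
parses_ok <- vapply(
  conditions,
  function(txt) !inherits(try(parse(text = txt), silent = TRUE), "try-error"),
  logical(1),
  USE.NAMES = FALSE
)
stopifnot(all(parses_ok))
```

A string that parses is not necessarily the condition you intended, so reading the printed text against the table is still worthwhile.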
https://stackoverflow.com/questions/64922211