I have two very similar datasets. Here is a simplified version of the first one:
library(tidyverse)
dataset <- tribble(
~patient, ~tumor, ~trt_date, ~fup_date, ~system,
001, "t1", "2022-01-01", "2022-05-05", NA,
001, "t1", "2022-01-01", "2022-05-05", NA,
001, "t1", "2022-01-01", "2022-05-05", NA,
002, "t1", "2022-02-02", "2022-07-07", 2,
002, "t1", "2022-02-02", "2022-07-07", 2,
002, "t2", "2022-02-02", "2022-07-07", 2,
002, "t2", "2022-02-02", "2022-07-07", 2,
002, "t2", "2022-02-02", "2022-07-07", 2,
003, "t1", "2022-01-01", "2022-05-05", 1,
003, "t2", "2022-06-06", "2022-07-07", 1,
003, "t3", "2022-06-06", "2022-08-08", 1,
004, "t1", "2022-05-05", "2022-07-07", NA,
004, "t1", "2022-05-05", "2022-07-07", NA,
004, "t2", "2022-11-11", "2022-12-12", NA,
004, "t2", "2022-11-11", "2022-12-12", NA,
005, "t1", "2022-02-02", "2022-09-09", 2,
005, "t1", "2022-02-02", "2022-09-09", 2,
005, "t1", "2022-02-02", "2022-09-09", 2,
005, "t2", "2022-05-05", "2022-07-07", NA,
005, "t3", "2022-10-10", "2022-11-11", NA,
005, "t3", "2022-10-10", "2022-11-11", NA,
006, "t1", NA, "2022-11-11", 2,
006, "t2", NA, "2022-11-11", 2
)
And a filtered version:
dataset_system <- dataset %>%
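filter(!is.na(system))
I want to modify both in roughly the same way, apart from a few steps, such as grouping them differently before calling distinct(), and then continue with the steps that apply to both datasets. I figured I could do this with map(), but the result would be a list containing both data frames rather than separate objects in the environment. So I tried walk() combined with get() and assign(), but I cannot get any conditional operation to work inside that block for the steps that should treat the two datasets differently.
Attempt A:
walk(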
.x = c("dataset", "dataset_system"),
.f = function(df_name) {
df <- get(df_name, envir = .GlobalEnv)
df <- df %>%
filter(!is.na(trt_date)) %>%
if(df_name == "dataset") {
group_by(patient)
} else {
group_by(patient, tumor)} %>%
distinct() %>%
ungroup()
new_df <- paste(df_name, "system", sep = "_")
assign(new_df, df, envir = .GlobalEnv)
}
)
Result: Error in if (.) df_name == "dataset" else { : the condition has length > 1
Attempt B:
walk(
.x = c("dataset", "dataset_system"),
.f = function(df_name) {
df <- get(df_name, envir = .GlobalEnv)
df <- df %>%
filter(!is.na(trt_date)) %>%
when(dataset == "dataset"
~ group_by(patient),
group_by(patient, tumor)) %>%
distinct() %>%
ungroup()
new_df <- paste(df_name, "system", sep = "_")
assign(new_df, df, envir = .GlobalEnv)
}
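)
This gives: Error in group_by(patient, tumor) : object 'patient' not found
Is my syntax just messed up, or is this not the right way to do this kind of thing? Thanks!
Posted on 2022-09-13 10:40:56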
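It sounds like you should use purrr::map (or a variant) rather than purrr::walk, passing the datasets to it in a list. That way you do not need get or assign at all.
In addition, if you supply a named list, purrr::imap gives you access to each dataset's name. Something like the following should work: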
my_dataset_list <- list("dataset" = dataset,
"dataset_system" = dataset_system)
new_datasets <- purrr::imap(
.x = my_dataset_list,
.f = function(df, df_name) {
new_df <- df %>%
filter(!is.na(trt_date))
if (df_name == "dataset") {
new_df <- new_df %>%
group_by(patient)
} else {
new_df <- new_df %>%
group_by(patient, tumor)
}
new_df %>%
distinct() %>%
ungroup()
}
)
new_datasets[["dataset"]]
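new_datasets[["dataset_system"]]
Posted on 2022-09-13 13:45:34
I got it working with imap, thanks for your help!
Now I am wondering whether the if/else block with all the new_df <- new_df statements could be replaced with the help of when() or if_else().
I was trying something like this...
imap(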
.x = datasets,
.f = function(df, df_name) {
new_df <- df %>%
filter(!is.na(trt_date) %>%
when(df_name == "xy"
~ group_by(.$patient, .$tumor),
~ group_by(.$patient)) %>%
distinct() %>%
ungroup() %>%
when(df_name == "xy"
~ group_by(.$patient, .$tumor),
~ group_by(.$patient)) %>%
type_convert() %>%
summarise(var = n())
}
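)
... but this fails with an error in UseMethod("group_by"): no applicable method for 'group_by' applied to an object of class "c('integer', 'numeric')".
That would be more elegant than using if/else. Thanks for any suggestions!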
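One possible way to shorten the if/else block, offered as a sketch rather than a definitive answer: pick the grouping columns as a character vector with a plain if/else expression, then pass them to group_by() through across(all_of(...)). The helper name group_cols_for and the small example data below are hypothetical and only serve the illustration. The error above most likely arises because .$patient hands a bare vector to group_by() as its first argument, so there is no data frame to dispatch on. Note also that purrr::when() has been deprecated since purrr 1.0.0, so a plain if is the safer choice.

```r
library(dplyr)
library(purrr)

# Small made-up example data (hypothetical, not the data from the question)
dataset <- tibble::tribble(
  ~patient, ~tumor, ~trt_date,    ~system,
  1,        "t1",   "2022-01-01", NA,
  1,        "t1",   "2022-01-01", NA,
  2,        "t1",   "2022-02-02", 2,
  2,        "t2",   "2022-02-02", 2,
  2,        "t2",   "2022-02-02", 2,
  3,        "t1",   NA,           2
)
dataset_system <- filter(dataset, !is.na(system))

# Hypothetical helper: choose the grouping columns from the dataset's name
group_cols_for <- function(df_name) {
  if (df_name == "dataset") "patient" else c("patient", "tumor")
}

new_datasets <- imap(
  list(dataset = dataset, dataset_system = dataset_system),
  function(df, df_name) {
    df %>%
      filter(!is.na(trt_date)) %>%
      # across(all_of(...)) turns the character vector into grouping columns
      group_by(across(all_of(group_cols_for(df_name)))) %>%
      distinct() %>%
      ungroup()
  }
)
```

The same helper can be reused wherever the pipeline needs the name-dependent grouping again, for example before summarise(). If separate objects are wanted afterwards, base R's list2env(new_datasets, envir = .GlobalEnv) copies the list elements into the global environment, overwriting any existing objects with the same names.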
https://stackoverflow.com/questions/73700153