I have a df:
A B C
NP All M4 6
NP All M4 8
NP All FBS 3
NI C1_D2 8
NI C1D9: PT PI-4, A,B AM1 6
NI C1D9: PT P3,4 B,E A6 9
NN W1D5: PRE 2
NN W1D5: PRE 6
NI W1D5: PRE 5

A <- c("NP", "NP", "NP", "NI", "NI", "N1", "NN", "NN", "N1")
B <- c("All M4", "All M4", "All FBS", "C1_D2", "C1D9: PT PI-4, A,B AM1", "C1D9: PT P3,4 B,E A6 ", "W1D5: PRE", "W1D5: PRE", "W1D5: PRE")
C <- c("6","8","3","8","6","9","2","6","5")
df <- data.frame(A, B, C)
df

I want to rename the values in column B, then group by columns A and D and get the sum of column C. My current code is:
df2 <- df %>%
  mutate(D = case_when(
    startsWith(B, "All") ~ "ALL",
    startsWith(B, "C1_D") ~ "CASE 1 DEAL 2",
    startsWith(B, "C1D9") ~ "CASE 1 DEAL 9",
    startsWith(B, "W1D5") ~ "WELL 1 DEAL 5",
  )) %>%
  group_by(A, D) %>% summarise(C = n())

I get the error: Problem with mutate() input: 3 (startsWith(B, "All") ~ "ALL") must be a two-sided formula, not a character vector. Any other, more efficient way of writing this code would be appreciated, as I am not fond of using base R.
df2 should look like this:
A D C
NP ALL 17
NI CASE 1 DEAL 2 8
NI CASE 1 DEAL 9 15
NN WELL 1 DEAL 5 8
NI WELL 1 DEAL 5 5

Posted on 2021-08-19 15:25:41
Is this what you need?
library(dplyr)
df %>%
  mutate(D = case_when(grepl("^All", B) ~ "ALL",
                       grepl("^C1_D", B) ~ "CASE 1 DEAL 2",
                       grepl("^C1D9", B) ~ "CASE 1 DEAL 9",
                       grepl("^W1D5", B) ~ "WELL 1 DEAL 5")) %>%
  group_by(A, D) %>%
  summarise(C = sum(as.numeric(C)))
# A tibble: 6 x 3
# Groups: A [4]
A D C
<chr> <chr> <dbl>
1 N1 CASE 1 DEAL 9 9
2 N1 WELL 1 DEAL 5 5
3 NI CASE 1 DEAL 2 8
4 NI CASE 1 DEAL 9 6
5 NN WELL 1 DEAL 5 8
6 NP ALL 17

Posted on 2021-08-19 15:37:36
Use str_detect from the stringr package to detect the string, then summarise the sum of C:
library(dplyr)
library(stringr)
df %>%
  type.convert(as.is = TRUE) %>%
  mutate(D = case_when(
    str_detect(B, "All") ~ "ALL",
    str_detect(B, "C1_D") ~ "CASE 1 DEAL 2",
    str_detect(B, "C1D9") ~ "CASE 1 DEAL 9",
    str_detect(B, "W1D5") ~ "WELL 1 DEAL 5",
    TRUE ~ NA_character_)) %>%
  group_by(D, A) %>%
  summarise(C = sum(C)) %>%
  select(A, D, C)

A D C
<chr> <chr> <int>
1 NP ALL 17
2 NI CASE 1 DEAL 2 8
3 N1 CASE 1 DEAL 9 9
4 NI CASE 1 DEAL 9 6
5 N1 WELL 1 DEAL 5 5
6 NN WELL 1 DEAL 5 8

Posted on 2021-08-19 17:07:39
We can create a key/value dataset and perform a fuzzyjoin:
library(dplyr)
library(fuzzyjoin)
keydat <- tibble(B2 = c("All", "C1_D", "C1D9", "W1D5"),
                 D = c("ALL", "CASE 1 DEAL 2", "CASE 1 DEAL 9", "WELL 1 DEAL 5"))
regex_left_join(df, keydat, by = c("B" = "B2")) %>%
  select(-B2) %>%
  group_by(D, A) %>%
  summarise(C = sum(as.numeric(C)), .groups = 'drop')
# A tibble: 6 x 3
D A C
<chr> <chr> <dbl>
1 ALL NP 17
2 CASE 1 DEAL 2 NI 8
3 CASE 1 DEAL 9 N1 9
4 CASE 1 DEAL 9 NI 6
5 WELL 1 DEAL 5 N1 5
6 WELL 1 DEAL 5 NN 8

https://stackoverflow.com/questions/68850450
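
A note on the row counts: every answer above returns six rows, while the df2 asked for in the question has five. This looks like a consequence of the sample data rather than of the code, since the vector A contains "N1" in two places where the printed table shows "NI". The sketch below assumes those two values are typos (an assumption, not something the thread confirms) and reuses the question's own startsWith() conditions. It sums C with sum(as.numeric(C)) instead of counting rows with n(), because C is stored as character.

```r
library(dplyr)

# Assumption: the two "N1" entries in the original A are typos for "NI",
# as the printed table and the desired df2 in the question suggest.
A <- c("NP", "NP", "NP", "NI", "NI", "NI", "NN", "NN", "NI")
B <- c("All M4", "All M4", "All FBS", "C1_D2", "C1D9: PT PI-4, A,B AM1",
       "C1D9: PT P3,4 B,E A6 ", "W1D5: PRE", "W1D5: PRE", "W1D5: PRE")
C <- c("6", "8", "3", "8", "6", "9", "2", "6", "5")

# Keep B as character so startsWith() works on older R versions too.
df <- data.frame(A, B, C, stringsAsFactors = FALSE)

df2 <- df %>%
  # Map each prefix of B to its label in a new column D.
  mutate(D = case_when(
    startsWith(B, "All")  ~ "ALL",
    startsWith(B, "C1_D") ~ "CASE 1 DEAL 2",
    startsWith(B, "C1D9") ~ "CASE 1 DEAL 9",
    startsWith(B, "W1D5") ~ "WELL 1 DEAL 5"
  )) %>%
  group_by(A, D) %>%
  # C is character, so convert before summing; n() would only count rows.
  summarise(C = sum(as.numeric(C)), .groups = "drop")

df2
```

With that correction, df2 has five rows and the same sums as the table in the question, although the rows come back sorted by A and D rather than in the order shown there.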