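I have a large database, and I want to handle duplicates conditionally on another column (without removing any rows). Here is a sample of my input: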
name nb_participants
INSTITUT BILA BILA 10
INSTITUT BILA BILA 4
INSTITUT BILA BILA NA
INSTITUT NZUNDU 3
INSTITUT NZUNDU 15
...
structure(list(name = c("INSTITUT BILA BILA", "INSTITUT BILA BILA", "INSTITUT BILA BILA", "INSTITUT NZUNDU", "INSTITUT NZUNDU"), nb_participants = c(10, 4, NA, 3, 15)), row.names = c("1", "2", "3", "4", "5"), class = "data.frame")
The desired output is:
name nb_participants
INSTITUT BILA BILA 2-1 10
INSTITUT BILA BILA 2-2 4
INSTITUT BILA BILA 2-3 NA
INSTITUT NZUNDU 2-2 3
INSTITUT NZUNDU 2-1 15
...
structure(list(name = c("INSTITUT BILA BILA 2-1", "INSTITUT BILA BILA 2-2", "INSTITUT BILA BILA 2-3", "INSTITUT NZUNDU 2-2", "INSTITUT NZUNDU 2-1"), nb_participants = c(10, 4, NA, 3, 15)), row.names = c("1", "2", "3", "4", "5"), class = "data.frame")
I have this command:
data$name <- ave(as.character(data$name), data$name, FUN = function(x) if (length(x) > 1) paste0(x[1], "-", seq_along(x)) else x[1])
but I don't know how to make the ranking conditional on another column. For now, the only thing I can do is:
data <- data %>%
  group_by(name) %>%
  filter(nb_participants == max(nb_participants))
Do you have any idea that could help? Thanks a lot.
Answered on 2021-10-21 08:32:28
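This may help you solve your problem (the syntax uses the tidyverse packages). First, if you want to sort the data by the nb_participants variable in descending order within each name, do the following:
df <- df %>% arrange(name, desc(nb_participants))
Then you have to create a counter by name:
df <- df %>% group_by(name) %>% mutate(id = row_number())
The id column numbers the rows within each name group (1, 2, 3, ...), so after the sort above, 1 is the row with the most participants.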
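To make the intermediate result concrete, here is a minimal sketch that runs these two steps on the sample data from the question. The name `ranked` and the trailing `ungroup()` are additions for illustration and are not part of the steps above. Note that `arrange()` places missing values at the end, so the NA row gets the last id in its group.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  name = c("INSTITUT BILA BILA", "INSTITUT BILA BILA", "INSTITUT BILA BILA",
           "INSTITUT NZUNDU", "INSTITUT NZUNDU"),
  nb_participants = c(10, 4, NA, 3, 15),
  stringsAsFactors = FALSE
)

ranked <- df %>%
  arrange(name, desc(nb_participants)) %>%  # sort within name, largest first, NA last
  group_by(name) %>%
  mutate(id = row_number()) %>%             # 1, 2, 3, ... within each name
  ungroup()                                 # drop the grouping (illustrative addition)

print(ranked)
```

Be aware that this approach reorders the rows: the two INSTITUT NZUNDU rows come out as 15 then 3, whereas the input has 3 then 15.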
If you want to append this number to the name variable, as in your example, you can simply do:
df$name <- paste0(df$name, " 2-", df$id)
https://stackoverflow.com/questions/69658177
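The desired output in the question keeps the rows in their original order, while sorting with `arrange()` changes it. As a possible alternative, here is a sketch in base R that computes the rank in place with `rank()` and `ave()`. It assumes that the "2-" prefix is a fixed literal, as it appears in the question's expected output, and the helper name `rank_desc` is made up for this example.

```r
# Sample data from the question
df <- data.frame(
  name = c("INSTITUT BILA BILA", "INSTITUT BILA BILA", "INSTITUT BILA BILA",
           "INSTITUT NZUNDU", "INSTITUT NZUNDU"),
  nb_participants = c(10, 4, NA, 3, 15),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rank by descending value, NA last, ties broken by position
rank_desc <- function(x) rank(-x, na.last = TRUE, ties.method = "first")

# Apply the ranking within each name group without reordering the rows
df$id <- ave(df$nb_participants, df$name, FUN = rank_desc)

# Append the assumed literal prefix " 2-" and the rank to the name
df$name <- paste0(df$name, " 2-", df$id)

print(df)
```

With the sample data, the INSTITUT NZUNDU row with 3 participants stays first and is labelled 2-2, and the row with 15 participants is labelled 2-1, as in the expected output.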