I have a set of data frames, and in each of them one column contains several identical strings that nevertheless represent different observations.
library(dplyr)
library(stringi)  # provides stri_replace_first_fixed()
govs <- c("Government", "Federal", "General government", "Government enterprises", "State and local",
"General government", "Government enterprises")
df <- data.frame("gov_levels" = govs, revenue = rnorm(7, mean = 1000, sd = 50))
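df
I want to replace (or concatenate) each occurrence with a different pattern so that they become distinct. This code returns the desired output: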
df %>%
mutate(gov_levels = stri_replace_first_fixed(str = gov_levels, pattern = "General government",
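replacement = c("Federal general government",
"State and local general government")))
However, the result is inconsistent: it depends on whether "General government" falls on an even or an odd row, as I show here by removing the first row before the mutate: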
df %>%
filter(gov_levels != "Government") %>%
mutate(gov_levels = stri_replace_first_fixed(str = gov_levels, pattern = "General government",
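replacement = c("Federal general government",
"State and local general government")))
This causes the replacements to happen in the wrong order. I am looking for a way to apply this consistently, so that it does not depend on the row position of the string being replaced. That is, the first match should always be replaced with the federal label, and the second match should always be replaced with the state and local label.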
UPDATE, following George's answer, for a list of data frames with some inconsistencies:
govs <- c("Government", "Federal", "General government", "Government enterprises", "State and local",
"General government", "Government enterprises", NA, NA)
df1 <- data.frame("col_1" = "col1data", "gov_levels" = govs, revenue = c(rnorm(7, mean = 100, sd = 50), NA, NA), stringsAsFactors = FALSE)
df2 <- data.frame("col_1" = "col1data", "gov_types" = govs, revenue = c(rnorm(7, mean = 100, sd = 50), NA, NA), stringsAsFactors = FALSE)
df2 <- df2 %>%
filter(gov_types != "Government")
df_list <- list(df1, df2)
Here I implement George's solution with lapply to handle the additional issues I mentioned. I'm curious whether there is a better way to go about this?
newlevels_gen <- c("Federal general government", "State and local general government")
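# Convert the second column of each data frame to a factor
df_list <- lapply(df_list, function(x) {
  x[, 2] <- as.factor(x[, 2])
  return(x)
})
# Add the new labels as allowed factor levels
df_list <- lapply(df_list, function(x) {
  levels(x[, 2]) <- c(levels(x[, 2]), newlevels_gen)
  return(x)
})
# Replace the matches in order of appearance, skipping NA rows
df_list_clean_a <- lapply(df_list, function(x) {
  x[, 2][!is.na(x[, 2]) & x[, 2] == "General government"] <- newlevels_gen
  return(x)
})
Answered 2020-03-05 15:14:50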
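Does this achieve your goal? Simply replace the elements that meet the condition with the vector of desired strings. But first you need to add the new values as allowed levels of your factor; otherwise R rejects them as invalid factor levels and inserts NA with a warning.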
# First define a string containing the new levels for the 'gov_levels' factor
newlevels <- c("Federal general government", "State and local general government")
# Then add them so that they are allowed as factor levels
levels(df$gov_levels) <- c(levels(df$gov_levels), newlevels)
# Now just replace the values where 'gov_levels' is "General government" with the new string
# They will naturally be assigned in the same order they occur in the dataset
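df$gov_levels[df$gov_levels=="General government"] <- newlevels
Of course, this only works if there are exactly two occurrences, but I assume from your question that this is the case? If there can be two or fewer, we can adjust the last line to account for the number of occurrences.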
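The adjustment mentioned above could look something like the following sketch, which assigns by match order rather than by row position and uses only as many labels as there are matches. `relabel_in_order` is a hypothetical helper name, not part of any package; it assumes the column is a character vector or a factor and that there are never more matches than labels.

```r
# Hypothetical helper: replace the k-th occurrence of `pattern` in `x`
# with the k-th element of `labels`, independent of row position.
relabel_in_order <- function(x, pattern, labels) {
  idx <- which(x == pattern)  # which() ignores NA comparisons
  if (length(idx) > length(labels)) {
    stop("more matches than replacement labels")
  }
  if (is.factor(x)) {
    # Make the new labels valid levels before assigning them
    levels(x) <- union(levels(x), labels)
  }
  x[idx] <- labels[seq_along(idx)]
  x
}

newlevels <- c("Federal general government", "State and local general government")
govs <- c("Federal", "General government", "Government enterprises",
          "State and local", "General government", "Government enterprises", NA)
out <- relabel_in_order(govs, "General government", newlevels)
out
```

For a list of data frames, a call such as `lapply(df_list, function(d) { d[[2]] <- relabel_in_order(d[[2]], "General government", newlevels); d })` would apply the same idea to the second column of each element, again assuming that column holds the labels.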
EDIT
If gov_levels is a character vector rather than a factor, you don't need to worry about levels and can simply do:
df$gov_levels[df$gov_levels=="General government"] <-
c("Federal general government", "State and local general government")
https://stackoverflow.com/questions/60547757