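This is my first question here. I'm new to R and have been searching for an answer for a while without finding one, so here goes. I have a very large dataset (over 140K observations) in which one column, "programtype", holds categories with the following options:
What I want to do is create a new column in which some of these categories are merged together. I'd like:
Some of them would stay as they are. I tried ifelse statements, but they seem to have trouble identifying what is in the original column and return NA for a large number of observations. I have checked all my spelling, so that is not the cause. Below is what I tried, based on another answer here. My dataset is named TP_state, and the other column is named lagoslakeid. However, it does not work properly. Any help would be greatly appreciated!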
library(data.table)  # needed for data.table() and :=

x <- c(TP_state$programtype)
y <- c(TP_state$lagoslakeid)
df <- data.frame(x, y)
DT <- data.table(df)
DT[, Program_Type := ifelse(x %in% c("Federal Agency", "Federal Agency/University", "National Survey Program"), "Federal Agency/University",
  ifelse(x %in% c("LTER", "University"), "LTER/University",
  ifelse(x %in% c("Non-Profit Agency"), "Non-Profit Agency",
  ifelse(x %in% c("State Agency"), "State Agency",
  ifelse(x %in% c("State Agency/University/Citizen Monitoring Program", "State Agency/Citizen Monitoring Program"), "Citizen Monitoring Program",
  ifelse(x %in% c("Tribal Agency"), "Tribal Agency", NA))))))]

Posted on 2017-07-13 19:30:50
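Before recoding, it may help to check which values actually occur in the column, because the nested ifelse() in the question returns NA for every value that matches none of the listed strings exactly. The sketch below is only a diagnostic under assumptions: the `programtype` vector is a made-up stand-in for TP_state$programtype, and the trailing space in "LTER " and the value "Local Agency" are hypothetical examples of mismatches. It is also worth checking class(TP_state$programtype): in R versions before 4.1.0, calling c() on a factor returns its integer codes rather than its labels, which would make every comparison fail.

```r
# Hypothetical stand-in for TP_state$programtype. The trailing space in
# "LTER " and the value "Local Agency" are invented for illustration.
programtype <- c("Federal Agency", "LTER ", "University", "State Agency", "Local Agency")

# The exact strings that the nested ifelse() in the question tests for.
known <- c("Federal Agency", "Federal Agency/University", "National Survey Program",
           "LTER", "University", "Non-Profit Agency", "State Agency",
           "State Agency/University/Citizen Monitoring Program",
           "State Agency/Citizen Monitoring Program", "Tribal Agency")

# Values that match none of the listed strings would end up as NA.
unmatched <- setdiff(unique(programtype), known)
print(unmatched)

# Converting to character and trimming whitespace removes one common
# cause of mismatches; whatever is left is a genuinely unlisted value.
cleaned <- trimws(as.character(programtype))
still_unmatched <- setdiff(unique(cleaned), known)
print(still_unmatched)
```

If `unmatched` is empty on the real data, the cause probably lies elsewhere, for example in how the column was copied into `x`.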
I would try something like this. Let me know if it works for you!
# Work on a character copy so that factor levels do not interfere with matching.
old <- as.character(df$column_with_factors)
new <- character(length(old))
for (i in seq_along(old)) {
  if (grepl(pattern = 'federal agency|national survey program', x = old[i], ignore.case = TRUE)) {
    new[i] <- 'Federal Agency/University'
  } else if (grepl(pattern = '^lter$|^university$', x = old[i], ignore.case = TRUE)) {
    new[i] <- 'LTER/University'
  } else if (grepl(pattern = 'non-profit agency', x = old[i], ignore.case = TRUE)) {
    new[i] <- 'Non-Profit Agency'
  } else if (grepl(pattern = '^state agency$', x = old[i], ignore.case = TRUE)) {
    new[i] <- 'State Agency'
  } else if (grepl(pattern = 'state agency/(citizen monitoring program|university/citizen monitoring program)', x = old[i], ignore.case = TRUE)) {
    new[i] <- 'Citizen Science Monitoring Program'
  } else if (grepl(pattern = 'tribal agency', x = old[i], ignore.case = TRUE)) {
    new[i] <- 'Tribal Agency'
  } else {
    new[i] <- NA_character_
  }
}
# Store the recoded values back as a factor.
df$column_with_factors <- as.factor(new)

But this should run faster:
# Each branch returns the merged category for one value; unmatched values become NA.
df$column_with_factors <- sapply(as.character(df$column_with_factors), function(x) {
  if (grepl(pattern = 'federal agency|national survey program', x = x, ignore.case = TRUE)) {
    'Federal Agency/University'
  } else if (grepl(pattern = '^lter$|^university$', x = x, ignore.case = TRUE)) {
    'LTER/University'
  } else if (grepl(pattern = 'non-profit agency', x = x, ignore.case = TRUE)) {
    'Non-Profit Agency'
  } else if (grepl(pattern = '^state agency$', x = x, ignore.case = TRUE)) {
    'State Agency'
  } else if (grepl(pattern = 'state agency/(citizen monitoring program|university/citizen monitoring program)', x = x, ignore.case = TRUE)) {
    'Citizen Science Monitoring Program'
  } else if (grepl(pattern = 'tribal agency', x = x, ignore.case = TRUE)) {
    'Tribal Agency'
  } else {
    NA_character_
  }
}, USE.NAMES = FALSE)
df$column_with_factors <- as.factor(df$column_with_factors)

Posted on 2017-07-13 20:25:41
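For an exact-match recode like this one, a named lookup vector is a fully vectorised base-R alternative to calling grepl() once per row, and it is likely to scale better to 140K+ rows. The following is a minimal sketch, not a drop-in replacement: the `programtype` vector is made-up demo data, "Something Else" is a hypothetical unlisted value, and the merged labels follow the ifelse() code in the question.

```r
# Names are the original values, elements are the merged categories
# (labels taken from the question's ifelse() code).
lookup <- c(
  "Federal Agency"                                     = "Federal Agency/University",
  "Federal Agency/University"                          = "Federal Agency/University",
  "National Survey Program"                            = "Federal Agency/University",
  "LTER"                                               = "LTER/University",
  "University"                                         = "LTER/University",
  "Non-Profit Agency"                                  = "Non-Profit Agency",
  "State Agency"                                       = "State Agency",
  "State Agency/University/Citizen Monitoring Program" = "Citizen Monitoring Program",
  "State Agency/Citizen Monitoring Program"            = "Citizen Monitoring Program",
  "Tribal Agency"                                      = "Tribal Agency"
)

# Made-up demo data; "Something Else" is a hypothetical unlisted value.
programtype <- factor(c("Federal Agency", "LTER", "Tribal Agency", "Something Else"))

# as.character() matters: indexing with a factor would use its integer codes.
# Values without an entry in the lookup come back as NA.
merged <- unname(lookup[as.character(programtype)])
print(merged)
```

On the real data this would be something like `TP_state$Program_Type <- unname(lookup[as.character(TP_state$programtype)])`, assuming the column names given in the question.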
The forcats package is very good at recoding tasks like this one.
First, create some demo data...
library(tidyverse)
library(forcats)
df <-
  tibble(
    programtype = c(
      "Federal Agency",
      "Federal Agency",
      "Federal Agency",
      "State Agency/University/Citizen Monitoring",
      "State Agency/University/Citizen Monitoring Program",
      "Federal Agency/University",
      "National Survey Program",
      "LTER",
      "University",
      "Non-Profit Agency",
      "Non-Profit Agency",
      "Non-Profit Agency",
      "Non-Profit Agency",
      "Non-Profit Agency",
      "State Agency",
      "State Agency",
      "State Agency/Citizen Monitoring Program",
      "State Agency/University/Citizen Monitoring Program",
      "Tribal Agency",
      "Tribal Agency",
      "Tribal Agency"
    ),
    ID = 1:21
  )

Then use fct_recode to replace the values...
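df %>%
  mutate(
    # New label on the left, existing value on the right; unlisted values are kept as they are.
    new_categories = fct_recode(
      programtype,
      "Federal Agency/University" = "Federal Agency",
      "Federal Agency/University" = "Federal Agency/University",
      "Federal Agency/University" = "National Survey Program",
      "LTER/University" = "LTER",
      "LTER/University" = "University",
      "Citizen Science Monitoring Program" = "State Agency/Citizen Monitoring Program",
      # Both spellings occur in the demo data, with and without the trailing "Program".
      "Citizen Science Monitoring Program" = "State Agency/University/Citizen Monitoring",
      "Citizen Science Monitoring Program" = "State Agency/University/Citizen Monitoring Program"
    )
  )

https://stackoverflow.com/questions/45088642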
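When several old values map to the same new label, forcats also provides fct_collapse(), which takes one vector of old levels per new level and so avoids repeating the label on every line. This is a small sketch on made-up data rather than the question's dataset; it assumes forcats is installed, and levels that are not mentioned (here "State Agency") are left unchanged.

```r
library(forcats)

# Made-up demo factor; every level listed below is present in it.
programtype <- factor(c(
  "Federal Agency", "National Survey Program", "LTER", "University",
  "State Agency", "State Agency/Citizen Monitoring Program"
))

# One argument per merged category: new label = vector of old levels.
merged <- fct_collapse(
  programtype,
  "Federal Agency/University" = c("Federal Agency", "National Survey Program"),
  "LTER/University" = c("LTER", "University"),
  "Citizen Science Monitoring Program" = "State Agency/Citizen Monitoring Program"
)

print(table(merged))
```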