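I'm working with PUMS data and using tidycensus to recode the values of some columns. However, this adds a new column named columnname_label for each recoded column. I want to replace each original column with the translated values from its columnname_label column. Here is an example of my data frame: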
df <- data.frame(Region = c(1,2,1,4,3,1),
                 Region_label = c("North", "South", "North", "West", "East", "North"),
                 Broadband = c(0,1,0,0,0,1),
                 Broadband_label = c("No","Yes","No","No","No","Yes"),
                 Hispeed = c(1,1,0,0,1,0),
                 Hispeed_label = c("Yes", "Yes","No","No","Yes","No"))
I know I could write code like this:
library(tidyverse)
recode <- df %>% mutate(Region = Region_label) %>% mutate(Broadband = Broadband_label) %>%
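mutate(Hispeed = Hispeed_label)
However, I have 66 columns that each need to be matched with their "_label" column. Is there a more elegant way to do this than writing 66 mutate statements?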
I tried writing a loop with mutate_at, but it doesn't work:
subset1 <- grep('*label*',names(df),value = TRUE)
name <- names(df)
for (i in subset1) {
final <- final_house %>% mutate_at(vars(matches(trimws(name,"right","\\_label"))),i)
}
This doesn't work.
Thanks!
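For comparison with the loop attempted above, here is a minimal sketch of a loop that does work, assuming dplyr 1.0 or later. As far as I can tell, the original attempt fails because the third argument of mutate_at() (.funs) expects a function rather than a column name, and because final is rebuilt from final_house on every pass, so earlier iterations would be discarded anyway. The names label_cols, base_cols, and recoded are illustrative only.

```r
library(dplyr)

# Example data from the question (stringsAsFactors = FALSE keeps labels as character)
df <- data.frame(
  Region          = c(1, 2, 1, 4, 3, 1),
  Region_label    = c("North", "South", "North", "West", "East", "North"),
  Broadband       = c(0, 1, 0, 0, 0, 1),
  Broadband_label = c("No", "Yes", "No", "No", "No", "Yes"),
  Hispeed         = c(1, 1, 0, 0, 1, 0),
  Hispeed_label   = c("Yes", "Yes", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Every column ending in "_label", and the original column each one belongs to
label_cols <- grep("_label$", names(df), value = TRUE)
base_cols  <- sub("_label$", "", label_cols)

# Start from the input once, then keep updating the same object on each pass
recoded <- df
for (i in seq_along(label_cols)) {
  # !! injects the target column name; .data[[...]] looks up the label column by string
  recoded <- recoded %>% mutate(!!base_cols[i] := .data[[label_cols[i]]])
}
recoded
```

The vectorised answers below avoid the explicit loop, but this version stays close to what the question was trying to do.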
Posted 2021-04-14 02:45:46
library(dplyr)
df %>%
select(ends_with("label")) %>%
rename_with(~ gsub("_label","", .))
Output:
Region Broadband Hispeed
1 North No Yes
2 South Yes Yes
3 North No No
4 West No No
5 East No Yes
6 North Yes No
Posted 2021-04-14 02:48:02
First, select all columns that end with "label". Then use a regex to extract the part of each name before the first underscore (_).
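library(dplyr)
library(stringr)
df %>%
  select(ends_with("label")) %>%
  rename_with(~ str_extract(.x, "^[^_]+(?=_)"))
Note that this requires dplyr 1.0.0 or later, since rename_with() was introduced in that version.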
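The str_extract() pattern above keeps only the text before the first underscore, which is fine for the example names but would truncate a base name that itself contains an underscore. A minimal sketch of a safer variant, assuming dplyr 1.0.0 or later and stringr, strips only a trailing "_label" instead; the name Home_type_label is hypothetical and not from the question.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  Region          = c(1, 2, 1, 4, 3, 1),
  Region_label    = c("North", "South", "North", "West", "East", "North"),
  Broadband       = c(0, 1, 0, 0, 0, 1),
  Broadband_label = c("No", "Yes", "No", "No", "No", "Yes"),
  Hispeed         = c(1, 1, 0, 0, 1, 0),
  Hispeed_label   = c("Yes", "Yes", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Keep only the label columns and remove the "_label" suffix anchored at the end
labels_only <- df %>%
  select(ends_with("_label")) %>%
  rename_with(~ sub("_label$", "", .x))
labels_only

# Contrast on a hypothetical name that contains an underscore
str_extract("Home_type_label", "^[^_]+(?=_)")  # returns "Home"
sub("_label$", "", "Home_type_label")          # returns "Home_type"
```

Anchoring the pattern with $ also avoids touching a "_label" that might appear in the middle of a name.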
Posted 2021-04-14 02:50:22
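Using across, the following overwrites each column X with the values of the column X_label. If some columns might not have a corresponding _label column, replace the first argument of across with sub("_label$", "", subset1), where subset1 is as defined in the question. Note that cur_data() has been deprecated since dplyr 1.1.0 in favor of pick().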
df %>%
mutate(across(!ends_with("_label"),
~ cur_data()[[paste0(cur_column(), "_label")]]))
giving:
Region Region_label Broadband Broadband_label Hispeed Hispeed_label
1 North North No No Yes Yes
2 South South Yes Yes Yes Yes
3 North North No No No No
4 West West No No No No
5 East East No No Yes Yes
6 North North Yes Yes No No
Note that this is also easy to do in base R alone:
replace(df, sub("_label$", "", subset1), df[subset1])
or, with a pipe:
df %>% replace(sub("_label$", "", subset1), .[subset1])

https://stackoverflow.com/questions/67080366
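The across() and replace() versions keep the _label columns, while the select()-based answers drop every column whose name does not end in "label". If the goal is to overwrite the originals, drop the _label columns, and keep any unrelated columns, a minimal base R sketch could look like this; the id column is hypothetical and only there to show that unrelated columns survive.

```r
df <- data.frame(
  Region          = c(1, 2, 1, 4, 3, 1),
  Region_label    = c("North", "South", "North", "West", "East", "North"),
  Broadband       = c(0, 1, 0, 0, 0, 1),
  Broadband_label = c("No", "Yes", "No", "No", "No", "Yes"),
  Hispeed         = c(1, 1, 0, 0, 1, 0),
  Hispeed_label   = c("Yes", "Yes", "No", "No", "Yes", "No"),
  id              = 1:6,  # hypothetical unrelated column
  stringsAsFactors = FALSE
)

label_cols <- grep("_label$", names(df), value = TRUE)
base_cols  <- sub("_label$", "", label_cols)

# Overwrite each original column with its label column (same idea as the replace() answer)
recoded <- replace(df, base_cols, df[label_cols])

# Then drop the label columns; setdiff() keeps the remaining columns in their original order
recoded <- recoded[setdiff(names(recoded), label_cols)]
recoded
```

Because the column lists are computed from the names, the same code applies unchanged to all 66 recoded columns.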