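I have a data frame with roughly 1,000 columns. I am interested in 14 satisfaction rating variables.
I need to remove every row in which any of those 14 rating variables contains "Item skipped" or NA.
Is there a way to remove all rows where NA or "Item skipped" appears in the satisfaction rating variables I care about, whose names are currently stored in the vector `cols`? In the example below, `cols` contains Service, Efficiency and Flavour, but not Experience or Quality.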
cols = c("Service","Efficiency","Flavour")
dat <- data.frame(
  Number = 1:6,
  University = c("A", "B", "C", "D", "E", "F"),
  Service = c("Satisfied", "Item skipped", NA, "Not satisfied", "Neither", "Item skipped"),
  Efficiency = c("Neither", "Neither", "Item skipped", "Satisfied", NA, NA),
  Flavour = c("Satisfied", NA, "Item skipped", "Neither", NA, NA),
  Quality = c("Not satisfied", "Neither", NA, "Satisfied", NA, NA),
  Experience = c("Satisfied", NA, NA, "Not satisfied", NA, NA),
  Age = rep(18:19, times = 3)
)

Posted on 2019-03-06 06:12:35
In base R, we can use rowSums to remove the rows where "Item skipped" or NA occurs in any of the columns listed in cols:
cols = c("Service", "Efficiency", "Flavour")
dat[rowSums(dat[cols] == "Item skipped" | is.na(dat[cols])) == 0, ]
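
To see why the one-liner works, it can help to unpack it into separate steps. The following is only a sketch: it rebuilds the example data so that it runs on its own, and the intermediate names is_skipped, is_missing, is_bad and clean are illustrative and not part of the original answer.

```r
# Example data from the question, rebuilt so this block is self-contained.
dat <- data.frame(
  Number = 1:6,
  University = c("A", "B", "C", "D", "E", "F"),
  Service = c("Satisfied", "Item skipped", NA, "Not satisfied", "Neither", "Item skipped"),
  Efficiency = c("Neither", "Neither", "Item skipped", "Satisfied", NA, NA),
  Flavour = c("Satisfied", NA, "Item skipped", "Neither", NA, NA),
  Quality = c("Not satisfied", "Neither", NA, "Satisfied", NA, NA),
  Experience = c("Satisfied", NA, NA, "Not satisfied", NA, NA),
  Age = rep(18:19, times = 3),
  stringsAsFactors = FALSE
)
cols <- c("Service", "Efficiency", "Flavour")

# Step 1: logical matrix, TRUE where a cell equals "Item skipped".
# Cells that are NA compare as NA, not FALSE.
is_skipped <- dat[cols] == "Item skipped"

# Step 2: logical matrix, TRUE where a cell is NA.
is_missing <- is.na(dat[cols])

# Step 3: combine them. NA | TRUE is TRUE, so each NA left over
# from step 1 is resolved by the matching TRUE from step 2.
is_bad <- is_skipped | is_missing

# Step 4: count the bad cells in each row and keep rows with none.
clean <- dat[rowSums(is_bad) == 0, ]
clean
```

Because the two matrices are combined with `|` before `rowSums` is called, no `na.rm` argument is needed: every cell of is_bad is either TRUE or FALSE.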
#  Number University       Service Efficiency   Flavour       Quality    Experience Age
#1      1          A     Satisfied    Neither Satisfied Not satisfied     Satisfied  18
#4      4          D Not satisfied  Satisfied   Neither     Satisfied Not satisfied  19

An alternative using apply, as suggested by @amrrs:
dat[!apply(dat[cols], 1, function(x) any(x == 'Item skipped' | is.na(x))), ]

Posted on 2019-03-06 05:59:06
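EDIT: With the updated data we can use the following (assuming NA always occurs together with "Item skipped", which seems to be the case; note that Experience is not one of the columns in cols, so this relies on that pattern holding):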
library(dplyr)
dat %>%
  filter(!is.na(Experience))
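
The filter above keys on Experience, which is not in cols, so it depends on how the NA values happen to be distributed in the example. A minimal sketch that tests the columns in cols directly is given here. It assumes dplyr 1.0.4 or later, where if_all() is available, and the name clean is illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt so this block is self-contained.
dat <- data.frame(
  Number = 1:6,
  University = c("A", "B", "C", "D", "E", "F"),
  Service = c("Satisfied", "Item skipped", NA, "Not satisfied", "Neither", "Item skipped"),
  Efficiency = c("Neither", "Neither", "Item skipped", "Satisfied", NA, NA),
  Flavour = c("Satisfied", NA, "Item skipped", "Neither", NA, NA),
  Quality = c("Not satisfied", "Neither", NA, "Satisfied", NA, NA),
  Experience = c("Satisfied", NA, NA, "Not satisfied", NA, NA),
  Age = rep(18:19, times = 3),
  stringsAsFactors = FALSE
)
cols <- c("Service", "Efficiency", "Flavour")

# Keep a row only if every column named in cols is non-missing
# and is not "Item skipped". all_of() selects columns from a
# character vector; for an NA cell, !is.na(.x) is FALSE, so the
# whole condition is FALSE rather than NA.
clean <- dat %>%
  filter(if_all(all_of(cols), ~ !is.na(.x) & .x != "Item skipped"))

clean
```

On the example data this keeps the same two rows as the Experience filter (Number 1 and 4), but it keeps working if Experience and the rating columns stop being missing together.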
  Number University       Service Efficiency   Flavour       Quality    Experience Age
1      1          A     Satisfied    Neither Satisfied Not satisfied     Satisfied  18
2      4          D Not satisfied  Satisfied   Neither     Satisfied Not satisfied  19

ORIGINAL:
We can use the following (see the NOTE below for the data):
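# filter() keeps only rows where the condition is TRUE, so rows
# where the comparison yields NA are dropped as well.
dat %>%
  filter_at(vars(contains("rating")), all_vars(. != "Item skipped"))

OR: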
dat %>%
  filter_all(all_vars(. != "Item skipped"))

Output:
  Number University Service_rating Efficiency_rating Flavour_rating Age
1      1          A      Satisfied           Neither      Satisfied  18
2      4          D  Not satisfied         Satisfied        Neither  19

NOTE:
dat <- data.frame(
  Number = 1:6,
  University = c("A", "B", "C", "D", "E", "F"),
  Service_rating = c("Satisfied", "Item skipped", NA, "Not satisfied", "Neither", "Item skipped"),
  Efficiency_rating = c("Neither", "Neither", "Item skipped", "Satisfied", NA, NA),
  Flavour_rating = c("Satisfied", NA, "Item skipped", "Neither", NA, NA),
  Age = rep(18:19, times = 3)
)

https://stackoverflow.com/questions/55016412