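I am trying to filter only the NAs after running the code below, but I can't work out the code to do it. The code below sits inside a function in which Raw_data is eventually split into separate data frames by company.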
Raw_data<-within(Raw_data,{
Neg_outliers=NA
Pos_outliers=NA
Neg_outliers[Company == "2002"]<-ifelse(Paid_age<mean_two_sd_negative,"negative_outlier","NA")
Neg_outliers[Company == "2203"]<-ifelse(Paid_age<mean_two_sd_negative,"negative_outlier","NA")
Neg_outliers[Company == "1804"]<-ifelse(Paid_age<mean_two_sd_negative,"negative_outlier","NA")
Neg_outliers[Company == "2401A"]<-ifelse(Paid_age<mean_two_sd_negative,"negative_outlier","NA")
Neg_outliers[Company == "2401B"]<-ifelse(Paid_age<mean_two_sd_negative,"negative_outlier","NA")
Pos_outliers[Company == "2002"]<-ifelse(Paid_age>mean_two_sd_positive,"positive_outlier","NA")
Pos_outliers[Company == "2203"]<-ifelse(Paid_age>mean_two_sd_positive,"positive_outlier","NA")
Pos_outliers[Company == "1804"]<-ifelse(Paid_age>mean_two_sd_positive,"positive_outlier","NA")
Pos_outliers[Company == "2401A"]<-ifelse(Paid_age>mean_two_sd_positive,"positive_outlier","NA")
Pos_outliers[Company == "2401B"]<-ifelse(Paid_age>mean_two_sd_positive,"positive_outlier","NA")
})

Posted on 2020-05-15 17:01:22
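You don't need the ifelse statements here, because the new columns already contain NA. As @jay.sf points out, you are in any case writing the string "NA" rather than NA into the new columns, which is probably not what you want.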
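To illustrate why the quoted string matters, here is a minimal sketch. The vector x is made up for the example and is not part of the original data. is.na() does not treat "NA" in quotes as missing, so a filter built on is.na() would silently skip those entries.

```r
# Hypothetical example vector: one label, one string "NA", one real NA
x <- c("negative_outlier", "NA", NA)

# Only the real missing value is detected
is.na(x)      # FALSE FALSE  TRUE

# The string matches, and the real NA propagates as NA
x == "NA"     # FALSE  TRUE    NA
```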
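Rather than using strings or NA to indicate whether a value is an outlier, why not use logical columns made up of TRUE and FALSE? That will make it easier to filter your data later.
Your whole code can be replaced by the following, which is easier to read, understand and maintain: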
companies <- c("2002", "2203", "1804", "2401A", "2401B")
matching <- Raw_data$Company %in% companies
Raw_data <- within(Raw_data, {
Pos_outliers <- matching & Paid_age > mean_two_sd_positive
Neg_outliers <- matching & Paid_age < mean_two_sd_negative
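})

This way, if you want to add a new company, you only need to add a single value to companies rather than a whole line of code. If you want to change the variable being tested, you only need to change one or two variable names rather than ten.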
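As a rough sketch of how the filtering could then look, the toy example below uses the column names from the question with invented values. It assumes that mean_two_sd_negative and mean_two_sd_positive are columns of Raw_data and that the negative threshold is the one intended for Neg_outliers, as in the question. The company "9999" is hypothetical and only there to show a row that is not in companies.

```r
# Hypothetical toy data: column names follow the question, values are invented
Raw_data <- data.frame(
  Company  = c("2002", "2203", "1804", "9999"),
  Paid_age = c(5, 50, 95, 120),
  mean_two_sd_negative = 10,   # assumed to be a column; recycled to all rows
  mean_two_sd_positive = 90,   # assumed to be a column; recycled to all rows
  stringsAsFactors = FALSE
)

companies <- c("2002", "2203", "1804", "2401A", "2401B")
matching  <- Raw_data$Company %in% companies

# Flag outliers with logical columns instead of strings or NA
Raw_data <- within(Raw_data, {
  Pos_outliers <- matching & Paid_age > mean_two_sd_positive
  Neg_outliers <- matching & Paid_age < mean_two_sd_negative
})

# Rows flagged as outliers in either direction
is_outlier <- Raw_data$Pos_outliers | Raw_data$Neg_outliers
outliers   <- Raw_data[is_outlier, ]

# Rows that are not outliers (presumably what the NA filter was meant to select)
non_outliers <- Raw_data[!is_outlier, ]
```

Because the flags are plain logicals with no missing values, they can be used directly as row indices and negated with !, with no need for is.na() or string comparisons.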
https://stackoverflow.com/questions/61814875