I have a data frame of students with homework and exam scores.
> student1
UID Homework_1 Homework_2 Homework_3 Homework_4 Homework_5 Homework_6 Homework_7 Homework_8
10 582493224 59 99 88 10 66 90 50 80
Homework_9 Homework_10 Exam_1 Exam_2 Exam_3 Section
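10 16 NA 41 61 11 A
The score for Homework_10 is missing, and I need to write a function that imputes NA values with the mean or the median.
The function messy_impute should take the following arguments:
data: a data frame or tibble.
center: whether to impute with the mean or the median.
margin: whether to impute using the row or the column (1 = use the row, 2 = use the column).
For example,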
messy_impute(student1,mean,1) should print out
> student1
UID Homework_1 Homework_2 Homework_3 Homework_4 Homework_5 Homework_6 Homework_7 Homework_8
10 582493224 59 99 88 10 66 90 50 80
Homework_9 Homework_10 Exam_1 Exam_2 Exam_3 Section
10 16 **62** 41 61 11 A
since the mean of the rest of the homework scores is 62.
If the mean of the Homework_10 column for the other students in Section A is 50, then
messy_impute(student1,mean,2) should print out
> student1
UID Homework_1 Homework_2 Homework_3 Homework_4 Homework_5 Homework_6 Homework_7 Homework_8
10 582493224 59 99 88 10 66 90 50 80
Homework_9 Homework_10 Exam_1 Exam_2 Exam_3 Section
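10 16 **50** 41 61 11 A
since the mean of that column within Section A is 50.
Note that when margin is 2, the calculation should be done within the same section.
I'm really stuck on defining this function.
Posted on 2020-08-02 14:59:53
Base R solution: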
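# Impute each NA with the mean of the other grades in the same row
# (assumes one row per student, an ID column named "studid", and numeric grade columns):
row_wise_mean_impute <- function(df){
  grade_df <- df[, names(df) != "studid", drop = FALSE]
  row_means <- rowMeans(grade_df, na.rm = TRUE)
  na_mask <- is.na(grade_df)
  # Look up the mean of the row each NA sits in, so multi-row data stays aligned
  grade_df[na_mask] <- row_means[row(na_mask)[na_mask]]
  cbind(df[, "studid", drop = FALSE], grade_df)
}
# Apply function:
row_wise_mean_impute(student1)
Data: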
x <- c(rnorm(85, 50, 3), rnorm(15, 50, 15))
student1 <- cbind(studid = 1010101, data.frame(t(x)))
student1[, 10] <- NA_real_
https://stackoverflow.com/questions/63214272
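The answer above covers only the row-wise mean. A minimal sketch of the full messy_impute(data, center, margin) signature described in the question might look like the following. It rests on several assumptions that are not confirmed by the original post: grade columns are named Homework_* and Exam_* and a Section column exists (as in the question's printout), row-wise imputation only pools grades of the same kind (homework with homework, exams with exams), and center is passed as a function such as mean or median. The students data frame is made up for illustration.

```r
# Sketch only: column naming (Homework_*, Exam_*, Section) is assumed from the
# question's printout; the pooling rules below are one possible interpretation.
messy_impute <- function(data, center = mean, margin = 1) {
  data <- as.data.frame(data)  # accepts tibbles, but returns a plain data frame
  for (prefix in c("Homework_", "Exam_")) {
    cols <- grep(paste0("^", prefix), names(data), value = TRUE)
    if (length(cols) == 0) next
    grades <- as.matrix(data[, cols, drop = FALSE])  # original values, never modified
    na_idx <- which(is.na(grades), arr.ind = TRUE)
    for (k in seq_len(nrow(na_idx))) {
      i <- na_idx[k, "row"]
      j <- na_idx[k, "col"]
      pool <- if (margin == 1) {
        grades[i, ]                                 # same student, same kind of grade
      } else {
        grades[data$Section == data$Section[i], j]  # same column, same section
      }
      data[i, cols[j]] <- center(pool, na.rm = TRUE)
    }
  }
  data
}

# Hypothetical data: student 1 is missing Homework_3.
students <- data.frame(
  UID        = c(1, 2, 3),
  Homework_1 = c(60, 40, 90),
  Homework_2 = c(80, 50, 70),
  Homework_3 = c(NA, 60, 10),
  Exam_1     = c(41, 70, 55),
  Section    = c("A", "A", "B")
)

messy_impute(students, mean, 1)$Homework_3[1]  # 70: mean of 60 and 80 in the same row
messy_impute(students, mean, 2)$Homework_3[1]  # 60: the only other Section A score
```

Because the pools are always read from the untouched grades matrix, a value imputed for one cell never feeds into the imputation of another. Passing median instead of mean works the same way, since both accept na.rm.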