R mice: imputing new observations

Stack Overflow user

Asked 2016-10-18 18:13:44

1 answer · 1.1K views · 0 followers · 2 votes

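I have the following problem when using the mice package to impute data:

I can't find a way to replace the NA values of a new observation, given that I have already imputed the missing data in my training set.

Example 1

I trained an algorithm on a data frame with 10 features and 1000 observations.

How can I use this algorithm to predict a new observation that has missing data?

Example 2

Suppose we have a data frame with NA values: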

V1   V2  V3  R1
1    2   NA  1
1.4  -1  0   0
1.2  NA  0   1
1.6  NA  1   1
1.2  3   1   0

I use the mice package to impute the missing values:

imp <- mice(df, m = 2, maxit = 100, meth = 'pmm', seed = 12345)

The object imp now holds two completed data frames with imputed values.
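As a minimal sketch of how the completed data frames are pulled out of the object returned by `mice()`, the snippet below uses `complete()`. It runs on the `nhanes` example data bundled with mice rather than on the five-row frame above (an assumption made so the example is self-contained), and `maxit = 5` is chosen only to keep it quick. The names `completed1` and `completed2` are illustrative; the question's `dfImp1` would correspond to `complete(imp, 1)`.

```r
library(mice)

# nhanes is a small example data set shipped with mice (25 rows, 4 columns, some NAs)
imp <- mice(nhanes, m = 2, maxit = 5, method = "pmm", seed = 12345, printFlag = FALSE)

# complete(imp, k) returns the k-th completed data frame
completed1 <- complete(imp, 1)
completed2 <- complete(imp, 2)

# Both completed frames keep the original shape and contain no NA
dim(completed1)
anyNA(completed1)
```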

(dfImp1)
V1   V2  V3  R1
1    2   0.5 1
1.4  -1  0   0
1.2  1.5 0   1
1.6  1.5 1   1
1.2  3   1   0

Now, with this data frame, I can train an algorithm:

modl <- glm(R1~., (dfImp1), family = binomial)

I want to predict a new observation, for example:

obs1 <- data.frame(V1 = 1, V2 = 1.4, V3 = NA)

How do I impute the missing data in this new individual observation?

machine-learning

missing-data

imputation

r-mice

1 Answer

Stack Overflow user

Answered 2020-12-28 12:17:01

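It seems the mice package has no built-in solution for this, but we can write one.

The idea is:

(1) fill in the NAs with the same mice algorithm both in the dataset used to train the GLM and in the new observation;
(2) predict only on the new observation once it has no NAs.

I'll use the iris dataset as the example.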

library(R6)
library(mice)
library(dplyr)  # provides %>%, filter() and bind_rows() used below

# Binary outcome so that a binomial GLM can be used
df <- iris %>% filter(Species != "virginica")

# The new observation 
new_data <- tail(df, 1)

# the dataset used to train the model
df <- head(df,-1)

# Now, let's insert some NAs
insert_nas <- function(x) {
  set.seed(123)
  len <- length(x)
  n <- sample(1:floor(0.2*len), 1)
  i <- sample(1:len, n)
  x[i] <- NA 
  x
}

df$Sepal.Length <- insert_nas(df$Sepal.Length)

df$Petal.Width <- insert_nas(df$Petal.Width)

new_data$Sepal.Width = NA

summary(df)
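One detail worth noticing in `insert_nas`: because `set.seed(123)` is called inside the function, every call on a vector of the same length blanks out the same positions, so `Sepal.Length` and `Petal.Width` end up missing in the same rows. The base-R check below illustrates this on two made-up vectors (`a` and `b` are hypothetical, not from the answer).

```r
# Copied from the answer above
insert_nas <- function(x) {
  set.seed(123)
  len <- length(x)
  n <- sample(1:floor(0.2 * len), 1)
  i <- sample(1:len, n)
  x[i] <- NA
  x
}

# Two deterministic vectors of the same length as the training set (99 rows)
a <- insert_nas(seq(1, 99))
b <- insert_nas(rev(seq(1, 99)))

# TRUE: the NA positions coincide because the seed is reset on every call
identical(which(is.na(a)), which(is.na(b)))

# Number of missing values per column, as a quick sanity check
colSums(is.na(data.frame(a, b)))
```

If independent missingness patterns are wanted, calling `set.seed()` once outside the function would be one option.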

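In the fit method, we use mice to fill in the NAs, fit the GLM, and store the model so that the predict method can use it.

In the predict method, we (1) append the new observation (with its NAs) to the dataset, (2) run mice again to replace the NAs, (3) extract the rows of the new observation, which no longer contain NAs, and (4) apply the GLM to predict this new observation.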

# R6 Class Generator
GLMWithMice <- R6Class("GLMWithMice", list(
  model = NULL,
  df = NULL,
  fitted = FALSE,
  
  initialize = function(df) {
    self$df <- df
  },
  fit = function(formula = "Species~.", family = binomial) {
    
    imp <- mice(self$df, m = 2, maxit = 100, method = 'pmm', seed = 12345, printFlag = FALSE)
    df_cleaned <- complete(imp,1)
    self$model <- glm(formula, df_cleaned, family = family, maxit = 100)
    self$fitted <- TRUE
    return(cat("\n model fitted!"))
  },
  predict = function(new_data, type = "response"){
    n_rows <- nrow(self$df)
    df_new <- bind_rows(self$df, new_data)
    imp <- mice(df_new, m = 2, maxit = 100, method = 'pmm', seed = 12345, printFlag = FALSE)
    df_cleaned <- complete(imp,1)
    new_data_cleaned <- tail(df_cleaned, nrow(df_new) - n_rows)
    return(predict(self$model,new_data_cleaned, type = type))
  }
  )
)

#Let's create a new instance of "GLMWithMice" class
model <- GLMWithMice$new(df = df)

class(model)

model$fit(formula = Species~., family = binomial)

model$predict(new_data = new_data)
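A possible variant, offered with some caution: recent mice releases document an `ignore` argument, a logical vector marking rows that are still imputed but do not influence the imputation model. Assuming the installed version supports it, the new observation can be appended and flagged so that only the training rows drive the imputation. The sketch uses the bundled `nhanes` data and a linear model instead of the iris GLM, purely to stay self-contained; `train`, `new_obs`, `combined`, and `ignore_rows` are illustrative names, not part of the answer.

```r
library(mice)

train <- nhanes[1:24, ]   # rows used to build the imputation model and the predictor
new_obs <- nhanes[25, ]   # stands in for a newly arriving observation
new_obs$bmi <- NA         # pretend one feature is missing
new_obs$chl <- NA         # the outcome is unknown at prediction time

combined <- rbind(train, new_obs)

# FALSE = row informs the imputation model, TRUE = row is only imputed
ignore_rows <- c(rep(FALSE, nrow(train)), rep(TRUE, nrow(new_obs)))

imp <- mice(combined, m = 2, maxit = 5, method = "pmm",
            ignore = ignore_rows, seed = 12345, printFlag = FALSE)
completed <- complete(imp, 1)

train_completed <- head(completed, nrow(train))
new_completed <- tail(completed, nrow(new_obs))

# Fit on the completed training rows, then predict for the completed new row
fit <- lm(chl ~ age + bmi + hyp, data = train_completed)
pred <- predict(fit, newdata = new_completed)
pred
```

Compared with re-running mice on the pooled data without `ignore`, this keeps the new observation from shifting the imputations of the training rows.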

0 votes

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/40115226
