Q: Imputation in R

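Stack Overflow user

Asked 2012-10-29 01:18:56

3 answers, 21K views, 0 followers, 5 votes

I am new to the R programming language. I would like to know whether there is a way to impute missing values in just one column of a dataset. All the imputation commands and libraries I have seen so far impute the missing values of the whole dataset.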

imputation

3 answers

Stack Overflow user

Accepted answer

Posted 2012-10-29 01:30:28

Here is an example using the Hmisc package and its impute function:

library(Hmisc)
DF <- data.frame(age = c(10, 20, NA, 40), sex = c('male','female'))

# impute with mean value

DF$imputed_age <- with(DF, impute(age, mean))

# impute with random value
DF$imputed_age2 <- with(DF, impute(age, 'random'))

# impute with the median
with(DF, impute(age, median))
# impute with the minimum
with(DF, impute(age, min))

# impute with the maximum
with(DF, impute(age, max))


# and if you are sufficiently foolish
# impute with number 7 
with(DF, impute(age, 7))

# impute with letter 'a'
with(DF, impute(age, 'a'))

For details on how the imputation is implemented, see ?impute.
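For comparison, the same single-column idea can be sketched in base R without any package. The helper name `impute_column` is hypothetical (it is not part of Hmisc), and the `sex` column is written out in full here purely for readability.

```r
# Hypothetical helper: fill the NAs of one column with a summary of its
# observed values, leaving every other column untouched.
impute_column <- function(df, col, fun = mean) {
  is_missing <- is.na(df[[col]])
  df[[col]][is_missing] <- fun(df[[col]][!is_missing])
  df
}

DF <- data.frame(age = c(10, 20, NA, 40),
                 sex = c("male", "female", "male", "female"))

DF_mean   <- impute_column(DF, "age")          # fill with the mean of 10, 20, 40
DF_median <- impute_column(DF, "age", median)  # fill with the median instead
```

Because the function only assigns into `df[[col]]`, columns other than the one named are returned unchanged.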

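14 votes

Stack Overflow user

Posted 2017-03-13 15:44:30

Why not use a more sophisticated imputation algorithm, such as mice (Multiple Imputation by Chained Equations)? Here is a code snippet in R that you can adapt to your case.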

library(mice)

#get the nhanes dataset
dat <- mice::nhanes

#impute it with mice (printFlag = FALSE suppresses the iteration log)
imp <- mice(dat, m = 3, printFlag = FALSE)

imputed_dataset_1<-complete(imp,1)

head(imputed_dataset_1)

#     age  bmi hyp chl
# 1   1   22.5   1 118
# 2   2   22.7   1 187
# 3   1   30.1   1 187
# 4   3   24.9   1 186
# 5   1   20.4   1 113
# 6   3   20.4   1 184

#Now, let's see what methods have been used to impute each column
meth<-imp$method
#  age   bmi   hyp   chl
#"" "pmm" "pmm" "pmm"

#The age column is complete, so, it won't be imputed
# Columns bmi, hyp and chl are going to be imputed with pmm (predictive mean matching)

#Let's say that we want to impute only the "hyp" column
#So, we set the methods for the bmi and chl column to ""
meth[c(2,4)]<-""
#age   bmi   hyp   chl 
#""    "" "pmm"    "" 

#Let's run the mice imputation again, this time setting the methods parameter to our modified method
imp <- mice(dat, m = 3, printFlag = FALSE, method = meth)

partly_imputed_dataset_1 <- complete(imp, 3)

head(partly_imputed_dataset_1)

#    age  bmi hyp chl
# 1   1   NA   1  NA
# 2   2 22.7   1 187
# 3   1   NA   1 187
# 4   3   NA   2  NA
# 5   1 20.4   1 113
# 6   3   NA   2 184
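To give a feel for what "pmm" does to a single column, here is a deliberately simplified base R sketch of predictive mean matching. It is not the mice implementation: as far as I understand, mice also accounts for uncertainty in the regression parameters, which this sketch skips. The dataset (`mtcars`), the artificially removed values, and the donor count `k` are all assumptions chosen for illustration.

```r
# Simplified predictive mean matching for one column (illustration only).
set.seed(1)
dat <- mtcars[, c("mpg", "wt", "hp")]
dat$mpg[c(3, 10, 25)] <- NA                  # create missing values in one column

obs <- !is.na(dat$mpg)
observed_values <- dat$mpg[obs]

fit  <- lm(mpg ~ wt + hp, data = dat[obs, ]) # model fitted on the observed rows
pred <- predict(fit, newdata = dat)          # predicted mpg for every row

k <- 5                                       # number of candidate donors (assumed)
for (i in which(!obs)) {
  # Observed rows whose predictions are closest to the prediction for row i
  donors <- order(abs(pred[obs] - pred[i]))[seq_len(k)]
  # Copy the real observed value of one randomly chosen donor
  dat$mpg[i] <- observed_values[donors[sample.int(length(donors), 1)]]
}
```

Every imputed value is an actually observed `mpg`, which is the main appeal of the method: it never produces values outside the observed data.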

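2 votes

Stack Overflow user

Posted 2016-11-11 00:35:05

There are many packages that can do this for you. (More information about your data would help in suggesting the best option.)

One example is the VIM package.

It has a function called kNN (k-nearest-neighbor imputation), which takes a variable argument where you specify which variables should be imputed.

Here is an example: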

library("VIM")
kNN(sleep, variable = c("NonD","Gest"))

The sleep dataset used in this example ships with VIM.
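The idea behind nearest-neighbor imputation of a single column can be sketched in base R as follows. This is not VIM's algorithm: the sketch assumes Euclidean distance on scaled, fully observed predictors and takes the median of the neighbors, whereas VIM uses its own distance measure and options. The function name `knn_impute_column` and the `mtcars` example data are hypothetical choices.

```r
# Hypothetical helper: impute one numeric column from its k nearest neighbors.
knn_impute_column <- function(df, target, predictors, k = 5) {
  X   <- scale(as.matrix(df[, predictors, drop = FALSE]))  # comparable scales
  obs <- which(!is.na(df[[target]]))                       # possible donors
  for (i in which(is.na(df[[target]]))) {
    # Euclidean distance from row i to every row with an observed target
    diffs   <- sweep(X[obs, , drop = FALSE], 2, X[i, ])
    d       <- sqrt(rowSums(diffs^2))
    nearest <- obs[order(d)[seq_len(min(k, length(obs)))]]
    df[[target]][i] <- median(df[[target]][nearest])
  }
  df
}

dat <- mtcars[, c("mpg", "wt", "hp")]
dat$mpg[c(3, 10, 25)] <- NA              # create missing values in one column
result <- knn_impute_column(dat, "mpg", c("wt", "hp"))
```

Only the target column is modified; the predictor columns must not contain missing values in this simplified version.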

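If there is some time dependence in your column, it also makes sense to use a time series imputation package. In that case you could use, for example, the imputeTS package. Here is an example:

library(imputeTS)
na_kalman(tsAirgap)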

The tsAirgap dataset used here as an example also ships with imputeTS.
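As a much simpler stand-in for Kalman smoothing, a time-ordered column can also be filled by linear interpolation with base R's `approx`. This is a different technique from `na_kalman`, shown only to illustrate filling one column from its neighbors in time; the data frame and column names below are made up.

```r
# Linear interpolation of the gaps in one time-ordered column (base R only).
df <- data.frame(t = 1:6, value = c(1, NA, 3, NA, NA, 6))

observed <- !is.na(df$value)
df$value_filled <- approx(x    = df$t[observed],      # time points with data
                          y    = df$value[observed],  # the observed values
                          xout = df$t,                # fill at every time point
                          rule = 2)$y                 # edge gaps get the nearest value
```

With `rule = 2`, missing values at the very start or end of the series are replaced by the closest observed value instead of staying `NA`.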

1 vote

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/13114812
