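I am trying to run a naive Bayes classifier on a simple dataset. The three variables I have are weight (continuous), BP (continuous) and disease (binary).
When I run the naive Bayes command, some of the results give probabilities (far) greater than 1. I have tried both the "e1071" and "klaR" packages.
Here is my code: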
install.packages("e1071")
library(e1071)
mydata$disease <- as.factor(mydata$disease)
classifier <- naiveBayes(disease ~ weight + BP, mydata, laplace = 0, na.action = na.pass)
Please see my results below:
A-priori probabilities:
Y
0 1
0.47 0.53
Conditional probabilities:
weight
Y [,1] [,2]
0 69.10638 27.22869
1 131.22642 39.47377
BP
Y [,1] [,2]
0 44.78723 21.73350
1 35.81132 13.55623
As shown above, one of the probabilities is 44.78723. Is that right? I also tried klaR and it gave very similar results. Any help?
Posted on 2014-12-18 03:30:26
Upgrading comment:
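The output gives the parameters (mean and standard deviation) of the normal distribution of each numeric predictor, for each level of the class variable. In each table, column [,1] is the mean and column [,2] is the standard deviation; these are not probabilities. So in your output, 44.78723 is the mean BP for disease = 0 and 21.73350 is its standard deviation. From the ?naiveBayes help:
For each numeric variable, a table giving, for each target class, mean and standard deviation of the (sub-)variable.
Run a small example with the iris dataset to see this: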
library(e1071)
# load iris dataset and set some values to missing
data(iris)
iris$Sepal.Length[1] <- NA
iris$Petal.Width[2] <- NA
iris$Species[3] <- NA
# run naive Bayes model
(m <- naiveBayes(Species ~ Sepal.Length + Petal.Width, data = iris, na.action = na.omit))
This produces the output:
# Naive Bayes Classifier for Discrete Predictors
#
# Call:
# naiveBayes.default(x = X, y = Y, laplace = laplace)
#
# A-priori probabilities:
# Y
# setosa versicolor virginica
# 0.3197279 0.3401361 0.3401361
#
# Conditional probabilities:
# Sepal.Length
# Y [,1] [,2]
# setosa 5.012766 0.3603241
# versicolor 5.936000 0.5161711
# virginica 6.588000 0.6358796
#
# Petal.Width
# Y [,1] [,2]
# setosa 0.2489362 0.1080908
# versicolor 1.3260000 0.1977527
# virginica 2.0260000 0.2746501
Check that the tables give the mean and st. dev. of each continuous variable for each level of Species:
aggregate(cbind(Sepal.Length, Petal.Width) ~ Species, data=iris,
function(i) c(mean(i), sd(i)))
# Species Sepal.Length.1 Sepal.Length.2 Petal.Width.1 Petal.Width.2
# 1 setosa 5.0127660 0.3603241 0.2489362 0.1080908
# 2 versicolor 5.9360000 0.5161711 1.3260000 0.1977527
# 3 virginica 6.5880000 0.6358796 2.0260000 0.2746501
https://stackoverflow.com/questions/27516917
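To see how these mean/sd tables turn into actual probabilities, the posterior class probabilities can be recomputed by hand: multiply each class prior by the normal density of every predictor (using that class's mean and sd from the table), then normalise across classes. This is a minimal sketch, assuming e1071 keeps the class counts in `m$apriori` and the mean/sd matrices in `m$tables`; `new_obs` is a hypothetical observation chosen for illustration. The individual densities are not probabilities and can exceed 1 (the setosa `Petal.Width` density does here), while the normalised posteriors sum to 1 and should match `predict(..., type = "raw")`.

```r
library(e1071)

# Refit the model from the example above (same missing values, same formula)
data(iris)
iris$Sepal.Length[1] <- NA
iris$Petal.Width[2] <- NA
iris$Species[3] <- NA
m <- naiveBayes(Species ~ Sepal.Length + Petal.Width, data = iris,
                na.action = na.omit)

# Hypothetical new flower to classify (illustrative values only)
new_obs <- data.frame(Sepal.Length = 5.0, Petal.Width = 0.2)

# Class priors: m$apriori holds class counts, so normalise them
# (normalising is harmless even if they were already proportions)
prior <- m$apriori / sum(m$apriori)

# Each numeric table has one row per class: column 1 = mean, column 2 = sd.
# Evaluate the normal density of the new value under every class.
lik <- sapply(names(m$tables), function(v) {
  tab <- m$tables[[v]]
  dnorm(new_obs[[v]], mean = tab[, 1], sd = tab[, 2])
})
# lik is a matrix: rows = classes, columns = predictors.
# These are densities, so entries may be larger than 1.

# Naive Bayes: prior times the product of the per-predictor densities
unnorm <- as.numeric(prior) * apply(lik, 1, prod)
posterior <- unnorm / sum(unnorm)
names(posterior) <- names(prior)

# Compare with e1071's own posterior probabilities
by_predict <- predict(m, new_obs, type = "raw")
print(rbind(manual = posterior, predict = by_predict[1, ]))
```

The same arithmetic applies to the model in the question, with `weight` and `BP` in place of the iris predictors.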