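I am trying to use R to run a goodness-of-fit test against a Poisson distribution for a series of observations. I counted how many people did a certain thing each minute, over 57 minutes. I never observed a value greater than 13, and I obtained the following data (for the cases of 0 to 13 people, plus a final 14-or-more category):
observed = c(3/57, 4/57, 9/57, 7/57, 9/57, 8/57, 2/57, 3/57, 7/57, 2/57, 1/57, 0, 1/57, 1/57, 0)
This means I observed 0 people 3 times, 1 person 4 times, 2 people 9 times, and so on (the final 0 means I never saw 14 or more people).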
mn = 4.578947
cases = c(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13)
estimated = c()
for (i in cases)(estimated <- c(estimated, dpois(i, lambda = mn)))
estimated <- c(estimated, (1-ppois(13, lambda=mn)))
where mn is the mean obtained from the data. Finally, I run
chisq.test(observed, p=estimated)
and I get:
Chi-squared test for given probabilities
data: observed
X-squared = 1.0182, df = 14, p-value = 1
Warning message:
In chisq.test(observed, p = estimated) :
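Chi-squared approximation may be incorrect
I am not very proficient in this area (neither in statistics nor in R programming), but I do not think I should be getting a p-value of exactly 1.0. What am I doing wrong? (By the way: my code is probably not optimal for what I am trying to do, but I rarely use R, and it is not the focus of my work right now.)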
Posted on 2018-06-18 12:41:52
Your observed values should be counts, not proportions:
> chisq.test(observed*57, p=estimated)
Chi-squared test for given probabilities
data: observed * 57
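X-squared = 58.036, df = 14, p-value = 2.585e-07
According to the R help file for chisq.test:
If x is a matrix with one row or column, or if x is a vector and y is not given, then a goodness-of-fit test is performed (x is treated as a one-dimensional contingency table). The entries of x must be non-negative integers.
(emphasis mine)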
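To see why proportions push the p-value toward 1, here is a minimal sketch that computes the Pearson statistic by hand. The helper `pearson_stat` is a hypothetical name introduced only for illustration; it assumes that expected values are formed as `sum(x) * p`, which is consistent with the outputs shown in this thread. Under that assumption, dividing the data by n = 57 divides the statistic by 57 as well, which matches 58.036 / 57 ≈ 1.0182.

```r
# Counts from the question: minutes with 0, 1, ..., 13, and 14+ people
counts <- c(3, 4, 9, 7, 9, 8, 2, 3, 7, 2, 1, 0, 1, 1, 0)
n <- sum(counts)   # 57 one-minute intervals
mn <- 4.578947     # sample mean reported in the question

# Poisson probabilities for 0..13 plus the upper tail P(X >= 14)
p <- c(dpois(0:13, lambda = mn), ppois(13, lambda = mn, lower.tail = FALSE))

# Hypothetical helper: Pearson statistic with expected values E = sum(x) * p
pearson_stat <- function(x, p) {
  expected <- sum(x) * p
  sum((x - expected)^2 / expected)
}

stat_counts <- pearson_stat(counts, p)      # counts: E = 57 * p
stat_props  <- pearson_stat(counts / n, p)  # proportions: E = 1 * p

# Dividing the data by n divides the statistic by n as well
all.equal(stat_props, stat_counts / n)
```

The shape of the data is the same in both cases; only the implied sample size changes, and a sample size of 1 carries almost no evidence against any hypothesis.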
You can check this with some of the example code from the manual.
How it should be done:
> x <- c(89,37,30,28,2)
> p <- c(0.40,0.20,0.20,0.19,0.01)
> chisq.test(x, p = p)
Chi-squared test for given probabilities
data: x
X-squared = 5.7947, df = 4, p-value = 0.215
Warning message:
In chisq.test(x, p = p) : Chi-squared approximation may be incorrect
Making the same mistake you did:
> chisq.test(x/sum(x), p = p)
Chi-squared test for given probabilities
data: x/186
X-squared = 0.031154, df = 4, p-value = 0.9999
Warning message:
In chisq.test(x/186, p = p) : Chi-squared approximation may be incorrect
Posted on 2018-06-18 13:35:20
First, to run a goodness-of-fit test you need the observed frequencies and the bin probabilities.
observed = c(3, 4, 9, 7, 9, 8, 2, 3, 7, 2, 1, 0, 1, 1, 0) # keep counts
The probabilities are correct:
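mn = 4.578947
cases = 0:13 # same categories as in the question
prob = c()
for (i in cases) (prob <- c(prob, dpois(i, lambda = mn)))
prob <- c(prob, (1-ppois(13, lambda=mn))) # probability of the 14-or-more category
Most importantly, the expected frequency in each bin/category should be at least 5. The chi-squared approximation is not reliable when expected counts are small, which is why you get the warning (see the expected frequencies for categories 1, 2 and 8-15):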
poisson_df <- data.frame(observed, prob)
poisson_df$expected = sum(poisson_df$observed)*poisson_df$prob
poisson_df
# observed prob expected
#1 3 0.0102657004 0.58514492
#2 4 0.0470060980 2.67934759
#3 9 0.1076192157 6.13429530
#4 7 0.1642608950 9.36287101
#5 9 0.1880354831 10.71802253
#6 8 0.1722009022 9.81545143
#7 2 0.1314164674 7.49073864
#8 3 0.0859641485 4.89995646
#9 7 0.0492031600 2.80458012
#10 2 0.0250331846 1.42689152
#11 1 0.0114625626 0.65336607
#12 0 0.0047714970 0.27197533
#13 1 0.0018207026 0.10378005
#14 1 0.0006413001 0.03655410
#15 0 0.0002986829 0.01702492
chisq.test(x = poisson_df$observed, p= poisson_df$prob)
# Chi-squared test for given probabilities
# data: observed
# X-squared = 58.036, df = 14, p-value = 2.585e-07
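# Warning message:
# In chisq.test(x = poisson_df$observed, p= poisson_df$prob) :
# Chi-squared approximation may be incorrect
So you need to create the bins appropriately. Note that the chi-squared test is very sensitive to the choice of bins. One way to bin is as follows: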
cat_eq_3_less <- apply(poisson_df[1:3,], 2 , sum) # sum of 1 to 3 categories
cat_eq_8_plus <- apply(poisson_df[8:15,], 2 , sum) # sum 8 to 15 categories
corrected_df <- rbind(cat_eq_3_less, poisson_df[4:7,], cat_eq_8_plus)
corrected_df
# observed prob expected
# 16 0.1648910 9.398788
# 7 0.1642609 9.362871
# 9 0.1880355 10.718023
# 8 0.1722009 9.815451
# 2 0.1314165 7.490739
# 15 0.1791952 10.214129
chisq.test(x = corrected_df$observed, p = corrected_df$prob)
# Chi-squared test for given probabilities
# data: corrected_df$observed
# X-squared = 12.111, df = 5, p-value = 0.0333
https://stackoverflow.com/questions/50902341
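The binning step from the second answer can also be written as a self-contained sketch that does not depend on variables defined earlier in the thread. The grouping vector `bin` and the `pooled_*` names are hypothetical, chosen here for illustration; the grouping itself (categories 1-3 pooled, 4-7 kept, 8-15 pooled) is the one used in that answer, and is only one of several reasonable choices.

```r
# Observed counts for 0, 1, ..., 13 people, plus a final 14-or-more category
observed <- c(3, 4, 9, 7, 9, 8, 2, 3, 7, 2, 1, 0, 1, 1, 0)
mn <- 4.578947  # sample mean reported in the question

# Poisson probabilities for 0..13 and the upper tail P(X >= 14)
prob <- c(dpois(0:13, lambda = mn), ppois(13, lambda = mn, lower.tail = FALSE))

# Hypothetical grouping: categories 1-3 pooled, 4-7 kept, 8-15 pooled
bin <- c(rep(1, 3), 2:5, rep(6, 8))

# Sum counts and probabilities within each pooled bin
pooled_obs  <- as.vector(tapply(observed, bin, sum))
pooled_prob <- as.vector(tapply(prob, bin, sum))
pooled_exp  <- sum(observed) * pooled_prob

# Every pooled bin should now have an expected count of at least 5
stopifnot(all(pooled_exp >= 5))

result <- chisq.test(x = pooled_obs, p = pooled_prob)
result
```

With this grouping the test should reproduce the statistic reported in the answer (X-squared = 12.111 on 5 degrees of freedom) without the approximation warning. A different grouping can give a noticeably different p-value.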