I am trying to compute a chi-square value in Python from a contingency table. Here is an example.
+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |   80 |  120 |
| Group2 |  420 |  380 |
+--------+------+------+

The expected values are:
+--------+------+------+
|        | Cat1 | Cat2 |
+--------+------+------+
| Group1 |  100 |  100 |
| Group2 |  400 |  400 |
+--------+------+------+

If I compute the chi-square value by hand, I get 10. With Python, I get 9.506. I am using the following code:
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency
import scipy
# Some fake data.
n = 5 # Number of samples.
d = 3 # Dimensionality.
c = 2 # Number of categories.
data = np.random.randint(c, size=(n, d))
data = pd.DataFrame(data, columns=['CAT1', 'CAT2', 'CAT3'])
# Contingency table.
contingency = pd.crosstab(data['CAT1'], data['CAT2'])
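# Overwrite the cells with the example counts. A single .iloc[row, col]
# call is used because chained indexing (.iloc[0][0] = ...) may assign to a copy.
contingency.iloc[0, 0] = 80
contingency.iloc[0, 1] = 120
contingency.iloc[1, 0] = 420
contingency.iloc[1, 1] = 380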
# Chi-square test of independence.
chi, p, dof, expected = chi2_contingency(contingency)

Oddly, the function gives me the correct expected values, but the chi-square statistic and the p-value are off. What am I doing wrong?
Thanks.
P.S.
I know the way I create the initial table in pandas is pretty poor, but I am no expert at building these nested tables in pandas.
Posted on 2017-08-03 22:24:50
From the documentation:
correction : bool, optional
If True, and the degrees of freedom is 1, apply Yates’ correction for continuity.
The effect of the correction is to adjust each observed value by 0.5 towards
the corresponding expected value.

Your table has 1 degree of freedom, so the correction is applied by default. If you set correction to False, you get 10.
>>> chi2_contingency(contingency, correction=False)
(10.0, 0.001565402258002549, 1, array([[ 100., 100.],
       [ 400., 400.]]))

https://stackoverflow.com/questions/45486926
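To see where the two numbers come from, here is a minimal sketch that recomputes both statistics by hand and compares them with chi2_contingency. It also builds the 2x2 table directly from a DataFrame rather than overwriting a crosstab. The names observed, chi2_plain, and chi2_yates are illustrative and do not appear in the original post. The Yates formula used here, subtracting 0.5 from each absolute difference, matches the documentation quote for this table because every difference is larger than 0.5.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Build the 2x2 table directly (illustrative alternative to editing a crosstab).
observed = pd.DataFrame(
    [[80, 120], [420, 380]],
    index=["Group1", "Group2"],
    columns=["Cat1", "Cat2"],
)

obs = observed.to_numpy(dtype=float)

# Expected counts under independence: row total * column total / grand total.
expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()

# Plain Pearson statistic, i.e. the hand calculation from the question.
chi2_plain = ((obs - expected) ** 2 / expected).sum()

# Yates' continuity correction: move each observed count 0.5 towards its
# expected count before squaring (valid here since every |O - E| > 0.5).
chi2_yates = ((np.abs(obs - expected) - 0.5) ** 2 / expected).sum()

# The first element of the result is the test statistic.
chi2_default = chi2_contingency(observed)[0]
chi2_uncorrected = chi2_contingency(observed, correction=False)[0]

print("plain Pearson:", chi2_plain, "| scipy, correction=False:", chi2_uncorrected)
print("Yates by hand:", chi2_yates, "| scipy default:", chi2_default)
```

The plain statistic comes out at 10 and the corrected one at about 9.506, so the gap in the question is entirely due to the continuity correction.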