Question: Chi-square test for multiple features in Pandas

Stack Overflow user

Asked 2019-12-04 01:05:26

1 answer, 1.8K views, 0 followers, 1 vote

I have a sample DataFrame that looks like this:

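import pandas as pd

m_list = ['male','male','female','female']
whiskey_list = ['alcohol','no_alcohol','alcohol','no_alcohol']
f1 = [273,62,60,7]
f2 = [276,61,57,8]
l = [m_list,whiskey_list,f1,f2]
# Each inner list is one row, so transpose to turn the lists into columns.
# Note: built this way, every column (including f1 and f2) has object dtype.
test_df = pd.DataFrame(l).T
test_df.columns = ['gender','drink_category','f1','f2']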


    gender  drink_category  f1  f2
0   male    alcohol         273 276
1   male    no_alcohol      62  61
2   female  alcohol         60  57
3   female  no_alcohol      7   8

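I want to use a chi-square test to see whether there is any relationship between the two categorical variables gender and drink_category. To do this, I want to build a contingency table for each feature f1, f2, ..., fn and then compute a p-value for each feature.

The example here has only two features, f1 and f2, but in general I have many.

When I process f1, my contingency table looks like this: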

gender   alcohol   no_alcohol
male      273        62
female    60         7

I would then compute the p-value for f1.

When I process f2, my contingency table looks like this:

gender   alcohol   no_alcohol
male      276        61
female    57         8

How can I do this calculation with pandas and scipy?

In the end, I want a DataFrame that holds the p-value for each feature f1 through fn.

python

pandas

1 Answer

Stack Overflow user

Accepted answer

Answered 2019-12-04 04:17:10

We can use chi2_contingency from scipy.stats to get the p-value of a contingency table built with pandas' pivot function.

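import pandas as pd
from scipy.stats import chi2_contingency

test_df = pd.DataFrame({'gender': ['male','male','female','female'],
                        'drink_category': ['alcohol','no_alcohol','alcohol','no_alcohol'],
                        'f1': [273,62,60,7],
                        'f2': [276,61,57,8]})

# Collect one p-value per feature column (f1, f2, ...)
p_values = {}
for feature in [c for c in test_df.columns if c.startswith('f')]:
    # Build the gender x drink_category contingency table for this feature;
    # pandas 2.0+ requires keyword arguments for pivot
    table = test_df.pivot(index='gender', columns='drink_category', values=feature)
    # chi2_contingency returns (statistic, p-value, dof, expected frequencies)
    _, p_values[feature], _, _ = chi2_contingency(table)

p = pd.Series(p_values)
print(p)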

Output:

f1    0.155699
f2    0.339842
dtype: float64
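
Since the question asks for a DataFrame rather than a Series, the same loop can be wrapped in a small helper that returns one row per feature. This is only a sketch under a few assumptions: the helper name chi2_per_feature and the output column names are made up for illustration, and the index/columns arguments are hard-coded to the gender and drink_category columns from the example. Note that for a 2x2 table (one degree of freedom) chi2_contingency applies Yates' continuity correction by default, which is what produces the p-values shown above; passing correction=False gives the uncorrected test.

```python
import pandas as pd
from scipy.stats import chi2_contingency

test_df = pd.DataFrame({
    'gender': ['male', 'male', 'female', 'female'],
    'drink_category': ['alcohol', 'no_alcohol', 'alcohol', 'no_alcohol'],
    'f1': [273, 62, 60, 7],
    'f2': [276, 61, 57, 8],
})


def chi2_per_feature(df, features, correction=True):
    """Hypothetical helper: run one chi-square test per feature column."""
    rows = []
    for feature in features:
        # Contingency table: gender as rows, drink_category as columns
        table = df.pivot(index='gender', columns='drink_category', values=feature)
        chi2, p_value, dof, _ = chi2_contingency(table, correction=correction)
        rows.append({'feature': feature, 'chi2': chi2,
                     'p_value': p_value, 'dof': dof})
    return pd.DataFrame(rows).set_index('feature')


features = [c for c in test_df.columns if c.startswith('f')]
result = chi2_per_feature(test_df, features)
print(result)
```

When many features are tested this way, the p-values are not adjusted for multiple comparisons, so some form of correction may be worth considering before interpreting them.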

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/59162071
