I am trying to examine the relationship between the gender and country_destination columns in the DataFrame below.
            id  gender  country_destination
2   4ft3gnwmtx  FEMALE                   US
6   lsw9q7uk0j  FEMALE                   US
7   0d01nltbrs  FEMALE                   US
8   a1vcnhxeij  FEMALE                   US
10  yuuqmid2rp  FEMALE                   US

To apply stats.chi2_contingency and get the p-value, I converted this table into a pivot_table, as shown below:
         AU   CA   DE   ES    FR   GB    IT   NL  PT     US
gender
FEMALE  207  455  358  853  1962   88  1091  254  78  22694
MALE    188  477  416  677  1335  682   699  278  69  19457

However, when I run stats.chi2_contingency(df_contingency), I get the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-161-fee39cdf166f> in <module>()
----> 1 stats.chi2_contingency(df_contingency)
/usr/local/lib/python3.6/dist-packages/scipy/stats/contingency.py in chi2_contingency(observed, correction, lambda_)
243 observed = np.asarray(observed)
--> 244 if np.any(observed < 0):
245 raise ValueError("All values in `observed` must be nonnegative.")
246 if observed.size == 0:
TypeError: '<' not supported between instances of 'str' and 'int'

Can you please help me figure out where I went wrong?
Posted on 2020-06-26 22:53:37
I am guessing you did something like this:
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
np.random.seed(111)
dat = pd.DataFrame({'gender':np.random.choice(['MALE','FEMALE'],100),
                    'country_destination':np.random.choice(['AU','CA','DE','ES','FR'],100)})
pt = dat.pivot_table(columns='country_destination',index='gender',
                     values='country_destination',aggfunc=len)
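country_destination  AU  CA  DE  ES  FR
gender
FEMALE                6  12  12   9  15
MALE                  8   9  12   4  13

This works by accident:

chi2_contingency(pt)

And if you do a crosstab instead of a pivot, it works too:

chi2_contingency(pd.crosstab(dat['gender'],dat['country_destination']))

Most likely, you need to provide the counts that go into the contingency table.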
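To make the last point concrete, here is a minimal sketch of building a table of integer counts and checking its dtypes before running the test. The traceback in the question shows a str being compared with an int, which suggests the original pivot table held strings rather than counts. The DataFrame `df` below is made-up stand-in data, and the names `df`, `counts` and `rng` are only illustrative, not taken from the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical stand-in for the asker's data: one row per user.
rng = np.random.default_rng(111)
df = pd.DataFrame({
    "gender": rng.choice(["MALE", "FEMALE"], 200),
    "country_destination": rng.choice(["AU", "CA", "DE", "ES", "FR"], 200),
})

# crosstab counts the rows in each (gender, country) cell,
# so every cell of the result is an integer.
counts = pd.crosstab(df["gender"], df["country_destination"])

# Guard: if any column is not numeric (e.g. object dtype holding strings),
# convert it, raising an error on values that cannot be parsed as numbers.
if not all(pd.api.types.is_numeric_dtype(t) for t in counts.dtypes):
    counts = counts.apply(pd.to_numeric, errors="raise")

# chi2_contingency returns the statistic, the p-value, the degrees of
# freedom and the table of expected frequencies.
chi2, p_value, dof, expected = chi2_contingency(counts)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, dof={dof}")
```

On your own data, printing `df_contingency.dtypes` first should show whether the table contains `object` columns instead of integers.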
https://stackoverflow.com/questions/62601422