How to handle columns in pandas data

Stack Overflow user
Asked 2022-10-28 09:21:29
Answers: 2 | Views: 37 | Followers: 0 | Votes: -1

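I am writing a Python program to compute the chi-square value for a set of observed and expected frequencies. The program I have built so far is shown below.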

# Author: Evan Gertis
# Date  : 10/25
# program : quantile decile calculator
import csv
import pandas as pd
import numpy as np 
from scipy.stats import chi2_contingency

import seaborn as sns
import matplotlib.pyplot as plt
import logging 
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

# Step 1: read csv
dicerollsCSV       = open('dice_rolls.csv')
df      = pd.read_csv(dicerollsCSV) 
logging.debug(df['Observed'])
logging.debug(df['Expected'])


# Step 2: Convert the data into a contingency table
logging.debug('Step 2: Convert the data into a contingency tables')
# Compute a simple cross tabulation of two (or more) factors. By default computes a frequency table of the factors unless an array of values and an aggregation function are passed.
# Implement steps from: https://predictivehacks.com/how-to-run-chi-square-test-in-python/
contingency = pd.crosstab(df['Observed'], df['Expected'])
logging.debug(f'contingency:{contingency}')

# Step 3: calculate the percentages by Observed (row)
logging.debug('Step 3: calculate the percentages by Observed (row)')
# add normalize='index'
contingency_pct = pd.crosstab(df['Observed'],df['Expected'],normalize='index')
logging.debug(f'contingency_pct:{contingency_pct}')


# Step 4: calculate the chi-square test
logging.debug('Step 4: calculate the chi-square test')
c, p, dof, expected = chi2_contingency(contingency)
# c: The test statistic
# p: The p-value of the test
# dof: Degrees of freedom
# expected: The expected frequencies, based on the marginal sums of the table
logging.debug(f'c: The statistic test  {c}')
logging.debug(f'p: The p-value of the test {p}')
logging.debug(f'dof: Degrees of freedom {dof}')
logging.debug(f'expected: The expected frequencies, based on the marginal sums of the table {expected}')

I am using https://predictivehacks.com/how-to-run-chi-square-test-in-python/ as a guide for this task. The specific dataset I am using is:

Observed, Expected
15, 13.9
35, 27.8
49, 41.7
58, 55.6
65, 69.5
76, 83.4
72, 69.5
60, 55.6
35, 41.7
29, 27.8
6, 13.9

Expected: the chi-square value for the observed and expected frequencies. The p-value should be 0.411.

Actual:

2022-10-31 06:57:07,338 - DEBUG - c: The statistic test  49.499999999999986
2022-10-31 06:57:07,338 - DEBUG - p: The p-value of the test 0.2983423936107591
2022-10-31 06:57:07,338 - DEBUG - dof: Degrees of freedom 45
2022-10-31 06:57:07,339 - DEBUG - expected: The expected frequencies, based on the marginal sums of the table [[0.18181818 0.18181818 0.18181818 0.18181818 0.18181818 0.09090909]

What can I try next?
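
One thing worth checking, offered as a sketch rather than a definitive diagnosis: pd.crosstab treats every distinct number in the two columns as a category label, so the table it builds has one row per distinct Observed value and one column per distinct Expected value. The data has 10 distinct Observed values and 6 distinct Expected values, which gives (10 - 1) * (6 - 1) = 45 degrees of freedom, the same figure that appears in the log above. A goodness-of-fit statistic computed directly from the two columns behaves differently. The example below assumes 11 - 1 = 10 degrees of freedom (no parameters estimated from the data) and computes the statistic by hand, because the expected counts sum to 500.4 while the observed counts sum to 500, and scipy.stats.chisquare may reject inputs whose totals disagree.

```python
import numpy as np
from scipy.stats import chi2

# Values copied from the dataset in the question.
observed = np.array([15, 35, 49, 58, 65, 76, 72, 60, 35, 29, 6], dtype=float)
expected = np.array([13.9, 27.8, 41.7, 55.6, 69.5, 83.4,
                     69.5, 55.6, 41.7, 27.8, 13.9])

# Pearson goodness-of-fit statistic: sum over categories of (O - E)^2 / E.
statistic = ((observed - expected) ** 2 / expected).sum()

# Assumption: degrees of freedom = number of categories - 1.
dof = len(observed) - 1

# Survival function of the chi-square distribution gives the p-value.
p_value = chi2.sf(statistic, dof)

print(f'statistic = {statistic:.3f}')
print(f'dof       = {dof}')
print(f'p-value   = {p_value:.3f}')
```

Under these assumptions the statistic comes out at roughly 10.34 and the p-value at about 0.411, which agrees with the value the question expects.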

2 Answers

Stack Overflow user

Accepted answer

Posted 2022-10-28 09:26:15

I believe your DataFrame does not contain a column named 'Expected'.

You can test this with the code below.

import pandas as pd
df = pd.DataFrame(columns = ['a','b'], data=[[1,2],[2,2]])
df['Expected']

You will see that this raises the same error (a KeyError) as the one you are getting.
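
To run the same check against the question's own data, here is a minimal sketch. It assumes the CSV header is exactly as posted in the question, and it uses io.StringIO as a stand-in for dice_rolls.csv purely to keep the example self-contained.

```python
import io
import pandas as pd

# First rows of the CSV from the question, header copied verbatim.
# io.StringIO stands in for the real file (an assumption for this sketch).
csv_text = "Observed, Expected\n15, 13.9\n35, 27.8\n"
df = pd.read_csv(io.StringIO(csv_text))

# Inspect the parsed column names before indexing into the frame.
columns = list(df.columns)
print(columns)

# Membership tests show which spelling of the name actually exists.
print('Expected' in df.columns)    # False
print(' Expected' in df.columns)   # True

# Indexing with the missing name raises KeyError, as in the snippet above.
try:
    df['Expected']
    raised = False
except KeyError as err:
    raised = True
    print(f'KeyError: {err}')
```

Printing the column list before indexing is usually the quickest way to confirm whether a name is really missing or merely spelled differently from what the code expects.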

Votes: 0

Stack Overflow user

Posted 2022-10-28 09:43:26

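The Expected column name has a leading space, so either use df[' Expected'] or correct your CSV. You can also read the CSV into a pandas DataFrame by passing the path directly, e.g. pd.read_csv('./test.csv'). To see the column names, run df.columns.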
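
If editing the CSV is not convenient, the leading space can also be handled at load time. The following is a minimal sketch, assuming the header is exactly as posted in the question; io.StringIO stands in for the file on disk so the snippet is self-contained. skipinitialspace is a standard pd.read_csv parameter, and stripping df.columns is an alternative for a frame that is already loaded.

```python
import io
import pandas as pd

# First rows of the CSV from the question; note the space after each comma.
csv_text = "Observed, Expected\n15, 13.9\n35, 27.8\n49, 41.7\n"

# Option 1: ignore whitespace that follows each delimiter while parsing.
df1 = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Option 2: load as-is, then strip whitespace from the column names.
df2 = pd.read_csv(io.StringIO(csv_text))
df2.columns = df2.columns.str.strip()

print(df1.columns.tolist())
print(df2.columns.tolist())
print(df1['Expected'].tolist())
```

With either option, df['Expected'] resolves without changing the rest of the program.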

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/74233204
