
How to calculate the statistic "t-test" with numpy

Stack Overflow user
Asked 2010-02-24 15:57:10
Answers: 3 · Views: 60.8K · Followers: 0 · Votes: 27

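I'd like to generate some statistics on a model I created in Python. I'd like to run a t-test on it, but I was wondering whether there is an easy way to do this with numpy/scipy. Are there any good explanations around?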

For example, I have three related datasets that look like this:

[55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0]

Now I would like to run a Student's t-test on them.


3 Answers

Stack Overflow user

Accepted answer

Posted 2010-02-24 16:10:17

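The scipy.stats package has a few ttest_... functions. See the example from here, where x is the sample and m is the population mean it is tested against: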

>>> from scipy import stats
>>> res = stats.ttest_1samp(x, m)
>>> print('t-statistic = %6.3f pvalue = %6.4f' % (res.statistic, res.pvalue))
t-statistic =  0.391 pvalue = 0.6955
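As a sketch (not part of the original answer), here are the three main `scipy.stats` t-test functions applied to the question's data. The second array is the made-up sample used in the pure-numpy answer on this page, and the hypothesized mean of 50 for the one-sample test is an arbitrary illustrative choice. `equal_var=False` selects Welch's variant of the unpaired test.

```python
import numpy as np
from scipy import stats

# Data from the question, plus a second, made-up sample for the two-sample tests
sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])

# One-sample test: does the mean of sample1 differ from a hypothetical 50?
one = stats.ttest_1samp(sample1, popmean=50.0)

# Unpaired two-sample test (Welch's t-test, no equal-variance assumption)
ind = stats.ttest_ind(sample1, sample2, equal_var=False)

# Paired test: the same units measured twice, tested on sample1 - sample2
rel = stats.ttest_rel(sample1, sample2)

for name, res in [("1samp", one), ("ind", ind), ("rel", rel)]:
    print("%-5s t-statistic = %6.3f pvalue = %6.4f" % (name, res.statistic, res.pvalue))
```

Which function is appropriate depends on the design: `ttest_rel` only makes sense if each value in `sample1` is paired with the value at the same position in `sample2`.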
Votes: 29

Stack Overflow user

Posted 2017-06-26 23:27:32

van's answer using scipy is exactly right, and the scipy.stats.ttest_* functions are very convenient.

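However, I came to this page looking for a pure-numpy solution, as the title says, to avoid a dependency on scipy. For that, let me point to the example given here: https://docs.scipy.org/doc/numpy/reference/generated/numpy.random.standard_t.html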

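The main issue is that numpy has no cumulative distribution functions, so my conclusion is that you really should use scipy. Still, it is possible with numpy alone: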

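From your original question, I am guessing that you want to compare your datasets and use a t-test to judge whether there is a significant difference, and that the samples are paired (see https://en.wikipedia.org/wiki/Student%27s_t-test#Unpaired_and_paired_two-sample_t-tests ). In that case, you can compute the t-value and p-value like this: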

import numpy as np
sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
# paired samples -> under the null hypothesis the difference has mean 0
difference = sample1 - sample2
n = len(difference)
# the t-value is easily computed with numpy
t = np.mean(difference) / (difference.std(ddof=1) / np.sqrt(n))
# unfortunately, numpy does not have a built-in CDF, so here is a crude
# workaround: estimate it by sampling from the t distribution
# (a paired t-test has n - 1 degrees of freedom)
s = np.random.standard_t(n - 1, size=100000)
p = np.mean(s < t)  # fraction of samples below t, i.e. an estimate of CDF(t)
# using a two-sided test
print("There is a {} % probability that the paired samples stem from distributions with the same means.".format(2 * min(p, 1 - p) * 100))

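This prints something like "There is a 73.028 % probability that the paired samples stem from distributions with the same means."; the last digits vary between runs because p is estimated by random sampling. Since this is far above any reasonable significance level (say 5 %), you should not conclude anything for this particular case.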
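If the randomness of the sampling workaround is a concern, the same pure-numpy constraint can be met by numerically integrating the t density instead. This is a sketch of a different technique than the answer's Monte Carlo estimate: the helper names `t_pdf` and `two_sided_p` are hypothetical, the density is the standard Student's t pdf built from `math.lgamma`, and the integral over [0, |t|] uses a hand-written trapezoidal rule. For a paired test the degrees of freedom are n - 1.

```python
import math
import numpy as np

def t_pdf(u, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # log of Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)), for numerical stability
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return np.exp(log_c) * (1.0 + u**2 / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, num=10001):
    """Two-sided p-value 1 - 2 * P(0 < T < |t|), via the trapezoidal rule."""
    u = np.linspace(0.0, abs(t), num)
    y = t_pdf(u, df)
    area = np.sum((y[1:] + y[:-1]) * np.diff(u)) / 2.0
    return 1.0 - 2.0 * area

sample1 = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
sample2 = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
difference = sample1 - sample2
n = len(difference)
t = difference.mean() / (difference.std(ddof=1) / np.sqrt(n))
p = two_sided_p(t, df=n - 1)
print("t = %.4f, two-sided p = %.4f" % (t, p))
```

For this data the result is about 0.73 and is the same on every run. It should agree closely with `scipy.stats.ttest_rel(sample1, sample2).pvalue` when scipy is available.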

Votes: 8

Stack Overflow user

Posted 2013-01-09 10:15:08

Once you have your t-value, you may wonder how to interpret it as a probability; I did. Here is a function I wrote to help with that.

It is based on information I gathered from http://www.vassarstats.net/rsig.html and http://en.wikipedia.org/wiki/Student%27s_t_distribution.

from scipy import stats

# Given (possibly random) variables X and Y and a correlation direction,
# returns:
#  (r, p),
# where r is the Pearson correlation coefficient and p is the probability
# of getting the observed values if there is actually no correlation in the
# given direction.
#
# direction:
#  if positive, p is the probability of getting the observed result when there
#     is no positive correlation in the normally distributed full populations
#     sampled by X and Y
#  if negative, p is the probability of getting the observed result when there
#     is no negative correlation
#  if 0, p is the probability of getting your result if the hypothesis that
#     there is no correlation in either direction is true
def probabilityOfResult(X, Y, direction=0):
    n = len(X)
    if n != len(Y):
        raise ValueError("variables not same len: %d, and %d" % (n, len(Y)))
    if n < 6:
        raise ValueError("must have at least 6 samples, but have %d" % n)
    corr, prb_2_tail = stats.pearsonr(X, Y)

    if not direction:
        return (corr, prb_2_tail)

    # one-tailed p: half the two-tailed value if the observed correlation has
    # the hypothesized sign, otherwise its complement
    prb_1_tail = prb_2_tail / 2
    if corr * direction > 0:
        return (corr, prb_1_tail)

    return (corr, 1 - prb_1_tail)
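The p-value that `stats.pearsonr` reports is itself the result of a t-test. Under the null hypothesis of zero correlation, t = r * sqrt((n - 2) / (1 - r**2)) follows a t distribution with n - 2 degrees of freedom. The following sketch checks that relation numerically. It assumes a reasonably recent numpy/scipy and reuses the two arrays from the pure-numpy answer above purely as illustrative data.

```python
import numpy as np
from scipy import stats

x = np.array([55.0, 55.0, 47.0, 47.0, 55.0, 55.0, 55.0, 63.0])
y = np.array([54.0, 56.0, 48.0, 46.0, 56.0, 56.0, 55.0, 62.0])
n = len(x)

r = np.corrcoef(x, y)[0, 1]                      # Pearson correlation coefficient
t = r * np.sqrt((n - 2) / (1 - r**2))            # t-statistic for H0: rho = 0
p_two_tailed = 2 * stats.t.sf(abs(t), df=n - 2)  # two-sided p-value from the t distribution

r_scipy, p_scipy = stats.pearsonr(x, y)          # result unpacks like a (r, p) tuple
print(r, r_scipy)
print(p_two_tailed, p_scipy)
```

Both lines should print matching pairs. This is why `probabilityOfResult` can derive the one-tailed value simply by halving `pearsonr`'s two-tailed p.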
Votes: -4
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/2324438
