首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如何在BigQuery中运行成对的t测试。是否有可用的自定义项?

如何在BigQuery中运行成对的t测试。是否有可用的自定义项?
EN

Stack Overflow用户
提问于 2018-04-20 13:21:41
回答 1查看 2.7K关注 0票数 5

我试图使用JavaScript statistical.js库在bigquery中运行配对t测试,但得到的数字不一致。

我的代码如下;

代码语言:javascript
复制
 CREATE TEMPORARY FUNCTION ttest(a ARRAY<FLOAT64>, b ARRAY<FLOAT64>)
 RETURNS FLOAT64
 LANGUAGE js AS
"""
   return statistical.methods.tTestTwoSample(a, b); 
   //statistical.methods.tTestTwoSample(sample1: Array, sample2: Array) 
"""
OPTIONS (
library="gs://my_bucket/statistical.js"
);


SELECT 
 ttest([9.96,   3.76,   1.17,   8.66,   5.25,   7.61,   5.8,    1.84,   7.06, 9.4,  2.99,   9.3,    9.01,   4.24,   3.52,   9.6,    7.59,   6.99,   9.62,   2.18] , 
    [3.96,  5.76,   7.17,   7.66,   9.25,   3.61,   4.8,    8.84,   6.06,   4.4,    1.99,   8.3,    9.01,   1.24,   5.52,   8.6,    5.59,   1.99,   7.62,   3.18]  ) ttest 

并将gs://my_bucket/statistical.js作为JS包在google存储中的保存位置。

该程序包可在此处https://github.com/adrien2p/statistical-js获得

两个样本t- statistical.methods.tTestTwoSample(sample1: Array, sample2: Array)应该返回一个p值。我无法找出测试是否配对。当我运行代码时,我得到了4.099,据我所知,这不是一个p值。我已经在Excel中运行了测试,以使用=T.TEST()验证结果,在成对的双边测试中,我得到的p值为0.4871。我还尝试了除配对之外的其他t检验,并尝试将4.099结果视为T统计量,但没有任何数字匹配的幸运。

所以我的问题是,如何在BigQuery中运行成对的t-test,并得到一个p值?我应该使用其他包吗?

提前感谢您的帮助。

EN

回答 1

Stack Overflow用户

发布于 2018-04-20 15:10:19

在没有太多工作的情况下,我决定手动计算t统计量,并使用jStat库(在github.com/jstat/jstat上可以找到)将统计量转换为p值。

我的代码如下;

代码语言:javascript
复制
CREATE TEMPORARY FUNCTION tscore_to_p(a FLOAT64, b FLOAT64, c FLOAT64)
 RETURNS FLOAT64
 LANGUAGE js AS
"""
  return jStat.ttest(a,b,c); //jStat.ttest( tscore, n, sides)
"""
OPTIONS (
 library="gs://my_bucket/jstat.min.js"
);


WITH data AS ( 
 SELECT * FROM 
 (SELECT 9.96 AS A, 3.96 AS B) UNION ALL
 (SELECT 3.76 AS A, 5.76 AS B) UNION ALL
 (SELECT 1.17 AS A, 7.17 AS B) UNION ALL
 (SELECT 8.66 AS A, 7.66 AS B) UNION ALL
 (SELECT 5.25 AS A, 9.25 AS B) UNION ALL
 (SELECT 7.61 AS A, 3.61 AS B) UNION ALL
 (SELECT 5.80 AS A, 4.80 AS B) UNION ALL
 (SELECT 1.84 AS A, 8.84 AS B) UNION ALL
 (SELECT 7.06 AS A, 6.06 AS B) UNION ALL
 (SELECT 9.40 AS A, 4.40 AS B) UNION ALL
 (SELECT 2.99 AS A, 1.99 AS B) UNION ALL
 (SELECT 9.30 AS A, 8.30 AS B) UNION ALL
 (SELECT 9.01 AS A, 9.01 AS B) UNION ALL
 (SELECT 4.24 AS A, 1.24 AS B) UNION ALL
 (SELECT 3.52 AS A, 5.52 AS B) UNION ALL
 (SELECT 9.60 AS A, 8.60 AS B) UNION ALL
 (SELECT 7.59 AS A, 5.59 AS B) UNION ALL
 (SELECT 6.99 AS A, 1.99 AS B) UNION ALL
 (SELECT 9.62 AS A, 7.62 AS B) UNION ALL
 (SELECT 2.18 AS A, 3.18 AS B)
 ) 


SELECT 
   COUNT(*) n
 , COUNT(*)-1 dof
 , AVG(difference) mean
 , STDDEV_SAMP(difference) SD 
 , STDDEV_SAMP(difference)/SQRT(COUNT(*)) SE
 , AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*))) t 
 , tscore_to_p((AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*)))) COUNT(*), 2) p_value 

FROM 
  (SELECT *, (A-B) difference FROM data)

这将导致p值为0.4871,与Excel中的配对t-test运行相同。

感谢所有考虑过这个问题的人。

票数 9
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/49934521

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档