我试图使用JavaScript statistical.js库在bigquery中运行配对t测试,但得到的数字不一致。
我的代码如下;
CREATE TEMPORARY FUNCTION ttest(a ARRAY<FLOAT64>, b ARRAY<FLOAT64>)
RETURNS FLOAT64
LANGUAGE js AS
"""
return statistical.methods.tTestTwoSample(a, b);
//statistical.methods.tTestTwoSample(sample1: Array, sample2: Array)
"""
OPTIONS (
library="gs://my_bucket/statistical.js"
);
SELECT
ttest([9.96, 3.76, 1.17, 8.66, 5.25, 7.61, 5.8, 1.84, 7.06, 9.4, 2.99, 9.3, 9.01, 4.24, 3.52, 9.6, 7.59, 6.99, 9.62, 2.18] ,
[3.96, 5.76, 7.17, 7.66, 9.25, 3.61, 4.8, 8.84, 6.06, 4.4, 1.99, 8.3, 9.01, 1.24, 5.52, 8.6, 5.59, 1.99, 7.62, 3.18] ) ttest 并将gs://my_bucket/statistical.js作为JS包在google存储中的保存位置。
该程序包可在此处https://github.com/adrien2p/statistical-js获得
两个样本t- statistical.methods.tTestTwoSample(sample1: Array, sample2: Array)应该返回一个p值。我无法找出测试是否配对。当我运行代码时,我得到了4.099,据我所知,这不是一个p值。我已经在Excel中运行了测试,以使用=T.TEST()验证结果,在成对的双边测试中,我得到的p值为0.4871。我还尝试了除配对之外的其他t检验,并尝试将4.099结果视为T统计量,但没有任何数字匹配的幸运。
所以我的问题是,如何在BigQuery中运行成对的t-test,并得到一个p值?我应该使用其他包吗?
提前感谢您的帮助。
发布于 2018-04-20 15:10:19
在没有太多工作的情况下,我决定手动计算t统计量,并使用jStat库(在github.com/jstat/jstat上可以找到)将统计量转换为p值。
我的代码如下;
CREATE TEMPORARY FUNCTION tscore_to_p(a FLOAT64, b FLOAT64, c FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS
"""
return jStat.ttest(a,b,c); //jStat.ttest( tscore, n, sides)
"""
OPTIONS (
library="gs://my_bucket/jstat.min.js"
);
WITH data AS (
SELECT * FROM
(SELECT 9.96 AS A, 3.96 AS B) UNION ALL
(SELECT 3.76 AS A, 5.76 AS B) UNION ALL
(SELECT 1.17 AS A, 7.17 AS B) UNION ALL
(SELECT 8.66 AS A, 7.66 AS B) UNION ALL
(SELECT 5.25 AS A, 9.25 AS B) UNION ALL
(SELECT 7.61 AS A, 3.61 AS B) UNION ALL
(SELECT 5.80 AS A, 4.80 AS B) UNION ALL
(SELECT 1.84 AS A, 8.84 AS B) UNION ALL
(SELECT 7.06 AS A, 6.06 AS B) UNION ALL
(SELECT 9.40 AS A, 4.40 AS B) UNION ALL
(SELECT 2.99 AS A, 1.99 AS B) UNION ALL
(SELECT 9.30 AS A, 8.30 AS B) UNION ALL
(SELECT 9.01 AS A, 9.01 AS B) UNION ALL
(SELECT 4.24 AS A, 1.24 AS B) UNION ALL
(SELECT 3.52 AS A, 5.52 AS B) UNION ALL
(SELECT 9.60 AS A, 8.60 AS B) UNION ALL
(SELECT 7.59 AS A, 5.59 AS B) UNION ALL
(SELECT 6.99 AS A, 1.99 AS B) UNION ALL
(SELECT 9.62 AS A, 7.62 AS B) UNION ALL
(SELECT 2.18 AS A, 3.18 AS B)
)
SELECT
COUNT(*) n
, COUNT(*)-1 dof
, AVG(difference) mean
, STDDEV_SAMP(difference) SD
, STDDEV_SAMP(difference)/SQRT(COUNT(*)) SE
, AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*))) t
, tscore_to_p((AVG(difference)/ (STDDEV_SAMP(difference)/SQRT(COUNT(*)))) COUNT(*), 2) p_value
FROM
(SELECT *, (A-B) difference FROM data)这将导致p值为0.4871,与Excel中的配对t-test运行相同。
感谢所有考虑过这个问题的人。
https://stackoverflow.com/questions/49934521
复制相似问题