首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >使用创建t-test表

使用创建t-test表
EN

Stack Overflow用户
提问于 2015-07-17 03:53:26
回答 3查看 3.7K关注 0票数 1

我想在两个人群之间进行t检验(在治疗组中或治疗组之外(在下面的样本数据中分别为1或0),对不同的研究,所有这些都处于相同的数据中。在下面的样本数据中,我希望为1/0治疗组之间的所有变量(样本数据:年龄、美元、DiseaseCnt)生成t检验。我想通过程序来进行这些t检验,而不是在整个人群中进行。我有生成t测试的逻辑。但是,我需要帮助从函数中提取适当的部分&创建一些易于理解的内容。

最终,我想要的是: t-stats,p-值表,t-测试所用的变量,以及测试变量的程序。

代码语言:javascript
复制
DT<-data.frame(
               Treated=sample(0:1,1000,replace=T)
              ,Program=c('Program A','Program B','Program C','Program D')
              ,Age=as.integer(rnorm(1000,mean=65,sd=15))
              ,Dollars=as.integer(rpois(1000,lambda=1000))
              ,DiseaseCnt=as.integer(rnorm(1000,mean=5,sd=2)) )

progs<-unique(DT$Program) # Pull program names
vars<-names(DT)[3:5] # pull variables to run t tests

test<-lapply(progs, function(i)
          tt<-lapply(vars, function(j) {t.test( DT[DT$Treated==1 & DT$Program == i,names(DT)==j] 
                                                ,DT[DT$Treated==0 & DT$Program == i,names(DT)==j]
                                                ,alternative = 'two.sided'  ) 
              list(j,tt$statistic,tt$p.value)  }
                 ) ) 
  # nested lapply produces results in list format that can be binded, but complete output w/ both lapply's is erroneous
EN

回答 3

Stack Overflow用户

回答已采纳

发布于 2015-07-17 04:38:43

您应该首先将其转换为data.table。(在我的代码中,我将原始表称为DF):

代码语言:javascript
复制
DT <- as.data.table(DF)
DT[, t.test(data=.SD, Age ~ Treated), by=Program]
   Program  statistic parameter   p.value   conf.int estimate null.value alternative
1: Program A -0.6286875  247.8390 0.5301326 -4.8110579 65.26667          0   two.sided
2: Program A -0.6286875  247.8390 0.5301326  2.4828527 66.43077          0   two.sided
3: Program B  1.4758524  230.5380 0.1413480 -0.9069634 67.15315          0   two.sided
4: Program B  1.4758524  230.5380 0.1413480  6.3211834 64.44604          0   two.sided
5: Program C  0.1994182  246.9302 0.8420998 -3.3560930 63.56557          0   two.sided
6: Program C  0.1994182  246.9302 0.8420998  4.1122406 63.18750          0   two.sided
7: Program D -1.1321569  246.0086 0.2586708 -6.1855837 62.31707          0   two.sided
8: Program D -1.1321569  246.0086 0.2586708  1.6701237 64.57480          0   two.sided
                method      data.name
1: Welch Two Sample t-test Age by Treated
2: Welch Two Sample t-test Age by Treated
3: Welch Two Sample t-test Age by Treated
4: Welch Two Sample t-test Age by Treated
5: Welch Two Sample t-test Age by Treated
6: Welch Two Sample t-test Age by Treated
7: Welch Two Sample t-test Age by Treated
8: Welch Two Sample t-test Age by Treated

在这种格式中,对于每个Programstatistic对两者都是相同的,并且等于t,这里的parameterdf,对于conf.int,它是(按顺序排列)低于上层(所以对于Program A,置信区间是(-4.8110579, 2.4828527),对于estimate,它的置信区间是group 0,然后是group 1 (所以对于Program A ),Treated == 0的平均值是65.26667,等等。

这是我能想出的最快的解决方案,您可以遍历vars,或者有一种更简单的方法。

编辑:我只确认了Program AAge,使用以下代码:

代码语言:javascript
复制
DT[Program == 'Program A', t.test(Age ~ Treated)]
    Welch Two Sample t-test

data:  Age by Treated
t = -0.62869, df = 247.84, p-value = 0.5301
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -4.811058  2.482853
sample estimates:
mean in group 0 mean in group 1
       65.26667        66.43077

编辑2:下面是循环遍历变量和rbind的代码:

代码语言:javascript
复制
do.call(rbind, lapply(vars, function(x) DT[, t.test(data=.SD, eval(parse(text=x)) ~ Treated), by=Program]))
票数 6
EN

Stack Overflow用户

发布于 2015-07-17 04:37:58

你可以从回归中得到同样的t检验;如果你认为不同项目的治疗效果不同,你应该包括一个互动。还可以指定多个响应。

代码语言:javascript
复制
> m <- lm(cbind(Age,Dollars,DiseaseCnt)~Treated * Program - Treated - 1, DT)
> lapply(summary(m), `[[`, "coefficients")
$`Response Age`
                              Estimate  Std. Error       t value         Pr(>|t|)
ProgramProgram A         63.0875912409 1.294086510 48.7506752932 1.355786133e-265
ProgramProgram B         65.3846153846 1.400330869 46.6922616771 1.207761156e-252
ProgramProgram C         66.0695652174 1.412455172 46.7763979425 3.534894216e-253
ProgramProgram D         66.6691729323 1.313402302 50.7606640010 5.038015651e-278
Treated:ProgramProgram A  2.8593114140 1.924837595  1.4854819032  1.377339219e-01
Treated:ProgramProgram B -0.9786003470 1.919883369 -0.5097186438  6.103619649e-01
Treated:ProgramProgram C -0.5066022544 1.922108032 -0.2635659631  7.921691261e-01
Treated:ProgramProgram D -2.8657541289 1.919883369 -1.4926709484  1.358412980e-01

$`Response Dollars`
                                Estimate  Std. Error        t value     Pr(>|t|)
ProgramProgram A          998.5474452555 2.681598120 372.3702808887 0.0000000000
ProgramProgram B          997.4188034188 2.901757030 343.7292623810 0.0000000000
ProgramProgram C         1001.6869565217 2.926880936 342.2370019265 0.0000000000
ProgramProgram D         1001.2180451128 2.721624185 367.8752013053 0.0000000000
Treated:ProgramProgram A   -0.9899231316 3.988636646  -0.2481858388 0.8040419882
Treated:ProgramProgram B    2.5060086113 3.978370529   0.6299082986 0.5288996396
Treated:ProgramProgram C   -5.4721417069 3.982980462  -1.3738811324 0.1697889454
Treated:ProgramProgram D   -4.0043698991 3.978370529  -1.0065351806 0.3144036460

$`Response DiseaseCnt`
                               Estimate   Std. Error        t value         Pr(>|t|)
ProgramProgram A          4.53284671533 0.1793523653 25.27341475576 3.409326912e-109
ProgramProgram B          4.56410256410 0.1940771747 23.51694665775  1.515736580e-97
ProgramProgram C          4.25217391304 0.1957575279 21.72163675698  6.839384262e-86
ProgramProgram D          4.60150375940 0.1820294143 25.27890219412 3.133081901e-109
Treated:ProgramProgram A  0.13087009883 0.2667705543  0.49057175444  6.238378600e-01
Treated:ProgramProgram B -0.02274918064 0.2660839292 -0.08549625944  9.318841210e-01
Treated:ProgramProgram C  0.47375201288 0.2663922537  1.77840010867  7.564438017e-02
Treated:ProgramProgram D -0.31090546880 0.2660839292 -1.16844887901  2.429064705e-01

您特别关心回归表的Treated:Program条目。

票数 2
EN

Stack Overflow用户

发布于 2015-07-17 04:51:02

因为您试图从创建tt$statistic的函数中访问tt,所以会出现错误。一些圈套问题。

以下是一种按照您的版本进行处理的方法

代码语言:javascript
复制
results <- lapply(progs, function (i) {
  DS = subset(DT, Program == i)
  o <- lapply(vars, function (i) {
    frm <- formula(paste0(i, '~ Treated'))
    tt <- t.test(frm, DS)
    data.frame(Variable=i, T=tt$statistic, P=tt$p.value)
  })
  o <- do.call(rbind, o)
  o$Program <- i
  o
})
do.call(rbind, results)

或者您可以使用rbind-ing (例如)来完成它。蹒跚学步(我认为重新绑定仍在发生,就在幕后):

代码语言:javascript
复制
library(plyr)
combinations <- expand.grid(Program=progs, Y=vars)
ddply(combinations, .(Program, Y),
      function (x) {
          # x is a dataframe with the program and variable;
          #  just do the t-test and add the statistic & p-val to it
          frm <- formula(paste0(x$Y, '~ Treated'))
          tt <- t.test(frm, subset(DT, Program == x$Program))
          x$T <- tt$statistic
          x$P <- tt$p.value
          x
      })
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/31468148

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档