
Counting instances of multiple variants in a case-control cohort using R

Stack Overflow user
Asked 2020-01-21 16:50:36
3 answers, 151 views, 0 followers, score 1

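I have a table of genetic variants in which each row represents one patient carrying a given variant, together with whether that patient is a case or a control. To run Fisher's test, I would like to output a separate matrix with three columns: the variant, the number of cases, and the number of controls.

I am using R, and the table looks like this (PID = patient ID):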

Variant ID      PID     Disease
2:4324:2343     FF354   Yes
2:4324:2343     FF355   Control
2:4324:2343     FF356   Control
2:4324:2343     FF357   Yes
2:4324:2343     FF358   Yes
3:346543:345    FF354   Yes
3:346543:345    FF358   Control
3:346543:345    FF390   Control
3:346543:345    FF391   Yes
6:234:34234     FF358   Yes
6:234:34234     FF390   Control
6:234:34234     FF358   Control
6:234:34234     FF213   Yes 

The expected output would be:

Variant ID  Disease Control
2:4324:2343     3   2
3:346543:345    2   2
6:234:34234     2   2

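I assume I will have to use a loop in R, but I must admit that is beyond me at the moment while I am still getting to grips with R. Any help would be much appreciated!

Many thanks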


3 Answers

Stack Overflow user

Accepted answer

Posted 2020-01-21 17:01:22

You can use tapply, which gives you a nice matrix.

with(dat, tapply(Disease, list(Variant_ID, Disease), length))
#              Control Yes
# 2:4324:2343        2   3
# 3:346543:345       2   2
# 6:234:34234        2   2
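
One caveat with this approach, sketched below on hypothetical toy data (the data frame toy and its variants "A" and "B" are made up for illustration and are not part of the question): tapply leaves a cell as NA when a variant has no rows for one of the groups, because length is never called on an empty group. If a zero is wanted there instead, the NA cells can be filled afterwards.

```r
# Hypothetical toy data: variant "B" appears only in cases, never in controls.
toy <- data.frame(
  Variant_ID = c("A", "A", "A", "B", "B"),
  Disease    = c("Yes", "Control", "Yes", "Yes", "Yes")
)

# Same call as in the answer, applied to the toy data.
counts <- with(toy, tapply(Disease, list(Variant_ID, Disease), length))

# The B/Control cell is NA at this point, since that group has no rows.
# Treat "no rows" as a count of zero.
counts[is.na(counts)] <- 0L
counts
```

With the data from the question every variant has both cases and controls, so no NA cells appear and this step changes nothing.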

Data:

dat <- structure(list(Variant_ID = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("2:4324:2343", "3:346543:345", 
"6:234:34234"), class = "factor"), PID = structure(c(2L, 3L, 
4L, 5L, 6L, 2L, 6L, 7L, 8L, 6L, 7L, 6L, 1L), .Label = c("FF213", 
"FF354", "FF355", "FF356", "FF357", "FF358", "FF390", "FF391"
), class = "factor"), Disease = structure(c(2L, 1L, 1L, 2L, 2L, 
2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Control", "Yes"), class = "factor")), class = "data.frame", row.names = c(NA, 
-13L))
Score: 1

Stack Overflow user

Posted 2020-01-21 16:52:08

We can get the frequency counts with count() and then reshape the result to wide format.

library(dplyr)
library(tidyr)
df1 %>% 
    count(VariantID, Disease) %>%
    pivot_wider(names_from = Disease, values_from = n)
# A tibble: 3 x 3
#  VariantID    Control   Yes
#  <chr>          <int> <int>
#1 2:4324:2343        2     3
#2 3:346543:345       2     2
#3 6:234:34234        2     2

Or use table() from base R:

table(df1[c('VariantID', 'Disease')])
#            Disease
#VariantID      Control Yes
# 2:4324:2343        2   3
# 3:346543:345       2   2
# 6:234:34234        2   2
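
If the three-column layout from the question (variant, cases, controls) is needed rather than a table object, one possible way to get there is sketched below. It rebuilds the question's data in shortened form without the PID column; the names tab, wide and result are chosen here for illustration only.

```r
# The VariantID and Disease columns from the question, rebuilt compactly.
df1 <- data.frame(
  VariantID = c(rep("2:4324:2343", 5), rep("3:346543:345", 4),
                rep("6:234:34234", 4)),
  Disease   = c("Yes", "Control", "Control", "Yes", "Yes",
                "Yes", "Control", "Control", "Yes",
                "Yes", "Control", "Control", "Yes")
)

# Cross-tabulate, then turn the table into a plain data frame
# with one row per variant and one column per disease status.
tab  <- table(df1[c("VariantID", "Disease")])
wide <- as.data.frame.matrix(tab)

# Reorder and rename the columns to match the layout asked for.
result <- data.frame(
  VariantID = rownames(wide),
  Disease   = wide[["Yes"]],
  Control   = wide[["Control"]]
)
result
```

Unlike tapply with length, table() reports a missing combination as 0 rather than NA, which may be more convenient for downstream tests.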

Data

df1 <- structure(list(VariantID = c("2:4324:2343", "2:4324:2343", "2:4324:2343", 
"2:4324:2343", "2:4324:2343", "3:346543:345", "3:346543:345", 
"3:346543:345", "3:346543:345", "6:234:34234", "6:234:34234", 
"6:234:34234", "6:234:34234"), PID = c("FF354", "FF355", "FF356", 
"FF357", "FF358", "FF354", "FF358", "FF390", "FF391", "FF358", 
"FF390", "FF358", "FF213"), Disease = c("Yes", "Control", "Control", 
"Yes", "Yes", "Yes", "Control", "Control", "Yes", "Yes", "Control", 
"Control", "Yes")), class = "data.frame", row.names = c(NA, -13L
))
Score: 1

Stack Overflow user

Posted 2020-01-21 16:53:20

Using dcast from data.table:

library(data.table)
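setDT(df)  # convert df to a data.table by reference
# Count rows per variant and status; stating fun.aggregate and value.var
# explicitly avoids relying on dcast's guessed defaults
dcast(df, VariantID ~ Disease, fun.aggregate = length, value.var = "PID")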

#     VariantID  Control  Yes
#1  2:4324:2343       2   3
#2  3:346543:345      2   2
#3  6:234:34234       2   2

Data

df <- structure(list(VariantID = structure(c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L), .Label = c("2:4324:2343", "3:346543:345", "6:234:34234"), class = "factor"), PID = structure(c(2L, 3L,4L, 5L, 6L, 2L, 6L, 7L, 8L, 6L, 7L, 6L, 1L), .Label = c("FF213","FF354", "FF355", "FF356", "FF357", "FF358", "FF390", "FF391"), class = "factor"), Disease = structure(c(2L, 1L, 1L, 2L, 2L,2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L), .Label = c("Control", "Yes"), class = "factor")), class = "data.frame", row.names = c(NA, -13L))
Score: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/59845744
