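I have a large dataset. It consists of 371 genotypes (names starting with gwas) scored at 105,000 markers. In R, I need to build a genotype-by-genotype matrix by applying a specific mathematical equation across the 105,000 markers. The data format is as follows: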
markers gwas_100 gwas_101 gwas_102 gwas_103
S1_147748 NA NA NA NA
S1_239131 0.67385 0.67385 0.67385 0.67385
S1_644966 0.61051 0.61051 0.61051 0.61051
S1_1625764 NA 0.71429 NA 0.71429
S1_1761929 0.69137 0.69137 0.69137 0.69137
S1_1778021 0.72372 0.72372 0.72372 0.72372
S1_1778059 0.72507 0.72507 0.72507 0.72507
S1_1778136 0.68733 0.68733 0.68733 0.68733
S1_1778289 0.69946 0.69946 0.69946 0.69946
S1_1780669 0.73046 0.73046 0.73046 0.73046
S1_1786636 0.71563 0.71563 0.71563 0.71563
S1_1786639 0.71833 0.71833 0.71833 0.71833
S1_1786640 0.71294 0.71294 0.71294 0.71294
S1_1786678 0.71429 0.71429 0.71429 0.71429
S1_1963487 0.72776 0.72776 0.72776 0.72776
S1_2036329 0.74259 0.74259 0.74259 0.74259
S1_2036386 0.74394 0.74394 0.74394 0.74394
S1_2037735 0.7628 0.7628 0.7628 0.7628
S1_2037760 0.7628 0.7628 0.7628 0.7628
S1_2037773 0.7628 0.7628 0.7628 0.7628
S1_2042132 0.58491 NA NA NA

Mathematical equation:
(gwas_100 & gwas_101) = sum(gwas_100) - sum(gwas_101), where
sum(gwas_100) = 0.67385 + 0.61051 + 0.69137 + ... + 0.58491
sum(gwas_101) = 0.67385 + 0.61051 + ... + 0.7628, therefore
(gwas_100 & gwas_101) = 13.49056 - 13.61994 = -0.12938

I then need a matrix of these pairwise values for all possible combinations of the 371 genotypes, for example:
         gwas_100 gwas_101 gwas_102 gwas_103
gwas_100             -0.12     0.14     0.05
gwas_101                       0.06     0.1
gwas_102                                0.07
gwas_103

Thanks in advance.
Posted on 2017-03-10 10:32:16
You can first use colSums to sum each column while ignoring NAs, then use outer to subtract those sums pairwise:
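# Sum each genotype column, ignoring NAs; data[-1] drops the markers column
sums <- colSums(data[-1], na.rm=TRUE)
# Pairwise differences: entry [i, j] is sums[i] - sums[j]
outer(sums,sums,`-`)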
         gwas_100 gwas_101 gwas_102 gwas_103
gwas_100  0.00000 -0.12938  0.58491 -0.12938
gwas_101  0.12938  0.00000  0.71429  0.00000
gwas_102 -0.58491 -0.71429  0.00000 -0.71429
gwas_103  0.12938  0.00000  0.71429  0.00000
https://stackoverflow.com/questions/42709004
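The question's example shows only the upper triangle of the matrix, while outer returns the full antisymmetric matrix. As a minimal sketch of one way to get from one to the other, the block below runs the same colSums/outer approach on a small toy data frame and then masks everything except the upper triangle. The toy values and the names diffs, upper and pairs are assumptions made for illustration and do not appear in the original post.

```r
# Hypothetical toy data in the same layout as the question: a `markers`
# column followed by one numeric column per genotype.
data <- data.frame(
  markers  = c("S1_147748", "S1_239131", "S1_644966", "S1_1625764"),
  gwas_100 = c(NA, 0.67385, 0.61051, NA),
  gwas_101 = c(NA, 0.67385, 0.61051, 0.71429),
  gwas_102 = c(NA, 0.67385, 0.61051, NA),
  stringsAsFactors = FALSE
)

# Per-genotype sums over the non-missing markers (markers column dropped).
sums <- colSums(data[-1], na.rm = TRUE)

# Full pairwise matrix: entry [i, j] is sums[i] - sums[j].
diffs <- outer(sums, sums, `-`)

# Keep only the upper triangle; the diagonal and lower triangle become NA.
upper <- diffs
upper[!upper.tri(upper)] <- NA

# Long format: one row per genotype pair taken from the upper triangle.
idx <- which(upper.tri(diffs), arr.ind = TRUE)
pairs <- data.frame(
  genotype_1 = rownames(diffs)[idx[, "row"]],
  genotype_2 = colnames(diffs)[idx[, "col"]],
  difference = diffs[idx]
)
```

Note that with na.rm = TRUE each genotype is summed over its own non-missing markers, so two genotypes in a pair may be summed over different marker sets. Whether that is acceptable depends on what the difference is meant to measure.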