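I want to calculate the Tanimoto coefficient (size of the intersection of two sets divided by the size of their union) for disease pairs. The sample data below covers just one disease pair: disease 1 is NK cell defects and disease 2 is Adenylosuccinate lyase deficiency.
Set 1 is disease 1 (NK cell defects); it contains all genes from the Gene1 column.
Set 2 is disease 2 (Adenylosuccinate lyase deficiency); it contains all genes from the Gene2 column.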
| **Gene1** | **Gene2** | **Disease1** | **Disease2** |
| --- | --- | --- | --- |
| IMPDH1 | XDH | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | ADA | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | IMPDH1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | IMPDH2 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | PPP3R2 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | RRM1 | NK cell defects | Adenylosuccinate lyase deficiency |
| NPR1 | POLA1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PPP3R2 | ITGAL | NK cell defects | Adenylosuccinate lyase deficiency |
| ITGAL | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| CASP3 | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| PTK2B | NPR1 | NK cell defects | Adenylosuccinate lyase deficiency |
| TNF | GUCY1A2 | NK cell defects | Adenylosuccinate lyase deficiency |
| PTK2B | GUCY1A2 | NK cell defects | Adenylosuccinate lyase deficiency |

Any suggestions on how to do this in MySQL or R?
Thanks,
Rohan
Posted on 2013-12-10 04:53:53
Random input data:
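library(data.table)
# Toy data: one disease pair (A, B) with integer "gene" IDs
DT = data.table(
  G1 = 1:5,
  G2 = 3:7,
  D1 = "A",
  D2 = "B"
)
# For each disease pair, count shared and distinct genes, then take the ratio
DT[,
  list(
    intersectG = length(intersect(G1, G2)),
    unionG = length(union(G1, G2)),
    # Tanimoto = |intersection| / |union|
    Tanimoto = length(intersect(G1, G2)) / length(union(G1, G2))
  ),
  by = c('D1', 'D2')]
Output:
D1 D2 intersectG unionG Tanimoto
1: A B 3 7 0.4285714
Posted on 2013-12-10 04:46:09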
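Learn to search:
install.packages("sos")
library("sos")
findFn("Tanimoto")
getGeneSim {GOSim} R Documentation
Compute functional similarity for genes
Description
Calculate the pairwise functional similarities for a list of genes using different strategies.
Usage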
getGeneSim(genelist1, genelist2=NULL, similarity="funSimMax", similarityTerm="relevance",
normalization="Tanimoto", method="sqrt", avg=(similarity=="OA"), verbose=FALSE)
https://stackoverflow.com/questions/20486075
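If the goal is only the set-based coefficient described in the question (rather than GO-based functional similarity), a plain base R calculation on the sample rows may be enough. The following is a minimal sketch under that assumption; the data frame name `pairs_df` and the helper `tanimoto` are hypothetical names introduced here for illustration and are not part of any package.

```r
# Sample rows from the question: Gene1/Gene2 for a single disease pair.
pairs_df <- data.frame(
  Gene1 = c("IMPDH1", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2", "PPP3R2",
            "NPR1", "PPP3R2", "ITGAL", "CASP3", "PTK2B", "TNF", "PTK2B"),
  Gene2 = c("XDH", "ADA", "NPR1", "IMPDH1", "IMPDH2", "PPP3R2", "RRM1",
            "POLA1", "ITGAL", "NPR1", "NPR1", "NPR1", "GUCY1A2", "GUCY1A2"),
  Disease1 = "NK cell defects",
  Disease2 = "Adenylosuccinate lyase deficiency",
  stringsAsFactors = FALSE
)

# Hypothetical helper: Tanimoto coefficient of two gene vectors,
# |intersection| / |union|. intersect() and union() drop duplicates.
tanimoto <- function(a, b) {
  u <- union(a, b)
  if (length(u) == 0) return(NA_real_)  # both sets empty: undefined
  length(intersect(a, b)) / length(u)
}

# One result row per (Disease1, Disease2) pair.
groups <- split(pairs_df, list(pairs_df$Disease1, pairs_df$Disease2), drop = TRUE)
result <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    Disease1 = g$Disease1[1],
    Disease2 = g$Disease2[1],
    intersectG = length(intersect(g$Gene1, g$Gene2)),
    unionG = length(union(g$Gene1, g$Gene2)),
    Tanimoto = tanimoto(g$Gene1, g$Gene2),
    stringsAsFactors = FALSE
  )
}))
rownames(result) <- NULL
print(result)
```

For the sample rows, the two sets share 4 genes (IMPDH1, PPP3R2, NPR1, ITGAL) out of 13 distinct genes, so the coefficient is 4/13, roughly 0.308. With more disease pairs in the same four-column layout, the same grouping should yield one row per pair.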