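I am doing some analysis in R, and I need to count consecutive repeats and list the corresponding IDs next to each count. I don't want to simply aggregate all rows that share the same value; I only want to aggregate the IDs of rows that repeat consecutively. I have a file with these entries.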
Probe Set ID Call Codes Chromosomal Position
SNP_A-2131660 BB 1156131
SNP_A-1967418 AB 2234251
SNP_A-1969580 BB 2329564
SNP_A-4263484 BB 2553624
SNP_A-1978185 AA 2936870
SNP_A-4264431 AA 2951834
SNP_A-1980898 BB 3095126
SNP_A-1983139 AA 3165267
SNP_A-4265735 AA 3302871
SNP_A-1995832 AA 3705226
SNP_A-1995893 AA 3720965
SNP_A-1997689 BB 3763164
SNP_A-1997709 AA 3763567
SNP_A-1997896 AA 3766240
SNP_A-1997922 AA 3766286
SNP_A-2000230 AA 4340877
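SNP_A-2000332 AB 4343434
I want to count the consecutive values (i.e. BB, AB, (BB, BB)) while also carrying along the other two columns.
I tried different approaches, but I could only get the number of consecutive occurrences with the following R code.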
# I got the counts
dfAA <- as.data.frame(with(rle(myfile$Call.Codes), lengths[values == "AA"]))
# I got the counts and the counted values
dfAA_02 <- as.data.frame(rev(unclass(rle(myfile$Call.Codes))))
I don't know how to do the second part.
This is what I would like to get in the end.
Call_Codes Counts Aggregation_probeset_ID Aggregation_Chromosomal_position
BB 1 SNP_A-2131660 1156131
AB 1 SNP_A-1967418 2234251
BB 2 SNP_A-1969580, SNP_A-4263484 2329564, 2553624
AA 2 SNP_A-1978185, SNP_A-4264431 2936870, 2951834
AA 4 SNP_A-1983139, SNP_A-4265735, SNP_A-1995832, SNP_A-1995893 3165267, 3302871, 3705226, 3720965
Posted on 2019-11-26 13:13:04
Is this what you are looking for?
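library(data.table)
library(dplyr)  # for if_else()
setDT(data)  # assumes `data` is a data.frame with columns Probe_Set_ID, Call_Codes, Chromosomal_Position
# Call code of the previous row (NA for the first row)
data[, lag := shift(Call_Codes, 1L, fill = NA, type = "lag")]
# Flag every row that starts a new run, then number the runs with a cumulative sum
data[, new_group := if_else(lag != Call_Codes, 1, 0, missing = 1)]
data[, new_group := cumsum(new_group)]
# Count the rows in each run, collapse IDs and positions, then drop the helper column
data[, .(counts = .N,
         Aggregation_probeset_ID = paste(Probe_Set_ID, collapse = ","),
         Aggregation_Chromosomal_position = paste(Chromosomal_Position, collapse = ","))
     , .(Call_Codes, new_group)][, -c('new_group')]
Result: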
Call_Codes counts Aggregation_probeset_ID Aggregation_Chromosomal_position
1: BB 1 SNP_A-2131660 1156131
2: AB 1 SNP_A-1967418 2234251
3: BB 2 SNP_A-1969580,SNP_A-4263484 2329564,2553624
4: AA 2 SNP_A-1978185,SNP_A-4264431 2936870,2951834
5: BB 1 SNP_A-1980898 3095126
6: AA 4 SNP_A-1983139,SNP_A-4265735,SNP_A-1995832,SNP_A-1995893 3165267,3302871,3705226,3720965
7: BB 1 SNP_A-1997689 3763164
8: AA 4 SNP_A-1997709,SNP_A-1997896,SNP_A-1997922,SNP_A-2000230 3763567,3766240,3766286,4340877
9: AB 1 SNP_A-2000332 4343434
https://stackoverflow.com/questions/59050250
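As a possible shortcut, the lag/cumsum steps in the answer can probably be collapsed with data.table's rleid(), which assigns one integer id to each run of identical consecutive values. The following is only a sketch under assumptions: the toy table holds just the first six rows of the question, and the underscore column names follow the answer rather than the original file.

```r
library(data.table)

# Toy data: first six rows of the question (column names are assumed)
data <- data.table(
  Probe_Set_ID = c("SNP_A-2131660", "SNP_A-1967418", "SNP_A-1969580",
                   "SNP_A-4263484", "SNP_A-1978185", "SNP_A-4264431"),
  Call_Codes = c("BB", "AB", "BB", "BB", "AA", "AA"),
  Chromosomal_Position = c(1156131L, 2234251L, 2329564L,
                           2553624L, 2936870L, 2951834L)
)

# Group by the call code and by a run id, so that only consecutive
# repeats are aggregated together; `run` is a helper column dropped at the end
runs <- data[, .(
  counts = .N,
  Aggregation_probeset_ID = paste(Probe_Set_ID, collapse = ","),
  Aggregation_Chromosomal_position = paste(Chromosomal_Position, collapse = ",")
), by = .(Call_Codes, run = rleid(Call_Codes))][, !"run"]

print(runs)
```

Grouping by the run id rather than by Call_Codes alone is what keeps the two separate BB runs apart instead of merging them into a single row.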