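My data is a large dataset. Each unique ID contains one or more classes, and each class contains one or more unique values of X. However, the same class can occur under different IDs (for example, ID009 and ID020 both have class CMP-002), and I am trying to count the unique values of X for each class within each ID.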
ID <- c("ID004", "ID004", "ID004", "ID004", "ID004", "ID004", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID006", "ID009", "ID009", "ID009", "ID009", "ID009", "ID009", "ID009","ID020", "ID020", "ID020", "ID020", "ID020", "ID020", "ID020", "ID020", "ID023", "ID023", "ID023", "ID023", "ID023", "ID023","ID023", "ID023", "ID023", "ID023")
Class <- c("CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001","CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001","CMP-001", "CMP-001", "CMP-001", "CMP-002", "CMP-002", "CMP-002","CMP-002", "CMP-002", "CMP-005", "CMP-005", "CMP-005", "CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002","CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002", "CMP-002","CMP-002", "CMP-004", "CMP-004", "CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001","CMP-001", "CMP-001", "CMP-001", "CMP-001", "CMP-001")
X <- c(1,1,2,3,3,3,4,4,4,4,4,4,4,4,5,5,6,6,6,7,7,8,9,9,10,10,10,10,10,11,11,12,12,13,13,14,14,15,15,15,16,16,17,17,18,18,18)
data <- data.frame(ID, Class, X)

The result should be:
ID     Class    No. of X values
ID004  CMP-001  3
ID006  CMP-001  1
       CMP-002  2
       CMP-005  2
ID009  CMP-002  2
ID020  CMP-002  3
       CMP-004  1
ID023  CMP-001  4

Thanks for your help,
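To see why the grouping has to use both ID and Class, here is a minimal base-R sketch. It rebuilds the sample data above compactly with rep() (same 47 rows), then compares counting distinct X per Class alone with counting per (ID, Class) pair. The helper count_distinct is a hypothetical name used only for illustration.

```r
# Sample data from the question, rebuilt compactly with rep()
ID <- rep(c("ID004", "ID006", "ID009", "ID020", "ID023"),
          times = c(6, 16, 7, 8, 10))
Class <- rep(c("CMP-001", "CMP-002", "CMP-005", "CMP-002", "CMP-004", "CMP-001"),
             times = c(14, 5, 3, 13, 2, 10))
X <- c(1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8,
       9, 9, 10, 10, 10, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14,
       15, 15, 15, 16, 16, 17, 17, 18, 18, 18)
data <- data.frame(ID, Class, X)

# Hypothetical helper: number of distinct values in a vector
count_distinct <- function(x) length(unique(x))

# Grouping by Class alone pools the X values of every ID that shares the class
by_class <- tapply(data$X, data$Class, count_distinct)

# Grouping by the (ID, Class) pair keeps each ID's values separate;
# pairs that never occur in the data are left as NA
by_id_class <- tapply(data$X, list(data$ID, data$Class), count_distinct)

by_class[["CMP-002"]]      # 7: values 5, 6, 9, 10, 11, 12, 13 pooled across IDs
by_id_class[, "CMP-002"]   # ID006 = 2, ID009 = 2, ID020 = 3, NA for ID004 and ID023
```

Counting per Class alone would report 7 for CMP-002, while the desired per-ID counts are 2, 2 and 3, which is why every answer groups on both columns.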
Posted on 2020-10-03 21:06:03
Here, n_distinct is what you need after grouping by ID and Class:
library(dplyr)
data %>%
group_by(ID, Class) %>%
summarise(No_X_value = n_distinct(X), .groups = 'drop')
-output
# A tibble: 8 x 3
# ID Class No_X_value
# <chr> <chr> <int>
#1 ID004 CMP-001 3
#2 ID006 CMP-001 1
#3 ID006 CMP-002 2
#4 ID006 CMP-005 2
#5 ID009 CMP-002 2
#6 ID020 CMP-002 3
#7 ID020 CMP-004 1
#8 ID023 CMP-001 4

Or using data.table:
library(data.table)
# setDT() converts data to a data.table in place
setDT(data)[, .(No_X_value = uniqueN(X)), .(ID, Class)]

Or use base R with aggregate:
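# unique(data) keeps one row per distinct (ID, Class, X), so counting rows per (ID, Class) counts distinct X
aggregate(X ~ ., unique(data), FUN = length)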
# ID Class X
#1 ID004 CMP-001 3
#2 ID006 CMP-001 1
#3 ID023 CMP-001 4
#4 ID006 CMP-002 2
#5 ID009 CMP-002 2
#6 ID020 CMP-002 3
#7 ID020 CMP-004 1
#8 ID006 CMP-005 2

Posted on 2020-10-03 21:09:30
You can also use aggregate() in base R like this:
#Code
df <- aggregate(X ~ ID + Class, data, function(x) length(unique(x)))

Output:
ID Class X
1 ID004 CMP-001 3
2 ID006 CMP-001 1
3 ID023 CMP-001 4
4 ID006 CMP-002 2
5 ID009 CMP-002 2
6 ID020 CMP-002 3
7 ID020 CMP-004 1
8 ID006 CMP-005 2

Posted on 2020-10-03 22:27:06
If you are using base R, I think @akrun's aggregate approach is already a very efficient one. Below is another option, though a more convoluted one:
subset(as.data.frame(xtabs(cnt ~ ID + Class, cbind(cnt = 1, unique(data)))), Freq > 0)

which gives
ID Class Freq
1 ID004 CMP-001 3
2 ID006 CMP-001 1
5 ID023 CMP-001 4
7 ID006 CMP-002 2
8 ID009 CMP-002 2
9 ID020 CMP-002 3
14 ID020 CMP-004 1
17 ID006 CMP-005 2

https://stackoverflow.com/questions/64188980
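The xtabs() answer works by counting the rows of unique(data) that fall into each (ID, Class) cell. A minimal base-R sketch of the same idea uses table() directly instead of xtabs() with a helper cnt column, and cross-checks the result against the aggregate() answer. The names dedup, counts, agg, key_tab and key_agg are illustrative only.

```r
# Sample data from the question, rebuilt compactly with rep()
ID <- rep(c("ID004", "ID006", "ID009", "ID020", "ID023"),
          times = c(6, 16, 7, 8, 10))
Class <- rep(c("CMP-001", "CMP-002", "CMP-005", "CMP-002", "CMP-004", "CMP-001"),
             times = c(14, 5, 3, 13, 2, 10))
X <- c(1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 7, 7, 8,
       9, 9, 10, 10, 10, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14,
       15, 15, 15, 16, 16, 17, 17, 18, 18, 18)
data <- data.frame(ID, Class, X)

# After unique(), every row is a distinct (ID, Class, X) triple, so the number of
# rows in an (ID, Class) cell equals the number of distinct X values in that cell
dedup <- unique(data)
counts <- as.data.frame(table(ID = dedup$ID, Class = dedup$Class))
counts <- subset(counts, Freq > 0)   # drop ID/Class pairs that never occur

# Cross-check against the aggregate() answer, matching rows on an ID/Class key
agg <- aggregate(X ~ ID + Class, data, function(x) length(unique(x)))
key_tab <- paste(counts$ID, counts$Class)
key_agg <- paste(agg$ID, agg$Class)
stopifnot(setequal(key_tab, key_agg),
          all(counts$Freq[match(key_agg, key_tab)] == agg$X))

counts
```

Printing counts gives the same eight rows, with the same row names (1, 2, 5, 7, 8, 9, 14, 17), as the xtabs() output; the full table has 5 x 4 = 20 ID/Class cells, and subset() keeps only the non-empty ones.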