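I would be grateful if someone could give me some guidance on how to do a somewhat complex sort of a matrix and then select the top 2 elements in each subcategory.
Code: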
index<-1:14
metric<-c(0.037777,0.041143,0.041043,0.042056,0.043701,0.042169,0.042134,
0.046565,0.044638,0.036653,0.046221,0.04033,0.045385,0.043873)
cat_1<-c("California Munis","California Munis","California Munis","California Munis",
"California Munis","California Munis","California Munis","Corporate Bonds",
"Corporate Bonds","Corporate Bonds","Government Bonds","Government Bonds",
"High Yield Bonds","High Yield Bonds")
cat_2<-c("California Munis","Corporate Bonds","Corporate Bonds","Government Bonds",
"High Yield Bonds","High Yield Bonds","High Yield Bonds","High Yield Bonds",
"High Yield Bonds","High Yield Bonds","California Munis","California Munis",
"Corporate Bonds","Corporate Bonds")
data<-data.frame(cbind(index,metric,cat_1,cat_2))
This produces the matrix below:
Ind  Metric    Cat_1             Cat_2
1    0.037777  California Munis  California Munis
2    0.041143  California Munis  Corporate Bonds
3    0.041043  California Munis  Corporate Bonds
4    0.042056  California Munis  Government Bonds
5    0.043701  California Munis  High Yield Bonds
6    0.042169  California Munis  High Yield Bonds
7    0.042134  California Munis  High Yield Bonds
8    0.046565  Corporate Bonds   High Yield Bonds
9    0.044638  Corporate Bonds   High Yield Bonds
10   0.036653  Corporate Bonds   High Yield Bonds
11   0.046221  Government Bonds  California Munis
12   0.04033   Government Bonds  California Munis
13   0.045385  High Yield Bonds  Corporate Bonds
14   0.043873  High Yield Bonds  Corporate Bonds
Given the matrix above, I want to sort by Cat_1, Cat_2 and Metric. I have tried:
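data[order(data[,3],data[,4],data[,2]),]
However, the order of the two categories should not matter: a row whose Cat_1 and Cat_2 entries are swapped belongs to the same subcategory. For example, "California Munis" & "Corporate Bonds" = "Corporate Bonds" & "California Munis". The result I am hoping for should look like the following matrix: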
Ind  Metric    Cat_1             Cat_2             Selection
1    0.037777  California Munis  California Munis  1
2    0.041143  California Munis  Corporate Bonds   1
3    0.041043  California Munis  Corporate Bonds   2
11   0.046221  Government Bonds  California Munis  1
4    0.042056  California Munis  Government Bonds  2
12   0.04033   Government Bonds  California Munis
5    0.043701  California Munis  High Yield Bonds  1
6    0.042169  California Munis  High Yield Bonds  2
7    0.042134  California Munis  High Yield Bonds
8    0.046565  Corporate Bonds   High Yield Bonds  1
13   0.045385  High Yield Bonds  Corporate Bonds   2
9    0.044638  Corporate Bonds   High Yield Bonds
14   0.043873  High Yield Bonds  Corporate Bonds
10   0.036653  Corporate Bonds   High Yield Bonds
Within each subcategory the rows are ordered by Metric, largest first. The last column marks the top 2 rows of each subcategory, which are the rows I need to extract.
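A common way to make the pair order-independent is to build a canonical key from the alphabetically smaller and larger of the two names; base R's pmin() and pmax() work element-wise on character vectors. A minimal sketch, assuming the categories are plain character vectors (factors are converted first); pair_key is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: order-independent key for a pair of category columns.
# pmin()/pmax() compare character vectors element-wise (alphabetical order),
# so swapping the two arguments yields the same key.
pair_key <- function(a, b) {
  a <- as.character(a)  # also accepts factors
  b <- as.character(b)
  paste(pmin(a, b), pmax(a, b), sep = " | ")
}

pair_key(c("California Munis", "High Yield Bonds"),
         c("Corporate Bonds",  "Corporate Bonds"))
# -> "California Munis | Corporate Bonds" "Corporate Bonds | High Yield Bonds"
```

Sorting by this key and then by -metric gives the grouping shown in the desired output.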
Any ideas or code would be greatly appreciated.
Thanks
Posted on 2013-11-04 15:26:13
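Please stop using data.frame(cbind(...)). It will only bring you grief: cbind() first builds a matrix, which can hold only one type, so index and metric are turned into character strings (and, in R versions before 4.0, then into factors) and metric is no longer a number.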
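A short illustration of what goes wrong, using the first three values of the question's vectors (a sketch; the exact class of the broken column depends on the R version):

```r
# First three values of the question's vectors
index  <- 1:3
metric <- c(0.037777, 0.041143, 0.041043)
cat_1  <- c("California Munis", "California Munis", "California Munis")

# cbind() returns a matrix, and a matrix holds a single type,
# so the numbers are turned into strings before data.frame() sees them.
m <- cbind(index, metric, cat_1)
typeof(m)                                # "character"

bad  <- data.frame(m)                    # metric: character (factor before R 4.0)
good <- data.frame(index, metric, cat_1) # metric stays numeric

is.numeric(bad$metric)                   # FALSE
is.numeric(good$metric)                  # TRUE
```

Passing the vectors straight to data.frame() keeps each column's own type, so expressions such as -metric keep working.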
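# build the data frame without cbind(); stringsAsFactors = TRUE keeps cat_1 and
# cat_2 as factors (the default before R 4.0), which as.numeric() below relies on
data <- data.frame(index, metric, cat_1, cat_2, stringsAsFactors = TRUE)
newdat <- data[ with( data,
                order( pmax( as.numeric(cat_1), as.numeric(cat_2) ),
                       pmin( as.numeric(cat_1), as.numeric(cat_2) ) ,
                       - metric) ) , ]
# number the rows 1, 2, ... within each unordered pair of categories
newdat$selection <- ave(newdat$index,
                        first=pmax( as.numeric(newdat$cat_1),
                                    as.numeric(newdat$cat_2) ),
                        second= pmin( as.numeric(newdat$cat_1),
                                      as.numeric(newdat$cat_2) ) ,
                        FUN=seq_along)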
#-----------------------------------------
> newdat
index metric cat_1 cat_2 selection
1 1 0.037777 California Munis California Munis 1
2 2 0.041143 California Munis Corporate Bonds 1
3 3 0.041043 California Munis Corporate Bonds 2
11 11 0.046221 Government Bonds California Munis 1
4 4 0.042056 California Munis Government Bonds 2
12 12 0.040330 Government Bonds California Munis 3
5 5 0.043701 California Munis High Yield Bonds 1
6 6 0.042169 California Munis High Yield Bonds 2
7 7 0.042134 California Munis High Yield Bonds 3
8 8 0.046565 Corporate Bonds High Yield Bonds 1
13 13 0.045385 High Yield Bonds Corporate Bonds 2
9 9 0.044638 Corporate Bonds High Yield Bonds 3
14 14 0.043873 High Yield Bonds Corporate Bonds 4
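10 10 0.036653 Corporate Bonds High Yield Bonds 5
A necessary condition for this to work is that the two cat variables have the same levels. If they do not, make them the same by rebuilding both factors on a common level set: lev <- union(levels(cat_1), levels(cat_2)); cat_1 <- factor(cat_1, levels = lev); cat_2 <- factor(cat_2, levels = lev). (Assigning levels(.) <- ... directly would only rename the existing codes by position.)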
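The pmax()/pmin() trick compares the integer codes behind the factors, so those codes must mean the same category in both columns. A minimal sketch with two made-up factors, a and b (illustrative names), showing how to put them on a common level set with factor(). Note also that since R 4.0.0, data.frame() no longer converts strings to factors by default, so the question's columns need stringsAsFactors = TRUE (or an explicit factor() call) for as.numeric() to return codes at all.

```r
# Two made-up factors whose level sets differ
a <- factor(c("Corporate Bonds", "High Yield Bonds"))
b <- factor(c("California Munis", "Corporate Bonds"))

# "Corporate Bonds" has code 1 in a but code 2 in b, so comparing
# as.numeric(a) with as.numeric(b) would compare unrelated categories.
as.numeric(a)  # 1 2
as.numeric(b)  # 1 2

# Rebuild both on the union of the levels. factor() matches values by label;
# assigning levels(b) <- lev would only rename codes by position.
lev <- sort(union(levels(a), levels(b)))
a2  <- factor(as.character(a), levels = lev)
b2  <- factor(as.character(b), levels = lev)

as.numeric(a2)  # 2 3
as.numeric(b2)  # 1 2
```

After this, "Corporate Bonds" has code 2 in both factors, so pmax()/pmin() on the codes compare like with like.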
Posted on 2013-11-04 22:29:40
I have expanded on my comment:
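# introduce an order-independent combined category;
# "|" is used as separator because the category names themselves contain spaces
cat3 <- vapply(paste(data$cat_1, data$cat_2, sep="|"),
               function(x){paste(sort(strsplit(x, "|", fixed=TRUE)[[1]]), collapse=" | ")},
               character(1), USE.NAMES=FALSE)
data$cat_3 <- cat3
# order as desired: by combined category, then by metric (largest first);
# as.numeric(as.character()) copes with metric being stored as text or factor
data1 <- data[with(data, order(cat_3, -as.numeric(as.character(metric)))), ]
# label and select top 2 in each cat (each cat's rows are contiguous after sorting)
data1$rankByCat <- unlist(lapply(unique(data1$cat_3), function(mycat) seq_len(sum(data1$cat_3 == mycat))))
data1[data1$rankByCat < 3, !names(data1)%in%c("cat_3")]
Posted on 2013-11-04 12:17:20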
@andrei
I have got the sorting part working with the following code:
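#concatenate the 2 strings ("|" as separator, since the names contain spaces)
cat_3<-paste(data[,3],data[,4],sep="|")
#split each string back into its 2 categories (creates a list)
temp_split<-strsplit(cat_3,"|",fixed=TRUE)
#sort each pair alphabetically (gives a 2-row matrix, one column per row of data)
sort_split<-sapply(temp_split,sort)
#bind the sorted pairs to the data as two extra columns
out<-cbind(data,t(sort_split))
Is this the best way to write it?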
How do I go from here to selecting the top 2 in each category?
Thanks for your help!
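One way to go from here, sketched under the assumption that a single order-independent key column is acceptable instead of two sorted columns: split the rows by that key, sort each piece by metric (largest first), keep the first two rows with head(), and bind the pieces back together. The example uses only the rows of two subcategories from the question to stay short; df, pair, pieces and top2 are illustrative names.

```r
# Subset of the question's rows (two subcategories only, to keep it short)
df <- data.frame(
  index  = c(4, 8, 9, 10, 11, 12, 13, 14),
  metric = c(0.042056, 0.046565, 0.044638, 0.036653,
             0.046221, 0.04033, 0.045385, 0.043873),
  cat_1  = c("California Munis", "Corporate Bonds", "Corporate Bonds",
             "Corporate Bonds", "Government Bonds", "Government Bonds",
             "High Yield Bonds", "High Yield Bonds"),
  cat_2  = c("Government Bonds", "High Yield Bonds", "High Yield Bonds",
             "High Yield Bonds", "California Munis", "California Munis",
             "Corporate Bonds", "Corporate Bonds"),
  stringsAsFactors = FALSE
)

# Order-independent key: alphabetically smaller name first
pair <- paste(pmin(df$cat_1, df$cat_2), pmax(df$cat_1, df$cat_2), sep = " | ")

# Split by key, sort each piece by metric (largest first), keep 2 rows
pieces <- lapply(split(df, pair), function(d) head(d[order(-d$metric), ], 2))
top2 <- do.call(rbind, pieces)
rownames(top2) <- NULL

top2$index  # 11 4 8 13
```

split() orders the groups by the sorted key, so the result comes out grouped by subcategory. Unlike a rank column, this drops the non-selected rows instead of labelling them.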
https://stackoverflow.com/questions/19760454