Ordering a data frame in R and selecting rows

Stack Overflow user
Asked 2013-11-04 09:41:20
3 answers · 85 views · 0 followers · 0 votes

I would be grateful for some guidance on a fairly complex sort of a matrix, followed by selecting the top 2 elements within each subcategory.

Code:

index<-1:14
metric<-c(0.037777,0.041143,0.041043,0.042056,0.043701,0.042169,0.042134,
          0.046565,0.044638,0.036653,0.046221,0.04033,0.045385,0.043873)
cat_1<-c("California Munis","California Munis","California Munis","California Munis",
         "California Munis","California Munis","California Munis","Corporate Bonds",
         "Corporate Bonds","Corporate Bonds","Government Bonds","Government Bonds",
         "High Yield Bonds","High Yield Bonds")
cat_2<-c("California Munis","Corporate Bonds","Corporate Bonds","Government Bonds",
         "High Yield Bonds","High Yield Bonds","High Yield Bonds","High Yield Bonds",
         "High Yield Bonds","High Yield Bonds","California Munis","California Munis",
         "Corporate Bonds","Corporate Bonds")

data<-data.frame(cbind(index,metric,cat_1,cat_2))

This produces the following matrix:

Ind Metric     Cat_1                Cat_2
1   0.037777    California Munis    California Munis
2   0.041143    California Munis    Corporate Bonds
3   0.041043    California Munis    Corporate Bonds
4   0.042056    California Munis    Government Bonds
5   0.043701    California Munis    High Yield Bonds
6   0.042169    California Munis    High Yield Bonds
7   0.042134    California Munis    High Yield Bonds
8   0.046565    Corporate Bonds     High Yield Bonds
9   0.044638    Corporate Bonds     High Yield Bonds
10  0.036653    Corporate Bonds     High Yield Bonds
11  0.046221    Government Bonds    California Munis
12  0.04033     Government Bonds    California Munis
13  0.045385    High Yield Bonds    Corporate Bonds
14  0.043873    High Yield Bonds    Corporate Bonds

Given the matrix above, I want to sort by Cat_1, Cat_2, and Metric. I have tried:

data[order(data[,3],data[,4],data[,2]),]

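However, the order of Cat_1 and Cat_2 should not matter when the two rows contain the same pair of entries. For example, "California Munis" & "Corporate Bonds" = "Corporate Bonds" & "California Munis". The result I am after should look similar to the following matrix: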

Ind Metric      Cat_1               Cat_2               Selection
1   0.037777    California Munis    California Munis    1
2   0.041143    California Munis    Corporate Bonds     1
3   0.041043    California Munis    Corporate Bonds     2
11  0.046221    Government Bonds    California Munis    1
4   0.042056    California Munis    Government Bonds    2
12  0.04033     Government Bonds    California Munis    
5   0.043701    California Munis    High Yield Bonds    1
6   0.042169    California Munis    High Yield Bonds    2
7   0.042134    California Munis    High Yield Bonds    
8   0.046565    Corporate Bonds     High Yield Bonds    1
13  0.045385    High Yield Bonds    Corporate Bonds     2
9   0.044638    Corporate Bonds     High Yield Bonds    
14  0.043873    High Yield Bonds    Corporate Bonds 
10  0.036653    Corporate Bonds     High Yield Bonds    

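The last column shows the selection: the top 2 rows of each subcategory, which are the rows I need to extract.

Any ideas or code would be highly appreciated.

Thanks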

3 Answers

Stack Overflow user

Answered 2013-11-04 15:26:13

Please stop using data.frame(cbind(...)). It will only bring you grief.
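
As an illustration of why this advice matters, here is a minimal sketch (the three-row vectors are a shortened, hypothetical subset of the question's data, and the names `via_cbind` and `direct` are made up for the example). `cbind()` on mixed numeric and character vectors builds a character matrix first, so the numeric columns no longer arrive as numbers; passing the vectors straight to `data.frame()` keeps each column's own type.

```r
# Shortened, hypothetical subset of the question's vectors
index  <- 1:3
metric <- c(0.037777, 0.041143, 0.041043)
cat_1  <- c("California Munis", "California Munis", "Corporate Bonds")

# cbind() coerces everything to one character matrix before data.frame() sees it
via_cbind <- data.frame(cbind(index, metric, cat_1))

# Passing the vectors directly keeps integer, numeric, and text columns apart
direct <- data.frame(index, metric, cat_1)

# Compare the column classes of the two results
print(sapply(via_cbind, class))
print(sapply(direct, class))
```

With the `cbind()` version, an expression such as `-metric` inside `order()` no longer operates on numbers, which is presumably part of the grief the answer refers to.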

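 # Assumes cat_1 and cat_2 are factors with identical levels and metric is numeric,
 # e.g. data <- data.frame(index, metric, cat_1, cat_2, stringsAsFactors = TRUE)
 newdat <- data[ with( data, 
                 order( pmax( as.numeric(cat_1), as.numeric(cat_2) ), 
                        pmin( as.numeric(cat_1), as.numeric(cat_2) ) ,
                      - metric) ) , ]
 # seq_along numbers the rows within each (first, second) group, even for groups of one row
 newdat$selection <- ave(seq_len(nrow(newdat)), 
                         first=pmax( as.numeric(newdat$cat_1), 
                                     as.numeric(newdat$cat_2) ), 
                        second= pmin( as.numeric(newdat$cat_1), 
                                      as.numeric(newdat$cat_2) ) ,
                         FUN=seq_along)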
#-----------------------------------------
> newdat
   index   metric            cat_1            cat_2 selection
1      1 0.037777 California Munis California Munis         1
2      2 0.041143 California Munis  Corporate Bonds         1
3      3 0.041043 California Munis  Corporate Bonds         2
11    11 0.046221 Government Bonds California Munis         1
4      4 0.042056 California Munis Government Bonds         2
12    12 0.040330 Government Bonds California Munis         3
5      5 0.043701 California Munis High Yield Bonds         1
6      6 0.042169 California Munis High Yield Bonds         2
7      7 0.042134 California Munis High Yield Bonds         3
8      8 0.046565  Corporate Bonds High Yield Bonds         1
13    13 0.045385 High Yield Bonds  Corporate Bonds         2
9      9 0.044638  Corporate Bonds High Yield Bonds         3
14    14 0.043873 High Yield Bonds  Corporate Bonds         4
10    10 0.036653  Corporate Bonds High Yield Bonds         5

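A necessary condition for this to work is that the levels are the same in both cat variables. If they are not, make them the same with levels(.) <- union(levels(cat_1), levels(cat_2))

Votes: 2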
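
One possible way to bring two factors onto a shared level set is sketched below. It is an assumption-laden illustration rather than the answerer's code: the two short factors are hypothetical, and it rebuilds each factor with `factor(x, levels = ...)`, which keeps every value attached to its own label (assigning to `levels()` relabels by position instead).

```r
# Hypothetical factors whose level sets differ
cat_1 <- factor(c("California Munis", "Corporate Bonds"))
cat_2 <- factor(c("High Yield Bonds", "California Munis"))

# Shared level set, in a single fixed order
all_levels <- union(levels(cat_1), levels(cat_2))

# Rebuild both factors on the shared levels; the labels stay with their values
cat_1 <- factor(cat_1, levels = all_levels)
cat_2 <- factor(cat_2, levels = all_levels)

# The same label now maps to the same integer code in both factors
print(as.integer(cat_1))
print(as.integer(cat_2))
```

After this step, `as.numeric()` on either factor yields comparable codes, which is what the `pmax()`/`pmin()` trick relies on.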

Stack Overflow user

Answered 2013-11-04 22:29:40

I expanded on my comment:

# introduce combined category (the words of both labels, sorted, so the pair order does not matter)
cat3 <- sapply(paste(data$cat_1, data$cat_2, sep = " "),
               function(x) paste(sort(strsplit(x, " ")[[1]]), collapse = " "),
               USE.NAMES = FALSE)
data$cat_3 <- cat3
# order as desired; as.character() first, so a factor or character metric converts by value
data1 <- data[order(data$cat_3, -as.numeric(as.character(data$metric))), ]
# label and select top 2 in each cat
data1$rankByCat <- unlist(lapply(unique(data1$cat_3),
                                 function(mycat) seq_len(sum(data1$cat_3 == mycat))))
data1[data1$rankByCat < 3, !names(data1) %in% c("cat_3")]
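
A caveat worth illustrating: a key built by splitting on single spaces and sorting the individual words can merge pairs that are actually different. The sketch below uses hypothetical labels ("High Yield", "Bonds", "High", "Yield Bonds") that do not occur in the question's data, and the helper names `word_key` and `pair_key` are made up for the example. Ordering the two whole labels with `pmin()`/`pmax()`, which also accept character vectors, is one alternative that avoids the collision.

```r
# Key in the style of the code above: split on spaces, sort the words
word_key <- function(a, b) {
  words <- strsplit(paste(a, b, sep = " "), " ")[[1]]
  paste(sort(words), collapse = " ")
}

# Alternative key: keep each label whole and order only the two labels
pair_key <- function(a, b) {
  paste(pmin(a, b), pmax(a, b), sep = " | ")
}

# Two different (hypothetical) pairs receive the same word-level key
print(word_key("High Yield", "Bonds") == word_key("High", "Yield Bonds"))

# The label-level key keeps them apart
print(pair_key("High Yield", "Bonds") == pair_key("High", "Yield Bonds"))

# ...while still ignoring the order within a pair
print(pair_key("Corporate Bonds", "California Munis") ==
      pair_key("California Munis", "Corporate Bonds"))
```

For the four categories in the question the word-level key happens to work; the difference only matters if labels share words in this way.
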

Votes: 1

Stack Overflow user

Answered 2013-11-04 12:17:20

@andrei

I have got the sorting part working with the code below:

#concatenate the 2 strings
cat_3<-paste(data[,3],data[,4],sep="  ")

#break the string to 2 (creates a list)
temp_split<-strsplit(cat_3,"  ")

#sort by row
sort_split<-sapply(temp_split,sort)

#bind split
out<-cbind(data,t(sort_split))

Is this the best way to write it?

How do I go from here to selecting the top 2 in each category?

Thanks for your help!

Votes: 0
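
One way the remaining step could look is sketched below. It is a sketch under assumptions, not the only approach: it rebuilds the data frame directly from the question's vectors with character categories, and the column names `pair` and `selection` and the object names `sorted` and `top2` are chosen just for this example.

```r
# Data from the question, built without cbind() so metric stays numeric
index  <- 1:14
metric <- c(0.037777, 0.041143, 0.041043, 0.042056, 0.043701, 0.042169, 0.042134,
            0.046565, 0.044638, 0.036653, 0.046221, 0.04033, 0.045385, 0.043873)
cat_1 <- c(rep("California Munis", 7), rep("Corporate Bonds", 3),
           rep("Government Bonds", 2), rep("High Yield Bonds", 2))
cat_2 <- c("California Munis", "Corporate Bonds", "Corporate Bonds", "Government Bonds",
           rep("High Yield Bonds", 6), rep("California Munis", 2),
           rep("Corporate Bonds", 2))
data <- data.frame(index, metric, cat_1, cat_2, stringsAsFactors = FALSE)

# Order-independent key: the alphabetically smaller label always comes first
data$pair <- paste(pmin(data$cat_1, data$cat_2),
                   pmax(data$cat_1, data$cat_2), sep = " | ")

# Sort by pair, then by metric from largest to smallest
sorted <- data[order(data$pair, -data$metric), ]

# Number the rows within each pair: 1 is the largest metric
sorted$selection <- ave(seq_len(nrow(sorted)), sorted$pair, FUN = seq_along)

# Keep the top 2 rows of every pair and drop the helper column
top2 <- sorted[sorted$selection <= 2,
               c("index", "metric", "cat_1", "cat_2", "selection")]
print(top2)
```

On the question's data this keeps the rows with index 1, 2, 3, 11, 4, 5, 6, 8 and 13, which are the rows marked 1 or 2 in the desired Selection column.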
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/19760454