首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >在R中计算同类/归因

在R中计算同类/归因
EN

Stack Overflow用户
提问于 2016-09-13 12:53:51
回答 1查看 698关注 0票数 0

我有一个指标,它分布在四个类别a, b, c, d中。

在一段时间内,我跟踪每个类别的度量中的移动。这些移动的总和表示已经离开或从其他地方(‘外部’)进入系统的数量。

代码语言:javascript
复制
# SETUP -------------------------------------------------------------------

categories <- letters[1:4]
set.seed(1)
movements <- lapply(categories, function(...) {round(runif(10, -10,10))*10})
names(movements) <- categories
movements[['external']] <- Reduce(`+`, movements)*-1
problem <- as.data.frame(movements)
problem

     a   b    c   d external
1  -50 -60   90   0       20
2  -30 -60  -60  20      130
3   10  40   30   0      -80
4   80 -20  -70 -60       70
5  -60  50  -50  70      -10
6   80   0  -20  30      -90
7   90  40 -100  60      -90
8   30 100  -20 -80      -30
9   30 -20   70  40     -120
10 -90  60  -30 -20       80

在一些类别经历了积极的移动和其他类别经历了负面移动的情况下,我们可以推断系统内的转移。

代码语言:javascript
复制
# ADD TRANSFER COLUMNS AND INITIALISE TO 0 --------------------------------

transfer_matrix <- combn(c(categories, 'external'), 2)
transfer_list <- combn(c(categories, 'external'), 2, simplify=F)
problem[,sapply(transfer_list, paste, collapse='.')] <- 0
paste(names(problem), collapse=', ')

[1] "a, b, c, d, external, a.b, a.c, a.d, a.external, b.c, b.d, b.external, c.d, c.external, d.external"

例如,a减少了50,c增加了90,因此我们可以推断存在从ac的传输,该传输将存储在变量a.c中。

计算转移的规则是成比例的。因此,当'a‘减少50,b减少60时,c增加的50/(50+60)应归因于'a',而c增加的60/(50+60)应归因于b。同样地,对于进出系统的传输也是如此。

下面显示了我需要的第一行所有变量的完整手动计算:

代码语言:javascript
复制
# MANUAL CALCULATION ------------------------------------------------------

row_limit <- 1  # change to e.g. 1:10
problem[row_limit, 'a.b'] <- 0
problem[row_limit, 'a.c'] <- 90*(-50/(-50+-60))
problem[row_limit, 'a.d'] <- 0
problem[row_limit, 'a.external'] <- 20 * -50/(-50+-60)
problem[row_limit, 'b.c'] <- 90*(-60/(-50+-60))
problem[row_limit, 'b.d'] <- 0 
problem[row_limit, 'b.external'] <- 20 * -60/(-50+-60)
problem[row_limit, 'c.d'] <- 0
problem[row_limit, 'c.external'] <- 0
problem[row_limit, 'd.external'] <- 0

请注意,由于a.c = -c.a只需要计算所有可能传输的子集。

我的问题是,我如何以编程的方式编写上述计算,以一种简洁而有效的方式来处理10-20个类别和大量的行?

我通常使用data.table,但我愿意接受任何使用包的建议。

下面是一些用于检查输出的代码:

代码语言:javascript
复制
# CHECKING ----------------------------------------------------------------

check <- function(problem, category, categories, transfer_list, transfer_matrix) {
  out_columns <- sapply(transfer_list[transfer_matrix[1,] == category], paste, collapse='.')
  in_columns <- sapply(transfer_list[transfer_matrix[2,] == category], paste, collapse='.')
  stopifnot(length(c(out_columns, in_columns)) == length(categories)-1)

  out_sum <- 0
  if(length(out_columns) == 1) {
    out_sum <- problem[,out_columns]
  } else if(length(out_columns) > 1) {
    out_sum <- Reduce(`+`, problem[,out_columns])
  }

  in_sum <- 0     
  if(length(in_columns) == 1) {
    in_sum <- problem[,in_columns]
  }
  else if(length(in_columns) > 1) {
    in_sum <- Reduce(`+`, problem[,in_columns])
  }

  lhs <- out_sum - in_sum
  rhs <- -problem[, category]
  sprintf('%s vs %s',lhs, rhs)
}

# For each category, actual vs expected
sapply(c(categories,'external'), check, problem=problem, categories=c(categories,'external'), transfer_list=transfer_list,transfer_matrix=transfer_matrix)

      a          b           c            d         
 [1,] "50 vs 50" "60 vs 60"  "-90 vs -90" "0 vs 0"  
 [2,] "0 vs 30"  "0 vs 60"   "0 vs 60"    "0 vs -20"
 [3,] "0 vs -10" "0 vs -40"  "0 vs -30"   "0 vs 0"  
 [4,] "0 vs -80" "0 vs 20"   "0 vs 70"    "0 vs 60" 
 [5,] "0 vs 60"  "0 vs -50"  "0 vs 50"    "0 vs -70"
 [6,] "0 vs -80" "0 vs 0"    "0 vs 20"    "0 vs -30"
 [7,] "0 vs -90" "0 vs -40"  "0 vs 100"   "0 vs -60"
 [8,] "0 vs -30" "0 vs -100" "0 vs 20"    "0 vs 80" 
 [9,] "0 vs -30" "0 vs 20"   "0 vs -70"   "0 vs -40"
[10,] "0 vs 90"  "0 vs -60"  "0 vs 30"    "0 vs 20" 
EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2016-09-14 15:39:28

这里有一个想法。我相信输出结果符合您的要求。

代码语言:javascript
复制
#x is a row from problem df
#y is a column from transfer_matrix
check_pairs <- function(x,y){
    #split y into which columns are being compared . e.g. if col 1 is 'd' vs 'external', then ...
    a <- y[1]  #would be 'd'
    b <- y[2]  #would be 'external'
    #if both pos, both neg, or one val is 0, then return 0
    if( sign(x[a]) == sign(x[b]) | sign(x[[a]]) == 0){
        return(0)
    }else{ #else return formula from your manual calculation
        return( x[[b]] * x[[a]] / sum( x[sign(x)==sign(x[[a]]) ] ) )
    }
}

#for each row of the problem matrix, compare to each column of the transfer_matrix
check_matrix_cols <- function(x){ 
    return( apply(transfer_matrix, 2, function(y) check_pairs(x,y)) )
}

problem[,-seq(length(c(categories, 'external')))] <- t( apply(problem, 1, check_matrix_cols) )

sapply(c(categories,'external'), check, problem=problem, categories=c(categories,'external'), transfer_list=transfer_list,transfer_matrix=transfer_matrix)

          a            b              c            d            external      
 [1,] "50 vs 50"   "60 vs 60"     "-90 vs -90" "0 vs 0"     "-20 vs -20"  
 [2,] "30 vs 30"   "60 vs 60"     "60 vs 60"   "-20 vs -20" "-130 vs -130"
 [3,] "-10 vs -10" "-40 vs -40"   "-30 vs -30" "0 vs 0"     "80 vs 80"    
 [4,] "-80 vs -80" "20 vs 20"     "70 vs 70"   "60 vs 60"   "-70 vs -70"  
 [5,] "60 vs 60"   "-50 vs -50"   "50 vs 50"   "-70 vs -70" "10 vs 10"    
 [6,] "-80 vs -80" "0 vs 0"       "20 vs 20"   "-30 vs -30" "90 vs 90"    
 [7,] "-90 vs -90" "-40 vs -40"   "100 vs 100" "-60 vs -60" "90 vs 90"    
 [8,] "-30 vs -30" "-100 vs -100" "20 vs 20"   "80 vs 80"   "30 vs 30"    
 [9,] "-30 vs -30" "20 vs 20"     "-70 vs -70" "-40 vs -40" "120 vs 120"  
[10,] "90 vs 90"   "-60 vs -60"   "30 vs 30"   "20 vs 20"   "-80 vs -80"  
票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/39462417

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档