
Q: Simplifying a computation so that it can be done with matrix operations

Stack Overflow user

Asked on 2013-11-17 17:38:41

Answers: 3, views: 178, followers: 0, votes: 0

My basic operation works on two probability vectors of the same length. Call them A and B. In R, the formula is:

t = 1-prod(1-A*B)

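That is, the result is a scalar. 1-A*B is an element-wise operation whose result is a vector with i-th element 1 - a_i*b_i, and prod gives the product of the vector's elements.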

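The meaning (as you may have guessed) is this: suppose A holds, for each of N sources of a disease (or other signal), the probability that the source has the disease. B is the vector of probabilities that each source transmits the disease to a target. The result is the probability that the target acquires the disease from at least one of the sources.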
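As a quick numeric check of the formula, here is a minimal sketch in Python (used only for illustration; the helper name `prob_any_transmission` is hypothetical). The values are the first row of A and the first column of B from the R example in the question. The product is the probability that no source transmits, which is how the formula treats the sources as independent.

```python
from math import prod

def prob_any_transmission(a, b):
    """Return 1 - prod(1 - a_i * b_i) for two equal-length probability vectors."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    # prod(...) is the probability that no source transmits the disease
    return 1 - prod(1 - ai * bi for ai, bi in zip(a, b))

a = [0.9, 0.1, 0.3]   # P(source i has the disease)
b = [0.0, 0.3, 0.9]   # P(source i transmits it to the target)
t = prob_any_transmission(a, b)
print(round(t, 4))  # 0.2919
```

Here 1 - (1 - 0)(1 - 0.03)(1 - 0.27) = 1 - 0.7081 = 0.2919.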

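OK, now I have many types of signals, so I have many "A" vectors. For each type of signal I have many targets, each with different transmission probabilities (that is, many "B" vectors), and I want to compute the "t" result for every pair.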

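Ideally, if the operation were an inner product of the vectors, matrix multiplication would do the trick. But my operation is not of that form (I think).

What I am looking for is some transformation of the vectors A and B that would let me use matrix multiplication. Any other suggestion for simplifying my computation is welcome.

Here is an example (code in R):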

A = rbind(c(0.9,0.1,0.3),c(0.7,0.2,0.1))
A 
# that is, the probability that source 2 has disease/signal 1 is 0.1 (A[1,2])
# neither rows nor columns need to sum to 1.
B = cbind(c(0,0.3,0.9),c(0.9,0.6,0.3),c(0.3,0.8,0.3),c(0.4,0.5,1))
B
# that is, the probability that target 4 acquires a disease from source 2 is 0.5 (B[2,4])
# again, nothing needs to sum to 1 here

# the outcome should be:
C = t(apply(A,1,function(x) apply(B,2,function(y) 1-prod(1-x*y))))
# which basically loops on every row in A and every column in B and 
# computes the required formula
C
# while this is quite elegant, it is not very efficient, and I look for transformations
# on my A,B matrices so I could write, in principle
# C = f(A)%*%g(B), where f(A) is my transformed A, g(B) is my transformed(B),
# and %*% is matrix multiplication

# note that if you replace 1-prod(1-x*y) in the formula above with sum(x*y), the result
# is exactly matrix multiplication, which is why I think I'm not too far from that
# and want to enjoy the benefits of already implemented optimizations of matrix
# multiplications.
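One observation on the f(A) %*% g(B) idea, offered as a sketch and not something stated in the thread: taking logs turns the product into a sum, log(1 - t) = sum_i log(1 - a_i*b_i), but log(1 - a_i*b_i) does not factor into a function of a_i times a function of b_i, so this alone is not a matrix product. When every a_i*b_i is small, log(1 - x) ≈ -x gives the approximation C ≈ 1 - exp(-(A %*% B)), which does use a plain matrix product. The Python below (function names are hypothetical, and the small probabilities are made-up illustrative values) compares the exact and approximate results. With probabilities as large as those in the question's example the approximation would be poor.

```python
from math import exp, prod

def exact(A, B):
    """C[i][j] = 1 - prod_k (1 - A[i][k] * B[k][j])."""
    return [[1 - prod(1 - A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matmul(A, B):
    """Plain matrix product, standing in for R's %*%."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def approx(A, B):
    """First-order approximation 1 - exp(-(A @ B)); assumes all A[i][k]*B[k][j] are small."""
    return [[1 - exp(-s) for s in row] for row in matmul(A, B)]

# Made-up small probabilities (not from the question).
A_small = [[0.02, 0.01, 0.03], [0.05, 0.02, 0.01]]
B_small = [[0.10, 0.04], [0.03, 0.06], [0.09, 0.03]]

C_exact = exact(A_small, B_small)
C_approx = approx(A_small, B_small)
max_err = max(abs(e - a)
              for row_e, row_a in zip(C_exact, C_approx)
              for e, a in zip(row_e, row_a))
print(max_err)  # largest absolute gap between exact and approximate values
```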

Tags: algorithm, matlab, transformation, matrix-multiplication

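3 Answers

Stack Overflow user

Answered on 2013-11-17 21:57:01

This is a job where Rcpp excels. The nested loops are easy to implement, and you do not need much C++ experience. (I like RcppEigen, but you don't really need it here. You could use "plain" Rcpp.)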

library(RcppEigen)
library(inline)

incl <- '
using  Eigen::Map;
using  Eigen::MatrixXd;
typedef  Map<MatrixXd>  MapMatd;
'

body <- '
const MapMatd        A(as<MapMatd>(AA)), B(as<MapMatd>(BB));
const int            nA(A.rows()), mA(A.cols()), mB(B.cols());
MatrixXd             R = MatrixXd::Ones(nA,mB);
for (int i = 0; i < nA; ++i) 
{
  for (int j = 0; j < mB; ++j) 
  {
    for (int k = 0; k < mA; ++k) 
    {
      R(i,j) *= (1 - A(i,k) * B(k,j));
    }
    R(i,j) = 1 - R(i,j);
  }
}
return                wrap(R);
'

funRcpp <- cxxfunction(signature(AA = "matrix", BB ="matrix"), 
                         body, "RcppEigen", incl)
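For readers who do not use Rcpp, the same triple loop can be sketched in plain Python. This is for illustration only and will not match the speed of the compiled version; `pairwise_prob` is a hypothetical name, and the data are the A and B matrices from the question.

```python
def pairwise_prob(A, B):
    """R[i][j] = 1 - prod_k (1 - A[i][k] * B[k][j]), mirroring the C++ loops."""
    n_a, m_a, m_b = len(A), len(A[0]), len(B[0])
    if len(B) != m_a:
        raise ValueError("columns of A must match rows of B")
    R = [[1.0] * m_b for _ in range(n_a)]
    for i in range(n_a):
        for j in range(m_b):
            for k in range(m_a):
                R[i][j] *= 1 - A[i][k] * B[k][j]   # running product of "no transmission"
            R[i][j] = 1 - R[i][j]                  # complement: at least one transmission
    return R

A = [[0.9, 0.1, 0.3],
     [0.7, 0.2, 0.1]]
B = [[0.0, 0.9, 0.3, 0.4],
     [0.3, 0.6, 0.8, 0.5],
     [0.9, 0.3, 0.3, 1.0]]
C = pairwise_prob(A, B)
for row in C:
    print([round(v, 4) for v in row])
```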

Now let's wrap your code in an R function:

doupleApply <- function(A, B) t(apply(A,1,
                               function(x) apply(B,2,function(y) 1-prod(1-x*y))))

Compare the results:

all.equal(doupleApply(A,B), funRcpp(A,B))
#[1] TRUE

Benchmark:

library(microbenchmark)
microbenchmark(doupleApply(A,B), funRcpp(A,B))

# Unit: microseconds
#             expr     min       lq   median       uq     max neval
#doupleApply(A, B) 169.699 179.2165 184.4785 194.9290 280.011   100
#    funRcpp(A, B)   1.738   2.3560   4.6885   4.9055  11.293   100

set.seed(42)
A <- matrix(rnorm(3*1e3), ncol=3)
B <- matrix(rnorm(3*1e3), nrow=3)

all.equal(doupleApply(A,B), funRcpp(A,B))
#[1] TRUE
microbenchmark(doupleApply(A,B), funRcpp(A,B), times=5)

# Unit: milliseconds
#              expr        min         lq     median         uq        max neval
# doupleApply(A, B) 4483.46298 4585.18196 4587.71539 4672.01518 4712.92597     5
#     funRcpp(A, B)   24.05247   24.08028   24.48494   26.32971   28.38075     5

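Votes: 1

Stack Overflow user

Answered on 2013-11-17 22:02:13

First, I should point out that the R code may mislead some Matlab users, because A*B in R is equivalent to A.*B in Matlab (element-wise multiplication). I used symbolic variables in my computations to make the operations clearer.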

syms a11 a12 a21 a22 b11 b12 b21 b22
syms a13 a31 a23 a32 a33
syms b13 b31 b23 b32 b33

Consider first the simplest case, where we have only one vector A and one vector B:

A1 = [a11;a21] ;
B1 = [b11;b21] ;

The result you want is

1 - prod(1-A1.*B1)
=
1 - (a11*b11 - 1)*(a21*b21 - 1)

Now suppose we have 3 vectors A and 2 vectors B, stacked side by side as columns:

A3 = [a11 a12 a13;a21 a22 a23; a31 a32 a33];
B2 = [b11 b12 ;b21 b22 ; b31 b32];

To get the indices of every combination of a column of A3 paired with a column of B2, you can do the following:

[indA indB] = meshgrid(1:3,1:2);

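Now, since the element-wise product of two vectors a and b satisfies a.*b = b.*a, we can keep only the unique index pairs. You can do so as follows. Note that because the columns of A3 and B2 come from different matrices, this filter also discards the pair (column 1 of A3, column 2 of B2), so only 5 of the 6 pairs are computed; skip this step if every pair is needed.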

indA = triu(indA); indB = triu(indB);
indA = reshape(indA(indA>0),[],1); indB = reshape(indB(indB>0),[],1);

Now the result you want can be computed:

result = 1 - prod(1-A3(:,indA).*B2(:,indB))

Just for better readability:

pretty(result.')

=

  +-                                               -+ 
  |  (a11 b11 - 1) (a21 b21 - 1) (a31 b31 - 1) + 1  | 
  |                                                 | 
  |  (a12 b11 - 1) (a22 b21 - 1) (a32 b31 - 1) + 1  | 
  |                                                 | 
  |  (a12 b12 - 1) (a22 b22 - 1) (a32 b32 - 1) + 1  | 
  |                                                 | 
  |  (a13 b11 - 1) (a23 b21 - 1) (a33 b31 - 1) + 1  | 
  |                                                 | 
  |  (a13 b12 - 1) (a23 b22 - 1) (a33 b32 - 1) + 1  | 
  +-                                               -+
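To see which column pairs the index construction covers, the following Python sketch (illustrative only, using 1-based indices to match Matlab) lists every (A column, B column) pair and the subset that survives a `triu`-style filter on the 2-by-3 index grids. The full grid has 3 × 2 = 6 pairs, while the filtered set has 5, which matches the five rows printed above; the pair (A column 1, B column 2) is the one left out.

```python
from itertools import product

n_a_cols, n_b_cols = 3, 2

# Every pairing of a column of A3 with a column of B2 (1-based, as in Matlab).
all_pairs = [(ia, ib)
             for ia, ib in product(range(1, n_a_cols + 1), range(1, n_b_cols + 1))]

# meshgrid(1:3, 1:2) yields 2-by-3 grids; entry (r, c) holds indA = c and indB = r.
# triu keeps the entries with c >= r, i.e. the pairs where ia >= ib.
kept_pairs = [(ia, ib) for ia, ib in all_pairs if ia >= ib]

missing = sorted(set(all_pairs) - set(kept_pairs))
print(len(all_pairs), len(kept_pairs), missing)  # 6 5 [(1, 2)]
```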

Votes: 1

Stack Overflow user

Answered on 2013-11-19 06:35:17

If I understand amit's question correctly, here is what can be done in Matlab:

Data:

M = 4e3;    % M different cases
N = 5e2;    % N sources
K = 5e1;    % K targets
A = rand(M, N);    % M-by-N matrix of random numbers
A = A ./ repmat(sum(A, 2), 1, N);    % M-by-N matrix of probabilities (?)
B = rand(N, K);    % N-by-K matrix of random numbers
B = B ./ repmat(sum(B), N, 1);    % N-by-K matrix of probabilities (?)

First solution

% One-liner solution:
tic
C = squeeze(1 - prod(1 - repmat(A, [1 1 K]) .* permute(repmat(B, [1 1 M]), [3 1 2]), 2));
toc
% Elapsed time is 6.695364 seconds.

Second solution

% Partial vectorization 1
tic
D = zeros(M, K);
for hh = 1:M
  tmp = repmat(A(hh, :)', 1, K);
  D(hh, :) = 1 - prod((1 - tmp .* B), 1);
end
toc
% Elapsed time is 0.686487 seconds.

Third solution

% Partial vectorization 2
tic
E = zeros(M, K);
for hh = 1:M
  for ii = 1:K
    E(hh, ii) = 1 - prod(1 - A(hh, :)' .* B(:, ii));
  end
end
toc
% Elapsed time is 2.003891 seconds.

Fourth solution

% No vectorization at all
tic
F = ones(M, K);
for hh = 1:M
  for ii = 1:K
    for jj = 1:N
      F(hh, ii) = F(hh, ii) * (1 - A(hh, jj) * B(jj, ii));
    end
    F(hh, ii) = 1 - F(hh, ii);
  end
end
toc
% Elapsed time is 19.201042 seconds.

These solutions are equivalent…

chck1 = C - D;
chck2 = C - E;
chck3 = C - F;
figure
plot(sort(chck1(:)))
figure
plot(sort(chck2(:)))
figure
plot(sort(chck3(:)))

…but clearly, the partially vectorized approaches are more efficient in terms of memory and execution time.
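The plots above check equivalence visually; a numeric alternative is to look at the largest absolute difference between two results. A minimal Python sketch, using tiny made-up sizes and hypothetical function names, compares a fully looped version against a per-row version in the spirit of the second solution. It only illustrates the check and says nothing about Matlab timings.

```python
import random
from math import prod

def full_loops(A, B):
    """Scalar triple loop, as in the fourth solution."""
    M, N, K = len(A), len(B), len(B[0])
    F = [[1.0] * K for _ in range(M)]
    for h in range(M):
        for i in range(K):
            for j in range(N):
                F[h][i] *= 1 - A[h][j] * B[j][i]
            F[h][i] = 1 - F[h][i]
    return F

def per_row(A, B):
    """One pass per row of A, handling all K targets for that row."""
    cols = list(zip(*B))  # columns of B
    return [[1 - prod(1 - a * b for a, b in zip(row, col)) for col in cols]
            for row in A]

random.seed(0)
M, N, K = 6, 5, 4  # tiny sizes, for illustration only
A = [[random.random() for _ in range(N)] for _ in range(M)]
B = [[random.random() for _ in range(K)] for _ in range(N)]

F, D = full_loops(A, B), per_row(A, B)
max_diff = max(abs(f - d) for rf, rd in zip(F, D) for f, d in zip(rf, rd))
print(max_diff < 1e-12)  # True
```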

Votes: 0

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/20029235
