文章/答案/技术大牛

发布

社区首页 >问答首页 >多个随机过程的基于核条件增量

问多个随机过程的基于核条件增量
EN

Code Review用户

提问于 2019-08-05 22:38:52

回答 1查看 52关注 0票数 4

我写过这个函数是一个研究项目的一部分，它涉及分析随机过程中的时间序列数据。我们有一个小数目(从1到3)的独立观测标量时间序列。这些观测有不同的长度，每一个都包含关于10^4-10^5数据点的内容。nKBR_moments.m下面的函数将观察的单元数组与其他设置一起作为输入，并输出称为“条件增量矩”的统计量。这些变量是M1和M2。关于这一理论的更多细节，本研究论文概述了类似的方法。

为了研究的目的，功能最终将被评估数万次，在一台台式计算机上。在下面提供的测试脚本中，对此函数的一个评估大约需要3秒。对优化代码性能、内存使用或可伸缩性的想法表示赞赏。

MATLAB函数：

function [Xcentre,M1,M2] = nKBR_moments(X,tau_in,Npoints,xLims,h)
%Kernel based moments, n-data
%
%   Notes:
%   Calculates kernel based moments for a given stochastic time-series.
%   Uses Epanechnikov kernel with built in computational advantages. Uses
%   Nadaraya-Watson estimator. Calculates moments from n sources of data.
%   
%
%   Inputs:
%   - "X"                       Observed variables, cell array of data
%   - "tau_in"                  Time-shift indexes
%   - "Npoints"                 Number of evaluation points
%   - "xLims"                   Limits in upper and lower evaluation points
%   - "h"                       Bandwidth
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Processing
dX = (xLims(2)-xLims(1))/(Npoints-1); % Bins increment
Xcentre = xLims(1):dX:xLims(2); % Grid
heff = h*sqrt(5); % Effective bandwidth, for setting up bins
eta = floor(heff/dX+0.5); % Bandwidth for bins optimizing

% Epanechnikov kernel
K= @(u) 0*(u.^2>1)+3/4*(1-u.^2).*(u.^2<=1);
Ks = @(u) K(u/sqrt(5))/sqrt(5); % Silverman's definition of the kernel (Silverman, 1986)
Kh = @(u) Ks(u/h)/h; % Changing bandwidth

% Sort all data into bins
Bextend = dX*(eta+0.5); % Extend bins
edges = xLims(1)-Bextend:dX:xLims(2)+Bextend; % Edges
ndata = numel(X); % Number of data-sets
Xloc = cell(1,ndata); % Preallocate histogram location data
nXdata = cellfun(@numel,X); % Number of x data
key = 1:max(nXdata); % Data key
for nd = 1:ndata
    [~,~,Xloc{nd}] = histcounts(X{nd},edges); % Sort
end
Xbinloc = eta+(1:Npoints); % Bin locations
BinBeg = Xbinloc-eta; % Bin beginnings
BinEnd = Xbinloc+eta; % Bin beginnings

% Preallocate
Ntau = numel(tau_in); % Number of time-steps
[M1,M2] = deal(zeros(Ntau,Npoints)); % Moments
[iX,iXkey,XU,Khj,yinc,Khjt] = deal(cell(1,ndata)); % Preallocate increment data

% Pre calculate increments
inc = cell(Ntau,ndata);
for nd = 1:ndata
    poss_tau_ind = 1:nXdata(nd); % Possible time-shifts
    for tt = 1:Ntau
        tau_c = tau_in(tt); % Chosen shift
        tau_ind = poss_tau_ind(1+tau_c:end); % Chosen indices
        inc{tt,nd} = X{nd}(tau_ind) - X{nd}(tau_ind - tau_c);
    end
end

% Loop over evaluation points
for ii = 1:Npoints

    % Start and end bins
    kBinBeg = BinBeg(ii);
    kBinEnd = BinEnd(ii);

    % Data and weights
    for nd = 1:ndata
        iX{nd} = and(kBinBeg<=Xloc{nd},Xloc{nd}<=kBinEnd); % Data in bins
        iXkey{nd} = key(iX{nd}); % Data key
        XU{nd} = X{nd}(iX{nd}); % Unshifted data
        Khj{nd} = Kh(Xcentre(ii)-XU{nd}); % Weights
    end

    % For each shift
    for tt = 1:Ntau
        tau_c = tau_in(tt); % Chosen shift

        % Get data
        for nd = 1:ndata            
            XUin = iXkey{nd}; % Unshifted data indices
            XUin(XUin>nXdata(nd)-tau_c) = []; % Clip overflow
            yinc{nd} = inc{tt,nd}(XUin); % Increments
            Khjt{nd} = Khj{nd}(1:numel(yinc{nd})); % Clipped weight vector
        end

        % Concatenate data
        ytt = [yinc{:}];
        Khjtt = [Khjt{:}];

        % Increments and moments
        sumKhjtt = sum(Khjtt);
        M1(tt,ii) = sum(Khjtt.*ytt)/sumKhjtt;

        y2 = (ytt - M1(tt,ii)).^2; % Squared (with correction)
        M2(tt,ii) = sum(Khjtt.*y2)/sumKhjtt;
    end
end
end

MATLAB测试脚本(不需要反馈)：

%% nKBR_testing
clearvars,close all

%% Parameters

% Simulation settings
n_sims = 10; % Number of simulations
dt = 0.001; % Time-step
tend1 = 40; % Time-end, process 1
tend2 = 36; % Time-end, process 2
x0 = 0; % Start position
eta = 0; % Mean
D = 1; % Noise amplitude
gamma = 1; % Drift slope

% Analysis settings
tau_in = 1:60; % Time-shift indexes
Npoints = 50; % Number of evaluation points
xLims = [-1,1]; % Limits of evaluation
h = 0.5; % Kernel bandwidth

%% Simulating
t1 = 0:dt:tend1;
t2 = 0:dt:tend2;

% Realize an Ornstein Uhlenbeck process
rng('default')
ex1 = exp(-gamma*t1);
ex2 = exp(-gamma*t2);
x1 = x0*ex1 + eta*(1-ex1) + sqrt(D)*ex1.*cumsum(exp(gamma*t1).*[0,sqrt(2*dt)*randn(1,numel(t1)-1)]);
x2 = x0*ex2 + eta*(1-ex2) + sqrt(D)*ex2.*cumsum(exp(gamma*t2).*[0,sqrt(2*dt)*randn(1,numel(t2)-1)]);

%% Calculating and timing moments

tic
for ns = 1:n_sims
    [~,M1,M2] = nKBR_moments({x1,x2},tau_in,Npoints,xLims,h);
end
nKBR_moments_time = toc;
nKBR_average_time = nKBR_moments_time/n_sims

%% Plotting

figure
hold on,box on
plot(t1,x1)
plot(t2,x2)
xlabel('Time')
ylabel('Amplitude')
title('Two Ornstein-Uhlenbeck processes')

figure
subplot(1,2,1)
box on
plot(dt*tau_in,M1,'k')
xlabel('Time-shift, \tau')
title('M^{(1)}')
subplot(1,2,2)
box on
plot(dt*tau_in,M2,'k')
xlabel('Time-shift, \tau')
title('M^{(2)}')

测试脚本将创建两个类似于下面的图形。

performance

statistics

matlab

回答 1

Code Review用户

回答已采纳

发布于 2019-09-26 21:41:24

代码有点混乱，因为有大量非常短的变量名，尽管这些名称可能与您链接的文件中的等式相匹配，但我还没有看过。我将使用更长、描述性更强的变量名称(这些名称还将允许您减少注释的数量，这只能在更改代码时脱离同步)。除此之外，我认为它是直截了当的，很容易跟上。

在表现方面，我没有立即看到任何巨大的收益。也许可以将某些操作向量化，但机会并不是显而易见的。要了解如何提高速度，您可以做的第一件事是分析代码。MATLAB有一个内置的分析器，参见这里。

以下是一些可以简化的事情：

    for nd = 1:ndata
        iX{nd} = and(kBinBeg<=Xloc{nd},Xloc{nd}<=kBinEnd); % Data in bins
        iXkey{nd} = key(iX{nd}); % Data key
        XU{nd} = X{nd}(iX{nd}); % Unshifted data
        Khj{nd} = Kh(Xcentre(ii)-XU{nd}); % Weights
    end

在这篇文章中，iX{nd}从未在循环之外使用。XU{nd}也不是。让我们摆脱这些(更少的索引和更少的数据存储都会带来更快的代码)：

    for nd = 1:ndata
        iX = and(kBinBeg <= Xloc{nd}, Xloc{nd} <= kBinEnd); % Data in bins
        iXkey{nd} = key(iX);          % Data key
        XU = X{nd}(iX);               % Unshifted data
        Khj{nd} = Kh(Xcentre(ii)-XU); % Weights
    end

因为key = 1:max(nXdata)，key(iX)和find(iX)是一样的，所以后者可能更快。现在你有：

    for nd = 1:ndata
        iXkey{nd} = find((kBinBeg <= Xloc{nd}) & (Xloc{nd} <= kBinEnd));
        Khj{nd} = Kh(Xcentre(ii) - X{nd}(iXkey{nd}));
    end

这句话：

XUin(XUin>nXdata(nd)-tau_c) = [];

也可以写成：

XUin = XUin(XUin <= nXdata(nd)-tau_c);

这发生在最内部的循环中，因此是最常执行的行之一。这两个选项都值得一试，看看哪一种更快。

此位非常昂贵，因为它复制数据：

        ytt = [yinc{:}];
        Khjtt = [Khjt{:}];

        % Increments and moments
        sumKhjtt = sum(Khjtt);
        M1(tt,ii) = sum(Khjtt.*ytt)/sumKhjtt;

在这里查看循环是否比级联更快也是值得的：

        sumKhjtt = 0;
        sumKhjtt_ytt = 0;
        for nd = 1:ndata
           sumKhjtt = sumKhjtt + sum(Khjt{nd});
           sumKhjtt_ytt = sumKhjtt_ytt + sum(Khjt{nd}.*yinc{nd});
        end
        M1(tt,ii) = sumKhjtt_ytt / sumKhjtt;

(M2也是如此)。可以将此循环包含到前一个单元格数组中，以避免再出现两个单元格数组：yinc和Khjt：

        sumKhjtt = 0;
        sumKhjtt_ytt = 0;
        for nd = 1:ndata
            XUin = iXkey{nd};
            XUin(XUin > nXdata(nd)-tau_c) = [];
            yinc = inc{tt,nd}(XUin);
            Khjt = Khj{nd}(1:numel(yinc{nd}));
            sumKhjtt = sumKhjtt + sum(Khjt{nd});
            sumKhjtt_ytt = sumKhjtt_ytt + sum(Khjt{nd}.*yinc{nd});
        end
        M1(tt,ii) = sumKhjtt_ytt / sumKhjtt;

您必须重写这个方程，以便M2能够将它合并到那个循环中。这相当简单，类似于计算方差的朴素算法。这个算法是不稳定的，它取决于你的数据是否好的结果。如果不是，则使用韦尔福德在线算法代替。在那里，您可以计算第一批的二阶中心矩，然后添加到后续批的累加器中。

票数 2

页面原文内容由Code Review提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://codereview.stackexchange.com/questions/225618

复制

相似问题

问多个随机过程的基于核条件增量
EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问多个随机过程的基于核条件增量EN

回答 1

Code Review用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问多个随机过程的基于核条件增量
EN