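I was looking into how to compute the weighted standard deviation or variance of a sample, and found this post, which references several methods proposed by Gatz and Smith: https://math.stackexchange.com/questions/823125/sampling-error-with-weighted-mean.
Now I am trying to understand how MATLAB computes the weighted variance when "std(A,w)" is used. I looked at the MATLAB source code using "edit", but I could not follow the syntax well enough to see how the weighted variance is computed. Could someone write down the equation for me, or describe the key lines of the code below in a few words? (Typing "edit" in MATLAB will show the full code of the function.)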
else
    % Weighted variance
    if ~isvector(w) || ~isreal(w) || ~isfloat(w) || ...
            (omitnan && ~all(w(~isnan(w)) >= 0)) || (~omitnan && ~all(w >= 0))
        error(message('MATLAB:var:invalidWgts'));
    end
    if numel(w) ~= n
        if isscalar(w)
            error(message('MATLAB:var:invalidWgts'));
        else
            error(message('MATLAB:var:invalidSizeWgts'));
        end
    end
    if ~omitnan
        % Normalize W, and embed it in the right number of dims. Then
        % replicate it out along the non-working dims to match X's size.
        wresize = ones(1,max(ndims(x),dim)); wresize(dim) = n;
        w = reshape(w ./ sum(w), wresize);
        y = sum(w .* abs(x - sum(w .* x, dim)).^2, dim); % abs guarantees a real result
    else
        % Repeat vector W, such that new W has the same size as X
        sz = size(x); sz(end+1:dim) = 1;
        wresize = ones(size(sz)); wresize(dim) = sz(dim);
        wtile = sz; wtile(dim) = 1;
        w = repmat(reshape(w, wresize), wtile);
        % Count up non-NaN weights at non-NaN elements
        w(isnan(x)) = NaN;
        denom = sum(w, dim, 'omitnan'); % contains no NaN, since w >= 0
        x = x - (sum(w .* x, dim, 'omitnan') ./ denom);
        wx2 = w .* abs(x).^2;
        y = sum(wx2, dim, 'omitnan') ./ denom; % abs guarantees a real result
        % Don't omit NaNs caused by computation (not missing data)
        ind = any(isnan(wx2) & ~isnan(w), dim);
        y(ind) = NaN;

Posted on 2021-12-07 20:14:17
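So, I believe the equation just mentioned is slightly incorrect.
sqrt(sum((x-mean(x)).^2.*w(:))/sum(w)) % slightly off
It needs to be scaled by the length of the data in order to match the output of MATLAB's std() and the following base STD equation:
sqrt(sum((x(:)-mean(x)).^2)/(length(x)-1)) % base STD equation
Accounting for the length, we get:
sqrt(sum((x(:)-mean(x)).^2.*w(:))/sum(w)*length(x)/(length(x)-1)) % corrected
The example below demonstrates the differences: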
% define weighted std functions
std_fcn_a = @(x,w) sqrt(sum((x(:)-mean(x)).^2.*w(:))/sum(w));
std_fcn_b = @(x,w) sqrt(sum((x(:)-mean(x)).^2.*w(:))/sum(w)*length(x)/(length(x)-1));
% create data
x = reshape(magic(3),[1,9]); % create example data
w0 = ones(size(x))/length(x); % create uniform weighting
rng(0); % seed random data
w1 = rand(size(x)); % create random weights
w1 = w1/sum(w1); % normalize weights
% test unweighted
std_0 = std(x) % 2.7386
std_0_v2 = std(x,w0) % 2.5820
std_a_0 = std_fcn_a(x,w0) % 2.5820
std_b_0 = std_fcn_b(x,w0) % 2.7386
% test weighted
std_a_1 = std_fcn_a(x,w1) % 2.6963
std_b_1 = std_fcn_b(x,w1) % 2.8599
Example output:
std_0 = 2.7386
std_0_v2 = 2.5820
std_a_0 = 2.5820
std_b_0 = 2.7386
std_a_1 = 2.6963
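std_b_1 = 2.8599
The unweighted test shows that std(x) is not the same as std(x,w) when w is uniform. I initially assumed they would be equal, since std(x) effectively applies uniform weighting. This raises my own question about the validity of std(x,w).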
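One way to make sense of that mismatch is to read the non-omitnan branch of the code quoted in the question as a formula. The following is only a sketch based on that excerpt, not something taken from the MATLAB documentation. The symbols \tilde{w}_i, \mu_w and \sigma_w are notation introduced here for illustration, and it assumes that std(x,w) is the square root of the variance computed by that code:

```latex
\tilde{w}_i = \frac{w_i}{\sum_{j=1}^{n} w_j}, \qquad
\mu_w = \sum_{i=1}^{n} \tilde{w}_i\, x_i, \qquad
\sigma_w^2 = \sum_{i=1}^{n} \tilde{w}_i\, \lvert x_i - \mu_w \rvert^2
           = \frac{\sum_{i=1}^{n} w_i\, \lvert x_i - \mu_w \rvert^2}{\sum_{i=1}^{n} w_i}
```

If that reading is right, uniform weights w_i = 1/n give the n-normalized standard deviation, with no n/(n-1) correction. For the example data the squared deviations from the mean sum to 60, so this is sqrt(60/9) ≈ 2.5820, which matches std(x,w0), while std(x) divides by n-1 and gives sqrt(60/8) ≈ 2.7386. The omitnan branch appears to be the same formula with the sums restricted to the non-NaN elements of x. Note also that the quoted code subtracts the weighted mean, whereas std_fcn_a and std_fcn_b above subtract mean(x), so for non-uniform weights they need not agree with std(x,w).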
https://stackoverflow.com/questions/57569693