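I am trying to verify the identity $\mathrm{MSE} = \mathrm{Var} + \mathrm{Bias}^2$, but I am clearly doing something wrong, because the two sides are not equal in my calculation.

Here is an example. I use Monte Carlo to estimate this integral:

$$I = \int_0^1 5x^4~\mathrm dx$$

The value of this integral is 1. Assuming the samples $x_i$ are drawn from a uniform probability distribution on $[0,1]$, my estimator is:

$$\langle I\rangle = \frac1N\sum_{i=1}^N 5x_i^4$$

The variance of the estimator can be computed analytically:

$$\mathrm{Var}(\langle I\rangle) = \frac1N\int_0^1 (5x^4-1)^2~\mathrm dx = \frac{16}{9N}$$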
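As a quick check of the constant $16/9$, the square can be expanded and integrated term by term. This is only a worked version of the integral above and introduces no new symbols:

```latex
\int_0^1 (5x^4-1)^2~\mathrm dx
  = \int_0^1 \left(25x^8 - 10x^4 + 1\right)\mathrm dx
  = \frac{25}{9} - 2 + 1
  = \frac{16}{9}
```

For $N = 100$ samples this gives a variance of about $0.0178$ for the estimator.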
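This is the variance of the estimator, right? At each iteration, with $N$ samples so far, I compute it as:

$$\mathrm{Var}(\langle I\rangle) \approx \frac1N\left[\frac1N\sum_{i=1}^N \left(5x_i^4\right)^2 - \langle I\rangle^2\right]$$

In code: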
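
float Ie = sum / (i + 1);      // estimator: running mean of the samples
float avg2 = sum2 / (i + 1);   // running mean of the squared samples
float var = avg2 - (Ie * Ie);  // sample variance of a single draw
var /= i + 1;                  // variance of the mean of i + 1 draws

The squared bias is simple:

float bias2 = (trueValue - Ie);
bias2 *= bias2;

Here is the full code:
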
#include <cstdio>
#include <cstdlib>

const int nsamp = 100;

// rand1() is not shown in the post; assumed here to return a uniform float in [0, 1].
float rand1() { return (float)rand() / RAND_MAX; }

int main()
{
    float trueValue = 1;
    float data[nsamp];
    float sum = 0;
    float sum2 = 0;
    float mse = 0;
    float sqErr = 0;
    for (int i = 0; i < nsamp; i++)
    {
        float x = rand1();
        data[i] = 5 * x*x*x*x;
        sum += data[i];
        sum2 += data[i] * data[i];
        float Ie = sum / (i + 1); // estimator
        float avg2 = sum2 / (i + 1);
        float var = avg2 - (Ie * Ie);
        var /= i + 1;
        float bias2 = (trueValue - Ie);
        bias2 *= bias2;
        sqErr += (trueValue - Ie) * (trueValue - Ie);
        mse = sqErr / (i + 1);
        printf("\nI=%f Var=%f Bias2=%f MSE=%f", Ie, var, bias2, mse);
    }
}

And the output, in which mse is not equal to var + bias2:

I=0.000000 Var=0.000000 Bias2=1.000000 MSE=1.000000
I=0.252220 Var=0.031807 Bias2=0.559176 MSE=0.779588
I=0.170473 Var=0.018592 Bias2=0.688114 MSE=0.749097
I=0.662600 Var=0.192099 Bias2=0.113839 MSE=0.590282
I=0.647206 Var=0.123133 Bias2=0.124464 MSE=0.497118
I=0.583528 Var=0.088888 Bias2=0.173449 MSE=0.443174
I=0.510921 Var=0.069824 Bias2=0.239198 MSE=0.414034
.
.
.
I=0.984586 Var=0.001662 Bias2=0.000238 MSE=0.005417
I=0.983600 Var=0.001659 Bias2=0.000269 MSE=0.005411
I=0.982616 Var=0.001657 Bias2=0.000302 MSE=0.005406
I=0.985189 Var=0.001660 Bias2=0.000219 MSE=0.005401
I=0.984248 Var=0.001658 Bias2=0.000248 MSE=0.005396
I=0.983362 Var=0.001655 Bias2=0.000277 MSE=0.005391

Posted on 2022-09-16 13:41:03

Unless I am misreading your code, you seem to be using Ie both as your estimate and as the mean of the estimates.
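
The distinction is easier to see when the decomposition is written out. The following is the standard textbook derivation, given here as a sketch; the symbols $\hat I$ (one complete estimate) and $\mathbb E$ (expectation over independent repetitions of the whole estimate) are notation introduced for this illustration and do not appear in the question:

```latex
\begin{aligned}
\mathrm{MSE}
  &= \mathbb E\big[(\hat I - I)^2\big] \\
  &= \mathbb E\big[(\hat I - \mathbb E[\hat I])^2\big]
     + 2\,(\mathbb E[\hat I] - I)\,\mathbb E\big[\hat I - \mathbb E[\hat I]\big]
     + (\mathbb E[\hat I] - I)^2 \\
  &= \mathrm{Var}(\hat I) + \mathrm{Bias}^2
\end{aligned}
```

The cross term vanishes because $\mathbb E\big[\hat I - \mathbb E[\hat I]\big] = 0$. The bias is measured from the mean of the estimates, $\mathbb E[\hat I]$, while the squared error is measured from each individual estimate $\hat I$.
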
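
Here Ie should be the mean of the estimates:

float bias2 = (trueValue - Ie);
bias2 *= bias2;

But here Ie should be the individual estimate:

sqErr += (trueValue - Ie) * (trueValue - Ie);
mse = sqErr / (i + 1);

If you change your code to something like the following, you should see the expected result (apologies for any syntax errors; this is transcribed from the Python code I originally used):
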
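#include <cstdio>
#include <cstdlib>

const int nsamp = 100;

// rand1() is not shown in the post; assumed here to return a uniform float in [0, 1].
float rand1() { return (float)rand() / RAND_MAX; }

int main()
{
    float trueValue = 1;
    float data[nsamp];
    float sum_ = 0;   // running sum of the raw samples
    float sum2 = 0;   // running sum of the squared estimates
    float mse = 0;
    float sqErr = 0;
    float ts = 0;     // running sum of the estimates
    for (int i = 0; i < nsamp; i++)
    {
        float x = rand1();
        x = 5 * x*x*x*x;
        sum_ += x;
        float Ie = sum_ / (i + 1);   // current estimate
        data[i] = Ie;
        ts += Ie;
        sum2 += Ie * Ie;
        float avg = ts / (i + 1);    // mean of the estimates so far
        float var = sum2 / (i + 1) - avg * avg;
        float bias2 = trueValue - avg;
        bias2 *= bias2;
        sqErr += (trueValue - Ie) * (trueValue - Ie);
        mse = sqErr / (i + 1);
        printf("\nI=%f Var=%f Bias2=%f MSE=%f", Ie, var, bias2, mse);
    }
}

https://datascience.stackexchange.com/questions/113893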