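I'm trying to find the clubs with the highest average rating, taking into account both the number of votes and the raw rating.
What I've done is:
club weighted median, calculated by subtracting one standard deviation to weed out clubs with few votes.
The problem is that I can't work out why my data isn't coming out correctly. I think my math is wrong. Where I expect a value from 0 to 5, I get numbers in the tens, and they are negative (as ratings).
I'm not sure where my logic fails.
Here is my rating code logic: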
$sql="SELECT SUM(rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";
$rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));
// get the total summation of ratings against all reviews
$ratingssum=(int)$rs['sumrating'];
// get the total number of ratings against all reviews
$ratingscount=(int)$rs['countrating'];
// get the population mean
$mean = $ratingssum / $ratingscount;
// determine the variance of the population
$variance = 0;
$sql="SELECT rating AS score FROM reviews";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
    $score = (int)$row['score'];
    $variance += pow(($score-$mean),2);
}
$variance = $variance/$ratingscount;
// loop through all clubs and implement new rating
$scores=array();
$sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
$rs=mysqli_query($scx_dbh,$sql);
// begin loop
while($row=mysqli_fetch_array($rs)){
    // get the number of review votes for this club
    $numvotes=(int)$row['locationrecordcount'];
    // get the location id
    $locId = (int)$row['locid'];
    // find the standard deviation for this club (total variance * numclubvotes)
    $standarddev=sqrt($variance*$numvotes);
    // create the new rating for this club with 1 standard deviation less
    $oldRating=$row['locationmedian'];
    $newRating=$oldRating-$standarddev;
    $scores[$locId] = array(
        'numvotes'=>$numvotes,
        'standard-deviation'=>$standarddev,
        'original-rating'=> $oldRating,
        'weighted-rating'=>$newRating
    );
}
usort($scores,function($a,$b){
    return $a['weighted-rating']-$b['weighted-rating'];
});
Here are my results:
Top 10
[0] => Array
(
[numvotes] => 1121
[standard-deviation] => 68.898321138853
[original-rating] => 4.415700267618207
[weighted-rating] => -64.482620871235
)
[1] => Array
(
[numvotes] => 909
[standard-deviation] => 62.042283630954
[original-rating] => 3.1290979097910174
[weighted-rating] => -58.913185721163
)
[2] => Array
(
[numvotes] => 594
[standard-deviation] => 50.153247058093
[original-rating] => 4.414225589225589
[weighted-rating] => -45.739021468868
)
[3] => Array
(
[numvotes] => 505
[standard-deviation] => 46.243587892712
[original-rating] => 4.090099009900985
[weighted-rating] => -42.153488882811
)
[4] => Array
(
[numvotes] => 517
[standard-deviation] => 46.78979093937
[original-rating] => 4.661025145067699
[weighted-rating] => -42.128765794302
)
[5] => Array
(
[numvotes] => 505
[standard-deviation] => 46.243587892712
[original-rating] => 3.2117821782178173
[weighted-rating] => -43.031805714494
)
[6] => Array
(
[numvotes] => 398
[standard-deviation] => 41.053233483774
[original-rating] => 4.231155778894469
[weighted-rating] => -36.822077704879
)
[7] => Array
(
[numvotes] => 340
[standard-deviation] => 37.944190471069
[original-rating] => 3.9102941176470547
[weighted-rating] => -34.033896353422
)
[8] => Array
(
[numvotes] => 323
[standard-deviation] => 36.983422110177
[original-rating] => 3.261145510835913
[weighted-rating] => -33.722276599341
)
[9] => Array
(
[numvotes] => 280
[standard-deviation] => 34.433791770728
[original-rating] => 3.36767857142857
[weighted-rating] => -31.066113199299
)
[10] => Array
(
[numvotes] => 254
[standard-deviation] => 32.796136967109
[original-rating] => 3.1411417322834665
[weighted-rating] => -29.654995234825
)
Bottom 10
[232] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 4.95
[weighted-rating] => 2.0398134378534
)
[233] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 5
[weighted-rating] => 2.0898134378534
)
[234] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4
[weighted-rating] => 1.9421873473882
)
[235] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 4.8
[weighted-rating] => 1.8898134378534
)
[236] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 3.25
[weighted-rating] => 1.1921873473882
)
[237] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[238] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[239] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4.1
[weighted-rating] => 2.0421873473882
)
[240] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[241] => Array
(
[numvotes] => 2
[standard-deviation] => 2.9101865621466
[original-rating] => 5
[weighted-rating] => 2.0898134378534
))
Update
OK, I recalculated the standard deviation over the whole population. It is 2.0578126526118.
Here is my current code:
$sql="SELECT SUM(reviews.rating) AS sumrating,COUNT(reviews.rating) AS countrating FROM reviews";
$rs=mysqli_fetch_array(mysqli_query($scx_dbh,$sql));
$ratingssum=(int)$rs['sumrating'];
$ratingscount=(int)$rs['countrating'];
$mean = $ratingssum / $ratingscount;
$variance = 0;
$sql="SELECT rating AS score FROM reviews";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
    $score = (int)$row['score'];
    $variance += pow(($score-$mean),2);
}
$variance = $variance/$ratingscount;
$standarddev=sqrt($variance);
$scores=array();
$sql="SELECT locid,COUNT(reviewid) AS locationrecordcount,AVG(rating) AS locationmedian FROM reviews GROUP BY locid";
$rs=mysqli_query($scx_dbh,$sql);
while($row=mysqli_fetch_array($rs)){
    $numvotes=(int)$row['locationrecordcount'];
    $locId = (int)$row['locid'];
    $oldRating=$row['locationmedian'];
    $newRating=$oldRating-$standarddev;
    $scores[$locId] = array(
        'numvotes'=>$numvotes,
        'standard-deviation'=>$standarddev,
        'original-rating'=> $oldRating,
        'weighted-rating'=>$newRating
    );
}
usort($scores,function($a,$b){
    return (int)($a['weighted-rating']-$b['weighted-rating']);
});
1. I think my sort function is incorrect. After sorting with it, here are the top 5:
[0] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.2
[weighted-rating] => -1.8578126526118
)
[1] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.05
[weighted-rating] => -2.0078126526118
)
[2] => Array
(
[numvotes] => 4
[standard-deviation] => 2.0578126526118
[original-rating] => 0.7625
[weighted-rating] => -1.2953126526118
)
[3] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.1
[weighted-rating] => -1.9578126526118
)
[4] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 0.4
[weighted-rating] => -1.6578126526118
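)
As you can see, besides the negative numbers, position 1 (index 0) has a weighted-average of -1.85 and position 2 (index 1) has -2.00. I think something is wrong with either my algorithm or the sort function in my code, or else why are there negative numbers being sorted as first?
Also, clubs with 1 vote end up in first place. The purpose of this algorithm is to weed out those clubs so that I can focus on the clubs with 1000 votes.
Here are the bottom 5: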
[237] => Array
(
[numvotes] => 29
[standard-deviation] => 2.0578126526118
[original-rating] => 4.112068965517241
[weighted-rating] => 2.0542563129054
)
[238] => Array
(
[numvotes] => 5
[standard-deviation] => 2.0578126526118
[original-rating] => 3.8800000000000003
[weighted-rating] => 1.8221873473882
)
[239] => Array
(
[numvotes] => 31
[standard-deviation] => 2.0578126526118
[original-rating] => 3.7499999999999996
[weighted-rating] => 1.6921873473882
)
[240] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 5
[weighted-rating] => 2.9421873473882
)
[241] => Array
(
[numvotes] => 1
[standard-deviation] => 2.0578126526118
[original-rating] => 4.45
[weighted-rating] => 2.3921873473882
)
The same behavior shows up in the bottom 5. Position 5 (index 241) has a weighted-average of 2.39, and position 4 (index 240) has a weighted-average of 2.94.
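Answered 2015-11-09 16:56:59
The standard deviation is the square root of the variance, not the square root of the variance multiplied by the population size (the vote count):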
// find the standard deviation for this club (total variance)
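$standarddev=sqrt($variance);
If you want to measure each club on its own, you need to calculate the variance (and standard deviation) per club. For that you only need each club's own votes rather than all votes, and you compute the variance and standard deviation from those. The variance and standard deviation over all votes then seem unnecessary.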
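The per-club calculation described above can be sketched roughly as follows. This is only an illustration: the helper name populationStdDev, the $rows sample data, and the shape of the $stats array are assumptions, not part of the original code. The rows stand in for what a query such as "SELECT locid, rating FROM reviews" might return.

```php
<?php
// Hypothetical helper: population standard deviation of a list of ratings.
function populationStdDev(array $ratings): float
{
    $n = count($ratings);
    if ($n === 0) {
        return 0.0; // no votes, no spread
    }
    $mean = array_sum($ratings) / $n;
    $sumSquares = 0.0;
    foreach ($ratings as $rating) {
        $sumSquares += ($rating - $mean) ** 2;
    }
    // Divide by the vote count, then take the square root.
    return sqrt($sumSquares / $n);
}

// Made-up sample rows (assumption), one row per review.
$rows = [
    ['locid' => 1, 'rating' => 5.0],
    ['locid' => 1, 'rating' => 1.0],
    ['locid' => 2, 'rating' => 4.0],
    ['locid' => 2, 'rating' => 4.0],
    ['locid' => 3, 'rating' => 5.0],
];

// Group the ratings by club first.
$ratingsByClub = [];
foreach ($rows as $row) {
    $ratingsByClub[(int)$row['locid']][] = (float)$row['rating'];
}

// Then compute the statistics per club, using only that club's votes.
$stats = [];
foreach ($ratingsByClub as $locId => $ratings) {
    $stats[$locId] = [
        'numvotes' => count($ratings),
        'mean' => array_sum($ratings) / count($ratings),
        'standard-deviation' => populationStdDev($ratings),
    ];
}
print_r($stats);
```

Note that the ratings are kept as floats here; casting them to int, as the question's code does with (int)$row['score'], would truncate values such as 4.45.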
Update:
What you are trying to accomplish (weeding out clubs with few votes) cannot be done with the standard deviation (σ).
Examples:
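5/1=5, (5-5)^2 / 1=0, sqrt(0)=0
1/1=1, (1-1)^2 / 1=0, sqrt(0)=0
10/2=5, ((5-5)^2 + (5-5)^2) / 2=0, sqrt(0)=0
Now you might think you could weed out those clubs by their low σ.
6/2=3, ((1-3)^2 + (5-3)^2) / 2=4, sqrt(4)=2
As you can see, nothing here says "hey, this club got a lot of votes". The only thing σ tells you is how spread out the votes are. If there is little or no spread (variance), σ will be small or 0, and vice versa.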
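What you could try is looking at the difference between the club σ (Cσ) and the total σ (Tσ). If that value is close to 0 (say, within 0.1), you know that the club and the whole population have a similar spread. But this still does not guarantee at least x votes. The check would look something like abs(Cσ - Tσ) < 0.1.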
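A minimal sketch of that comparison, combined with an explicit vote-count check since σ alone cannot provide one. The function name isComparableClub, the 0.1 tolerance default, and the $minVotes threshold of 10 are all assumptions chosen for illustration; pick values that suit your data.

```php
<?php
// Hypothetical filter: a club passes only if its spread is close to the
// overall spread AND it has at least $minVotes votes. Both thresholds
// are illustrative assumptions, not values from the original post.
function isComparableClub(
    float $clubSigma,
    float $totalSigma,
    int $numVotes,
    float $tolerance = 0.1,
    int $minVotes = 10
): bool {
    $similarSpread = abs($clubSigma - $totalSigma) < $tolerance; // abs(Cσ - Tσ) < 0.1
    $enoughVotes = $numVotes >= $minVotes;                       // the "at least x votes" part
    return $similarSpread && $enoughVotes;
}

$totalSigma = 2.0578126526118; // overall σ reported in the question

var_dump(isComparableClub(2.0, $totalSigma, 500)); // similar spread, many votes
var_dump(isComparableClub(2.0, $totalSigma, 2));   // similar spread, too few votes
var_dump(isComparableClub(0.0, $totalSigma, 500)); // many votes, very different spread
```

Only the first call passes both checks. Keeping the vote threshold as a separate condition makes it obvious which rule excluded a club.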
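Regarding your sort function:
usort expects the callback to return an integer less than, equal to, or greater than zero. Your callback returns a float difference, which is cast to int, so any difference smaller than 1 in magnitude becomes 0 and the two elements are treated as equal. That is where the rather odd results come from. A correct sort function should look like this: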
usort($scores, function ($a, $b) {
    // equal ratings keep their relative order
    if ($a['weighted-rating'] == $b['weighted-rating']) {
        return 0;
    }
    // ascending order: the lower rating sorts first
    return ($a['weighted-rating'] < $b['weighted-rating']) ? -1 : 1;
});
Answered 2015-11-09 17:00:11
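$standarddev=sqrt($variance*$numvotes);
should be
$standarddev=sqrt($variance);
Edit
Your problem is that you cannot find the error in your logic. The reason is that you have one big, complex function. You should look into test-driven development and split your code into small units of work that are easy to test. For each unit you can then test the expected output for different input values. That makes it much easier to rule out parts of the code, for example a stdCalculator, because that part is covered by a set of test cases.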
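As a rough sketch of what such small units might look like, the statistics and the comparator can each be a tiny function checked against values that are easy to verify by hand. All function names below are hypothetical, and the checks use plain assert() calls (assuming assertions are enabled, i.e. zend.assertions=1) rather than any particular test framework.

```php
<?php
// Hypothetical small units; the names are illustrative only.
// Each function expects a non-empty array of numbers.
function mean(array $values): float
{
    return array_sum($values) / count($values);
}

function populationVariance(array $values): float
{
    $mean = mean($values);
    $sum = 0.0;
    foreach ($values as $value) {
        $sum += ($value - $mean) ** 2;
    }
    return $sum / count($values);
}

function populationStdDev(array $values): float
{
    return sqrt(populationVariance($values));
}

// Comparator for usort(): ascending by 'weighted-rating'.
// The spaceship operator (PHP 7+) always yields -1, 0 or 1.
function compareByWeightedRating(array $a, array $b): int
{
    return $a['weighted-rating'] <=> $b['weighted-rating'];
}

// Each unit is checked on its own with hand-computed values.
assert(mean([1, 5]) === 3.0);
assert(populationVariance([1, 5]) === 4.0);
assert(populationStdDev([1, 5]) === 2.0);
assert(populationStdDev([5]) === 0.0);
assert(compareByWeightedRating(
    ['weighted-rating' => -2.0],
    ['weighted-rating' => -1.85]
) === -1);
echo "all checks passed\n";
```

With the pieces separated like this, a wrong ranking can be traced to either the statistics or the comparator instead of to one long script.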
https://stackoverflow.com/questions/33613891