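I am trying to train on data points by making a line (perceptron) f and labeling the points on one side +1 and the points on the other side -1. I then create a new line g and try to get it as close to f as possible by updating w = w + y(t)x(t), where w is the weight vector, y(t) is +1 or -1, and x(t) is the coordinates of a misclassified point. After implementing this, though, I am not getting a very good fit of g to f. Here is my code and some sample output.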
import random
random.seed()
points = [[1, random.randint(-25, 25), random.randint(-25, 25), 0] for k in range(1000)]
weights = [.1, .1, .1]
misclassified = []
############################################################# Function f
interceptf = (0, random.randint(-5, 5))
slopef = (random.randint(-10, 10), random.randint(-10, 10))
point1f = ((interceptf[0] + slopef[0]), (interceptf[1] + slopef[1]))
point2f = ((interceptf[0] - slopef[0]), (interceptf[1] - slopef[1]))
############################################################# Function G starting
interceptg = (-weights[0], weights[2])
slopeg = (-weights[1], weights[2])
point1g = ((interceptg[0] + slopeg[0]), (interceptg[1] + slopeg[1]))
point2g = ((interceptg[0] - slopeg[0]), (interceptg[1] - slopeg[1]))
#############################################################
def isLeft(a, b, c):
    return ((b[0] - a[0])*(c[1] - a[1]) - (b[1] - a[1])*(c[0] - a[0])) > 0
for i in points:
    if isLeft(point1f, point2f, i):
        i[3] = 1
    else:
        i[3] = -1
for i in points:
    if (isLeft(point1g, point2g, i)) and (i[3] == -1):
        misclassified.append(i)
    if (not isLeft(point1g, point2g, i)) and (i[3] == 1):
        misclassified.append(i)
print(len(misclassified))
while misclassified:
    first = misclassified[0]
    misclassified.pop(0)
    a = [first[0], first[1], first[2]]
    b = first[3]
    a[:] = [x*b for x in a]
    weights = [(x + y) for x, y in zip(weights, a)]
    interceptg = (-weights[0], weights[2])
    slopeg = (-weights[1], weights[2])
    point1g = ((interceptg[0] + slopeg[0]), (interceptg[1] + slopeg[1]))
    point2g = ((interceptg[0] - slopeg[0]), (interceptg[1] - slopeg[1]))
check = 0
for i in points:
    if (isLeft(point1g, point2g, i)) and (i[3] == -1):
        check += 1
    if (not isLeft(point1g, point2g, i)) and (i[3] == 1):
        check += 1
print(weights)
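print(check)

117 <- number of points originally misclassified by g
-116.9,-300.9,190.1 <- final weights
617 <- number of points misclassified by g after the algorithm

956 <- number of points originally misclassified by g
-33.9,-12769.9,-572.9 <- final weights
461 <- number of points misclassified by g after the algorithm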
Posted on 2013-09-04 14:11:08
Your algorithm has at least a few problems:
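The update rule is usually written in the form (y(i) - p(i)) x(i), where p(i) is the predicted label and y(i) is the true label (although if you only update on misclassified points, this reduces to your rule: for labels in {+1, -1} a misclassified point has y(i) - p(i) = 2y(i), so the two updates differ only by a constant factor of 2).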
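As a rough sketch of the standard perceptron loop (not a drop-in fix for the code in the question): after every update all points are rechecked, because an update that fixes one point can misclassify others that were correct before, and the loop stops only when no point is misclassified. Note that this sketch classifies by the sign of w . x rather than with the `isLeft` geometry from the question; that is my substitution. The names `predict`, `train_perceptron`, `make_samples` and `max_updates`, and the target weights `[2, 3, -4]`, are all made up for illustration. It is written for Python 3.

```python
import random

def predict(w, x):
    """Return +1 or -1 from the sign of the dot product w . x."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else -1

def train_perceptron(samples, max_updates=100000):
    """samples: list of (x, y) pairs with x = [1, x1, x2] and y in {+1, -1}.

    Returns (weights, converged). max_updates is a safety cap (an assumption
    of this sketch) so the loop also ends on non-separable data.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(max_updates):
        # Recheck every point against the current weights.
        wrong = [(x, y) for x, y in samples if predict(w, x) != y]
        if not wrong:
            return w, True
        # Pick one misclassified point and apply w = w + y * x.
        x, y = random.choice(wrong)
        w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, False

def make_samples(target, n=200, seed=0):
    """Label random integer points with a hidden target line (hypothetical)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = [1, rng.randint(-25, 25), rng.randint(-25, 25)]
        s = sum(t * xi for t, xi in zip(target, x))
        if s != 0:  # skip points lying exactly on the target line
            samples.append((x, 1 if s > 0 else -1))
    return samples

samples = make_samples(target=[2, 3, -4])
w, converged = train_perceptron(samples)
errors = sum(1 for x, y in samples if predict(w, x) != y)
print(w, converged, errors)
```

Since the sample data here is linearly separable by construction, the loop should end with no misclassified points; the learned weights need not match the target weights, only separate the points the same way.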
https://stackoverflow.com/questions/18604542