Question: Algorithm for matching objects

Stack Overflow user

Asked 2014-10-10 12:55:14

Answers: 2, Views: 573, Followers: 0, Votes: 0

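I have 1,000 objects, each of which has 4 attribute lists: a list of words, a list of images, a list of audio files, and a list of video files.

I want to compare each object against:

a single object, Ox, from the 1,000.
every other object.

A comparison would be something like: sum(words in common + images in common + ...).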
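One way to read this comparison, sketched under the assumption that every attribute list can be treated as a set of hashable identifiers (words as strings, media files as paths or content hashes), is to count the shared items per attribute and add the counts. The function name `common_score`, the dict layout, and the sample file names are hypothetical and not taken from the question:

```python
def common_score(a, b, attributes=("words", "images", "audios", "videos")):
    """Sum, over all attributes, the number of items the two objects share."""
    return sum(len(set(a[name]) & set(b[name])) for name in attributes)


# Hypothetical sample objects; the file names stand in for real media.
obj_a = {"words": ["cat", "dog", "sun"], "images": ["a.png", "b.png"],
         "audios": ["x.wav"], "videos": []}
obj_b = {"words": ["dog", "sun", "moon"], "images": ["b.png"],
         "audios": ["y.wav"], "videos": []}

print(common_score(obj_a, obj_b))  # 2 shared words + 1 shared image = 3
```

Deciding when two images, audio files, or videos count as "the same" is the hard part and is left open here, exactly as in the question.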

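I want an algorithm that will help me find the closest 5, say, objects to Ox, and a (different?) algorithm to find the closest 5 pairs of objects.

I have looked into cluster analysis and maximal matching, and they don't seem to fit this scenario exactly. I would rather not use those methods if something more apt exists, so does this look like a particular type of algorithm to anyone? Or can anyone point me in the right direction for applying the algorithms I mentioned to this problem?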

python

algorithm

pattern-matching

cluster-analysis

data-mining

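2 Answers

Stack Overflow user

Accepted answer

Answered 2014-10-10 13:40:06

I made an example program that solves your first question. You still have to implement how you want to compare images, audio and videos. I am also assuming that every object has the same length for all of its lists. Answering your second question would be similar, but with a double loop.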

import numpy as np

class Thing:

    def __init__(self, words, images, audios, videos):
        self.words  = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        # And so on for audio and video. You have to make sure you know
        # what method to use for determining when an image/audio/video are
        # equal.
        return score


N = 1000
things = []
words  = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))

# I will assume that object number 999 (i=999) is the Ox:
ox = 999
# Ox is the last object, so looping over the first N - 1 objects compares it
# with every other object and never with itself.
scores = np.zeros(N - 1)
for i in range(N - 1):
    scores[i] = things[ox].compare(things[i])

best = np.argmax(scores)
print("The most similar thing is thing number %d." % best)
print()
print("Ox attributes:")
print(things[ox].words)
print(things[ox].images)
print(things[ox].audios)
print(things[ox].videos)
print()
print("Best match attributes:")
print(things[best].words)
print(things[best].images)
print(things[best].audios)
print(things[best].videos)
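The program above reports only the single best match, while the question asks for the closest five. A minimal sketch of that extension follows, assuming the same kind of stand-in data (random integers) and the same position-wise equality test as `compare`. The array name `data` and the seed are illustrative choices, not part of the answer:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only to make the sketch repeatable
N, ox, k = 1000, 999, 5
# Stand-in data: 4 attributes x 5 random integer "items" per object.
data = rng.integers(5, size=(N, 4, 5))

# Position-wise matches against Ox, summed over all attributes and positions.
scores = (data == data[ox]).sum(axis=(1, 2))
scores[ox] = -1                  # never report Ox as its own neighbour

# Indices of the k highest scores, best first.
top_k = np.argsort(scores)[::-1][:k]
for rank, idx in enumerate(top_k, start=1):
    print("%d. thing %d with score %d" % (rank, idx, scores[idx]))
```

Sorting all 1,000 scores is cheap at this size; for much larger collections `np.argpartition` would avoid the full sort.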

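EDIT:

Here is the same program, modified to answer your second question. It turned out to be simple. I basically only needed to add 4 lines:

1. Turn scores into an (N, N) array instead of (N).
2. Add for j in xrange(N): (range(N) on Python 3), thereby creating a double loop.
3. if i == j:
4. break

Items 3 and 4 are there only to make sure that each pair of things is compared once rather than twice, and that nothing is compared with itself.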
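To illustrate what items 3 and 4 achieve, the small sketch below (not part of the answer's program) collects the pairs the double loop actually visits and checks that they are exactly the unordered pairs produced by `itertools.combinations` from the standard library. `N = 4` is chosen only to keep the output short:

```python
from itertools import combinations

N = 4

# Pairs visited by the double loop with the early break (j < i only).
loop_pairs = []
for i in range(N):
    for j in range(N):
        if i == j:
            break
        loop_pairs.append((i, j))

# The same unordered pairs, generated directly.
combo_pairs = [(i, j) for j, i in combinations(range(N), 2)]

print(loop_pairs)
print(sorted(combo_pairs) == sorted(loop_pairs))
print(len(loop_pairs) == N * (N - 1) // 2)
```

Each of the N * (N - 1) / 2 pairs appears once, and no pair of the form (i, i) appears at all.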

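A few more lines of code are then needed to extract the indices of the 5 largest values in scores. I also rewrote the printing so that it is easy to confirm by eye that the printed pairs really are very similar.

Here is the new code: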

import numpy as np

class Thing:

    def __init__(self, words, images, audios, videos):
        self.words  = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        for i in range(len(self.audios)):
            if self.audios[i] == other.audios[i]:
                score += 1
        for i in range(len(self.videos)):
            if self.videos[i] == other.videos[i]:
                score += 1
        # You have to make sure you know what method to use for determining
        # when an image/audio/video are equal.
        return score


N = 1000
things = []
words  = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))


################################################################################
############################# This is the new part: ############################
################################################################################
scores = np.zeros((N, N))
# Scores will become a triangular matrix where scores[i, j]=value means that
# value is the number of attributes thing[i] and thing[j] have in common.
for i in range(N):
    for j in range(N):
        if i == j:
            break
            # Break the loop here because:
            # * When i==j we would compare thing[i] with itself, and we don't
            #   want that.
            # * For every combination where j>i we would repeat all the
            #   comparisons for j<i and create duplicates. We don't want that.
        scores[i, j] = (things[i].compare(things[j]))

# I want the 5 most similar pairs:
n = 5
# This list will contain a tuple for each of the n most similar pairs:
best_list = []
for k in range(n):
    ij = np.argmax(scores) # Returns a single flat index: ij = i*N + j
    i = ij // N
    j = ij % N
    best_list.append((i, j))
    # Erase this score so that on next iteration the second largest score
    # is found:
    scores[i, j] = 0

for k, (i, j) in enumerate(best_list):
    # The number 1 most similar pair is the BEST match of all.
    # The number n most similar pair is the weakest of the n pairs listed.
    print("The number %d most similar pair is thing number %d and %d."
          % (k+1, i, j))
    print("Thing%4d:" % i,
          things[i].words, things[i].images, things[i].audios, things[i].videos)
    print("Thing%4d:" % j,
          things[j].words, things[j].images, things[j].audios, things[j].videos)
    print()
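The double loop and the repeated argmax above can also be written with NumPy broadcasting. The following is a sketch under the same assumptions as the answer (integer stand-in data, position-wise equality as the similarity test), not a replacement for a real media comparison. The array `data`, the seed, and the smaller `N` are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only for repeatability
N, n = 200, 5                    # smaller N than above to keep memory low
data = rng.integers(5, size=(N, 4, 5))

# scores[i, j] = number of positions where thing i and thing j agree.
scores = (data[:, None] == data[None, :]).sum(axis=(2, 3))

# Keep only pairs with j < i (strict lower triangle), as in the double loop.
rows, cols = np.tril_indices(N, k=-1)
pair_scores = scores[rows, cols]

# Positions of the n largest pair scores, best first.
order = np.argsort(pair_scores)[::-1][:n]
best_pairs = [(int(rows[p]), int(cols[p]), int(pair_scores[p])) for p in order]
for i, j, s in best_pairs:
    print("things %d and %d share %d attribute values" % (i, j, s))
```

The intermediate boolean array has N * N * 20 entries, so this trades memory for speed; the explicit loop remains the safer choice when the per-pair comparison is expensive or cannot be vectorized.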

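Votes: 1

Stack Overflow user

Answered 2014-10-10 16:11:15

If your comparison works along the lines of "create a sum of all features and find the objects with the closest sum", there is a simple trick for finding close objects:

Put all objects into an array.
Calculate all the sums.
Sort the array by sum.

If you pick any index, the objects close to it will now have nearby indices as well. So to find the 5 closest objects, you only need to look at index-5 to index+5 in the sorted array.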
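The three steps above can be sketched as follows, assuming each object reduces to a row of numeric feature values (an assumption; the question's attributes are lists of words and media files). Note that this finds objects whose totals are close, which is the condition the answer itself states, and is not the same as counting shared items. The names `features` and `target` are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded only for repeatability
N, k = 1000, 5
# Stand-in data: one row of 20 numeric feature values per object.
features = rng.integers(5, size=(N, 20))

sums = features.sum(axis=1)                 # one sum per object
order = np.argsort(sums, kind="stable")     # object indices sorted by sum

target = 999                                # hypothetical index of Ox
pos = int(np.nonzero(order == target)[0][0])   # where Ox landed after sorting

# Candidates: up to k neighbours on each side of Ox in the sorted order.
lo, hi = max(0, pos - k), min(N, pos + k + 1)
candidates = [int(c) for c in order[lo:hi] if c != target]

# Of those, keep the k whose sums are closest to Ox's sum.
candidates.sort(key=lambda c: abs(int(sums[c]) - int(sums[target])))
nearest = candidates[:k]
print(nearest)
```

After the one-time sort, each lookup inspects at most 2k neighbours instead of all N objects.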

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/26299978
