Python object matching algorithm

I have 1000 objects, and each object has 4 attribute lists: a list of words, images, audio files and video files.

I want to compare each object against:

1. one object, Ox, from the 1000
2. every other object

The comparison will be something like: sum(common words + common images + ...)
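A minimal sketch of this score, assuming each attribute is a collection of hashable identifiers (word strings, media file names or hashes). The Thing class, the similarity function and the example values are hypothetical; "common" is read as set intersection, so the lists do not need to be aligned or sorted in the same order.

```python
from dataclasses import dataclass, field


@dataclass
class Thing:
    # Hypothetical container: each attribute holds hashable identifiers,
    # e.g. word strings or media file names/hashes.
    words: set = field(default_factory=set)
    images: set = field(default_factory=set)
    audios: set = field(default_factory=set)
    videos: set = field(default_factory=set)

    def __post_init__(self):
        # Accept lists too; duplicates within one list are counted once.
        self.words, self.images = set(self.words), set(self.images)
        self.audios, self.videos = set(self.audios), set(self.videos)


def similarity(a: Thing, b: Thing) -> int:
    """Items in common, summed over the four attribute lists."""
    return (len(a.words & b.words) + len(a.images & b.images)
            + len(a.audios & b.audios) + len(a.videos & b.videos))


ox = Thing(words=["grass", "horn", "farm"], images=["ox1.jpg"])
cow = Thing(words=["grass", "farm", "milk"], images=["ox1.jpg", "cow.jpg"])
print(similarity(ox, cow))  # 2 shared words + 1 shared image = 3
```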

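I want an algorithm that will help me find the 5 closest objects to, say, Ox, and a (different?) algorithm to find the 5 closest pairs of objects.

I have looked into cluster analysis and maximal matching, but they don't seem to fit this case exactly. I don't want to use them if something more appropriate exists. So does this look like a particular type of algorithm to anyone, or can someone point me in the right direction for applying the algorithms I mentioned?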
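Both requests can be handled as brute-force top-k searches at this size: the 5 closest objects to Ox is a k-nearest-neighbour query (999 comparisons), and the 5 closest pairs is a top-k search over N(N - 1)/2 = 499,500 pairs, which is still small enough to score exhaustively. A minimal sketch, assuming each object is a dict of four sets scored by shared items; objects, closest_to and closest_pairs are hypothetical names and the data is random.

```python
import heapq
import random
from itertools import combinations

ATTRS = ("words", "images", "audios", "videos")


def similarity(a, b):
    # Shared items summed over the four attribute sets.
    return sum(len(a[k] & b[k]) for k in ATTRS)


def closest_to(objects, target, k=5):
    """k-nearest-neighbour query: the k objects most similar to objects[target]."""
    others = (i for i in range(len(objects)) if i != target)
    return heapq.nlargest(k, others,
                          key=lambda i: similarity(objects[target], objects[i]))


def closest_pairs(objects, k=5):
    """The k most similar unordered pairs (i, j), i < j, by brute force."""
    pairs = combinations(range(len(objects)), 2)  # N*(N-1)/2 pairs
    return heapq.nlargest(k, pairs,
                          key=lambda p: similarity(objects[p[0]], objects[p[1]]))


# Hypothetical random test data: every attribute is a set of 5 ints from 0-19.
random.seed(0)
objects = [{a: set(random.sample(range(20), 5)) for a in ATTRS}
           for _ in range(1000)]
ox = len(objects) - 1

best = closest_to(objects, ox)
pairs = closest_pairs(objects)
print("Closest to Ox:", best)
print("Closest pairs:", pairs)
```

heapq.nlargest keeps only k candidates while scanning, so neither function needs to sort all scores.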

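I made an example program that solves your first question. However, you will have to implement how images, audio and videos are compared yourself. I also assume that every object has lists of the same length. The answer to your second question would be similar, but with a double loop.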

import numpy as np

class Thing:

    def __init__(self, words, images, audios, videos):
        self.words  = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        # And so on for audio and video. You have to make sure you know
        # what method to use for determining when two images/audios/videos
        # are equal.
        return score


N = 1000
things = []
words  = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))

# I will assume that object number 999 (i=999) is the Ox:
ox = 999
scores = np.zeros(N - 1)
for i in range(N - 1):
    scores[i] = things[ox].compare(things[i])

best = np.argmax(scores)
print("The most similar thing is thing number %d." % best)
print()
print("Ox attributes:")
print(things[ox].words)
print(things[ox].images)
print(things[ox].audios)
print(things[ox].videos)
print()
print("Best match attributes:")
print(things[best].words)
print(things[best].images)
print(things[best].audios)
print(things[best].videos)
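The program above reports only the single best match, while the question asks for the 5 closest objects to Ox. A hedged sketch of that step, keeping the answer's position-wise equality assumption but scoring all four lists and replacing the Python loop with NumPy; rng, attrs and top5 are hypothetical names and the data is random.

```python
import numpy as np

N = 1000
rng = np.random.default_rng(0)
# Hypothetical test data, as in the program above: words, images, audios,
# videos, each an (N, 5) array of random integers in 0-4.
attrs = [rng.integers(5, size=(N, 5)) for _ in range(4)]

ox = N - 1  # treat the last object as the Ox
# Position-wise matches between Ox and every object, summed over the 4 lists.
scores = sum((a == a[ox]).sum(axis=1) for a in attrs)
scores[ox] = -1  # never report Ox as its own match

top5 = np.argsort(scores)[::-1][:5]  # indices of the 5 highest scores, best first
for rank, i in enumerate(top5, start=1):
    print("%d. thing %d (score %d)" % (rank, i, scores[i]))
```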

Edit:

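Now here is the same program, slightly modified to answer your second question. It turned out to be simple. I basically only had to add 4 lines:

1. Change scores to an (N, N) array instead of just (N).
2. Add the line "for j in range(N):", creating a double loop.
3. if i == j:
4. break

Items 3 and 4 are only there to make sure that I compare each pair of things once rather than twice, and that I never compare a thing with itself.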
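The effect of items 3 and 4 can be checked on a small case: with the break at i == j, the inner loop only ever reaches j < i. A minimal check (N = 6 is an arbitrary example size) comparing the visited pairs with itertools.combinations:

```python
from itertools import combinations

N = 6  # small example size
visited = []
for i in range(N):
    for j in range(N):
        if i == j:
            break  # stop at the diagonal, so only j < i is ever reached
        visited.append((j, i))  # store as (smaller, larger)

# Each unordered pair appears exactly once and no object is paired with itself.
assert sorted(visited) == list(combinations(range(N), 2))
print(len(visited))  # N*(N-1)/2 = 15
```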

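Then a few more lines are needed to extract the indices of the 5 largest values in scores. I also reworked the printing so that it is easy to confirm by eye that the printed pairs really are very similar.

Here is the new code: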
import numpy as np

class Thing:

    def __init__(self, words, images, audios, videos):
        self.words  = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        for i in range(len(self.audios)):
            if self.audios[i] == other.audios[i]:
                score += 1
        for i in range(len(self.videos)):
            if self.videos[i] == other.videos[i]:
                score += 1
        # You have to make sure you know what method to use for determining
        # when two images/audios/videos are equal.
        return score


N = 1000
things = []
words  = np.random.randint(5, size=(N,5))
images = np.random.randint(5, size=(N,5))
audios = np.random.randint(5, size=(N,5))
videos = np.random.randint(5, size=(N,5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))


################################################################################
############################# This is the new part: ############################
################################################################################
scores = np.zeros((N, N))
# Scores will become a triangular matrix where scores[i, j]=value means that
# value is the number of attributes thing[i] and thing[j] have in common.
for i in range(N):
    for j in range(N):
        if i == j:
            break
            # Break the loop here because:
            # * When i==j we would compare thing[i] with itself, and we don't
            #   want that.
            # * For every combination where j>i we would repeat all the
            #   comparisons for j<i and create duplicates. We don't want that.
        scores[i, j] = (things[i].compare(things[j]))

# I want the 5 most similar pairs:
n = 5
# This list will contain a tuple for each of the n most similar pairs:
best_list = []
for k in range(n):
    ij = np.argmax(scores)  # Returns a single flat index: ij = i*N + j
    i = ij // N
    j = ij % N
    best_list.append((i, j))
    # Erase this score so that on the next iteration the next largest score
    # is found:
    scores[i, j] = 0

for k, (i, j) in enumerate(best_list):
    # The number 1 most similar pair is the BEST match of all.
    # The number n most similar pair is the WORST of the n matches.
    print("The number %d most similar pair is thing number %d and %d."
          % (k+1, i, j))
    print("Thing%4d:" % i,
          things[i].words, things[i].images, things[i].audios, things[i].videos)
    print("Thing%4d:" % j,
          things[j].words, things[j].images, things[j].audios, things[j].videos)
    print()
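The double loop calls compare N(N - 1)/2 = 499,500 times in pure Python. As an alternative (not the method used above), the same position-wise scores can be computed with NumPy broadcasting, and np.triu_indices with k=1 plays the role of the i == j break. A hedged sketch with random data; attrs, pair_scores and best_pairs are hypothetical names. The broadcast builds an (N, N, 5) boolean array per list (about 5 MB for N = 1000), so memory grows with N squared.

```python
import numpy as np

N = 1000
rng = np.random.default_rng(0)
# Hypothetical test data: words, images, audios, videos as (N, 5) int arrays.
attrs = [rng.integers(5, size=(N, 5)) for _ in range(4)]

# scores[i, j] = number of positions where thing i and thing j agree, summed
# over the four lists. Broadcasting builds an (N, N, 5) boolean array per list.
scores = sum((a[:, None, :] == a[None, :, :]).sum(axis=2) for a in attrs)

# Upper triangle without the diagonal: every unordered pair once, no self-pairs.
i_idx, j_idx = np.triu_indices(N, k=1)
pair_scores = scores[i_idx, j_idx]

n = 5
order = np.argsort(pair_scores)[::-1][:n]  # positions of the n best pairs
best_pairs = [(int(i), int(j)) for i, j in zip(i_idx[order], j_idx[order])]
for rank, (i, j) in enumerate(best_pairs, start=1):
    print("%d. things %d and %d (score %d)" % (rank, i, j, scores[i, j]))
```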

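If your comparison works as "create a sum of all features and find the objects whose sum is closest", there is a simple trick to get the close objects:

1. Put all objects into an array.
2. Calculate all the sums.
3. Sort the array by sum.

If you take any index, the objects close to it now also have a close index. So to find the 5 closest objects, you only need to look at index-5 to index+5 in the sorted array.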
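A minimal sketch of this trick, assuming each object has already been reduced to a single number (how words and media files become that number is left open here). closest_by_sum and the random sums data are hypothetical. The k nearest sums always lie within k places on either side of the target in sorted order, so only that window needs checking; if many queries are made, sort once and reuse order.

```python
import random


def closest_by_sum(sums, target, k=5):
    """Indices of the k objects whose sum is nearest to sums[target]."""
    order = sorted(range(len(sums)), key=lambda i: sums[i])  # indices by sum
    pos = order.index(target)
    # The k nearest sums all lie within k places on either side of pos.
    window = order[max(0, pos - k):pos] + order[pos + 1:pos + 1 + k]
    return sorted(window, key=lambda i: abs(sums[i] - sums[target]))[:k]


# Hypothetical data: one precomputed sum per object.
random.seed(1)
sums = [random.randint(0, 10_000) for _ in range(1000)]
ox = 999
nearest = closest_by_sum(sums, ox)
print(nearest)
```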
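When are two images the same? When they have similar robust hashes, as measured by Hamming distance.

If this answer is what you had in mind, I can modify it to find the 5 closest pairs of objects.

@schoon No problem. Is this enough for you, or should I extend it to fully answer the second question?

@schoon I have now edited the answer and added the second part. A month after writing this, I found that I needed this algorithm in my own work. Doubly worth it!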
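The robust-hash remark can be made concrete with a small sketch, assuming each image has already been reduced to an integer hash by some robust or perceptual hashing method (not shown and not specified in the thread). hamming and same_image are hypothetical helpers, and the max_bits threshold is an assumption to tune; a test like same_image could replace the plain == used on images in compare.

```python
def hamming(h1: int, h2: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(h1 ^ h2).count("1")


def same_image(h1: int, h2: int, max_bits: int = 5) -> bool:
    # Hypothetical rule: two images count as "the same" when their robust
    # hashes differ in at most max_bits bits (threshold is an assumption).
    return hamming(h1, h2) <= max_bits


print(hamming(0b1011, 0b0011))     # 1
print(same_image(0xFFFF, 0xFFF0))  # 4 differing bits -> True
print(same_image(0xFFFF, 0x0000))  # 16 differing bits -> False
```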