Python object matching algorithm
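Tags: python, algorithm, pattern-matching, cluster-analysis, data-mining

I have 1000 objects. Each object has 4 attribute lists: a list of words, a list of images, a list of audio files and a list of video files.

I want to compare each object against:

a single object, Ox, from the 1000
every other object

A comparison would be something like: sum(common words + common images + ...).

I want an algorithm that will help me find the closest 5 objects to, say, Ox, and a (different?) algorithm to find the closest 5 pairs of objects.

I have looked into cluster analysis and maximal matching, and they don't seem to fit this scenario exactly. I don't want to use these methods if something more apt exists. Does this look like a particular type of algorithm to anyone, or can anyone point me in the right direction for applying the algorithms I mentioned?

I made an example program that solves your first question. You still have to implement how you want to compare images, audio and video, though. I also assume that every object has lists of the same length. Your second question could probably be answered in a similar way, but with a double loop.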
import numpy as np


class Thing:
    def __init__(self, words, images, audios, videos):
        self.words = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        # And so on for audio and video. You have to make sure you know
        # what method to use for determining when two images/audios/videos
        # are equal.
        return score


N = 1000
things = []
words = np.random.randint(5, size=(N, 5))
images = np.random.randint(5, size=(N, 5))
audios = np.random.randint(5, size=(N, 5))
videos = np.random.randint(5, size=(N, 5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))

# I will assume that object number 999 (i=999) is the Ox:
ox = 999
# Ox is the last object, so looping over things 0..N-2 skips Ox itself.
scores = np.zeros(N - 1)
for i in range(N - 1):
    scores[i] = things[ox].compare(things[i])
best = np.argmax(scores)

print("The most similar thing is thing number %d." % best)
print()
print("Ox attributes:")
print(things[ox].words)
print(things[ox].images)
print(things[ox].audios)
print(things[ox].videos)
print()
print("Best match attributes:")
print(things[best].words)
print(things[best].images)
print(things[best].audios)
print(things[best].videos)
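The question asks for the 5 closest objects rather than only the single best one. Here is a minimal sketch of one way to get them, assuming every attribute value is hashable (for example a file name or a content hash) so that "common" items can be counted with set intersections. The names `similarity` and `closest` and the tiny data set are made up for illustration and are not part of the program above:

```python
import heapq


def similarity(a, b):
    """Count the items two objects share, summed over all attribute lists.

    a and b are dicts mapping an attribute name to a list of hashable items.
    """
    return sum(len(set(a[key]) & set(b[key])) for key in a)


def closest(objects, target, k=5):
    """Return the indices of the k objects most similar to objects[target]."""
    candidates = (i for i in range(len(objects)) if i != target)
    return heapq.nlargest(
        k, candidates, key=lambda i: similarity(objects[target], objects[i])
    )


# Tiny made-up data set: four objects with words and images only.
objects = [
    {"words": ["a", "b", "c"], "images": ["x.png"]},
    {"words": ["a", "b"], "images": ["x.png"]},
    {"words": ["a"], "images": ["y.png"]},
    {"words": ["z"], "images": ["q.png"]},
]
print(closest(objects, target=0, k=2))
```

Unlike `compare` above, this does not require the lists to have the same length or the same order, but it still relies on exact equality of items.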
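Edit:
Here is the same program, slightly modified to answer your second question. It turned out to be quite simple; I basically only needed to add four lines:
scores is changed to an (N, N) array instead of just (N)
an inner loop over j is added, creating a double loop
if i == j:
break
The last two lines make sure that each pair is compared only once and that no object is compared with itself. A few more lines then extract the indices of the 5 largest values in scores. I also reworked the printing so that it is easy to confirm by eye that the printed pairs really are very similar.
Here is the new code: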
import numpy as np


class Thing:
    def __init__(self, words, images, audios, videos):
        self.words = words
        self.images = images
        self.audios = audios
        self.videos = videos

    def compare(self, other):
        score = 0
        # Assuming the attribute lists have the same length for both objects
        # and that they are sorted in the same manner:
        for i in range(len(self.words)):
            if self.words[i] == other.words[i]:
                score += 1
        for i in range(len(self.images)):
            if self.images[i] == other.images[i]:
                score += 1
        for i in range(len(self.audios)):
            if self.audios[i] == other.audios[i]:
                score += 1
        for i in range(len(self.videos)):
            if self.videos[i] == other.videos[i]:
                score += 1
        # You have to make sure you know what method to use for determining
        # when two images/audios/videos are equal.
        return score


N = 1000
things = []
words = np.random.randint(5, size=(N, 5))
images = np.random.randint(5, size=(N, 5))
audios = np.random.randint(5, size=(N, 5))
videos = np.random.randint(5, size=(N, 5))
# For testing purposes I assign each attribute to a list (array) containing
# five random integers. I don't know how you actually intend to do it.
for i in range(N):
    things.append(Thing(words[i], images[i], audios[i], videos[i]))

################################################################################
############################# This is the new part: ############################
################################################################################
scores = np.zeros((N, N))
# Scores will become a triangular matrix where scores[i, j]=value means that
# value is the number of attributes thing[i] and thing[j] have in common.
for i in range(N):
    for j in range(N):
        if i == j:
            break
            # Break the loop here because:
            # * When i==j we would compare thing[i] with itself, and we don't
            #   want that.
            # * For every combination where j>i we would repeat all the
            #   comparisons for j<i and create duplicates. We don't want that.
        scores[i, j] = things[i].compare(things[j])

# I want the 5 most similar pairs:
n = 5
# This list will contain a tuple for each of the n most similar pairs:
best_list = []
for k in range(n):
    ij = np.argmax(scores)  # Returns a single flat index: ij = i*N + j
    i = ij // N
    j = ij % N
    best_list.append((i, j))
    # Erase this score so that on the next iteration the second largest
    # score is found:
    scores[i, j] = 0

for k, (i, j) in enumerate(best_list):
    # Pair number 1 is the best match of all; higher numbers are weaker
    # matches.
    print("The number %d most similar pair is thing number %d and %d."
          % (k + 1, i, j))
    print("Thing%4d:" % i,
          things[i].words, things[i].images, things[i].audios, things[i].videos)
    print("Thing%4d:" % j,
          things[j].words, things[j].images, things[j].audios, things[j].videos)
    print()
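With integer test data like the arrays above, the double loop can also be replaced by NumPy broadcasting. The following is only a sketch under that assumption (attributes stored as equal-length integer arrays and compared position by position). `top_pairs` is a hypothetical name, and real images, audio or video would need their own equality test:

```python
import numpy as np


def top_pairs(features, n=5):
    """Return the n most similar pairs as (i, j, score) tuples, best first.

    features is an (N, M) integer array: one row per object, holding all of
    its attribute values side by side. Only pairs with i > j are returned.
    """
    # scores[i, j] = number of positions where row i and row j agree.
    scores = (features[:, None, :] == features[None, :, :]).sum(axis=2)
    # Keep each unordered pair once and skip self-comparisons.
    rows, cols = np.tril_indices(len(features), k=-1)
    pair_scores = scores[rows, cols]
    # A stable sort on the negated scores puts the largest scores first.
    order = np.argsort(-pair_scores, kind="stable")[:n]
    return [(int(rows[k]), int(cols[k]), int(pair_scores[k])) for k in order]


rng = np.random.default_rng(0)
N = 200  # smaller than 1000 only to keep the (N, N, M) temporary array small
features = rng.integers(5, size=(N, 20))  # 4 attributes x 5 values each
for i, j, score in top_pairs(features):
    print("Things %d and %d share %d attribute values." % (i, j, score))
```

The intermediate comparison array has N * N * M entries, so for much larger N it would be worth processing the rows in chunks.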
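If your comparison works by "create a sum of all features and find those with the closest sum", there is a simple trick to get close objects:
Put all objects into an array
Calculate all the sums
Sort the array by sum
If you take any index, the objects close to it now also have a close index. So to find the 5 closest objects, you only need to look at index+5 to index-5 in the sorted array.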
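The sort-by-sum steps above can be sketched as follows. This assumes each object has already been reduced to a single number (its feature sum), which is this answer's premise and a different measure from counting common items. `neighbours_by_sum` and the sample sums are made up for illustration:

```python
def neighbours_by_sum(sums, target, k=5):
    """Return the k indices whose sums are closest to sums[target].

    Only the k sorted-order neighbours on each side of the target have to be
    examined, because anything further away has a more distant sum.
    """
    order = sorted(range(len(sums)), key=lambda i: sums[i])
    pos = order.index(target)
    window = order[max(0, pos - k):pos] + order[pos + 1:pos + 1 + k]
    window.sort(key=lambda i: abs(sums[i] - sums[target]))
    return window[:k]


sums = [10, 52, 11, 48, 50, 49, 90, 51, 53, 47]
print(neighbours_by_sum(sums, target=4, k=3))
```

Sorting costs O(N log N) once, after which each lookup only inspects at most 2k candidates instead of all N objects.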
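When are two images the same? When they have similar robust hashes, as measured by Hamming distance.
If this answer is what you had in mind, I can modify it to find the closest 5 pairs of objects.
@schoon No problem. Is this enough for you, or should I extend it to fully answer the second question?
@schoon I have now edited the answer and added the second part.
A month after making this, I found that I needed this algorithm in my own work. Doubly worth it!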
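The comment about image equality can be made concrete with a small sketch. It assumes each image has already been reduced to a fixed-width integer by some robust (perceptual) hashing step, which is not shown here. The hash values, the names `hamming_distance` and `same_image`, and the threshold of 5 bits are all made up for illustration:

```python
def hamming_distance(hash_a, hash_b):
    """Number of bit positions in which two integer hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def same_image(hash_a, hash_b, max_bits=5):
    """Treat two images as equal when their hashes differ in few bits."""
    return hamming_distance(hash_a, hash_b) <= max_bits


original = 0b1011001011110000
resized = 0b1011001011110001    # one bit away from the original
unrelated = 0b0100110100001111  # every bit flipped

print(same_image(original, resized))
print(same_image(original, unrelated))
```

A test like `same_image` could stand in for the `==` checks in `compare`, with the threshold tuned to the hashing method actually used.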