Python OpenCV and content-based image retrieval: is there a way to use an online image database without downloading the images?
I am trying to build a CBIR system. I recently wrote a Python program that uses OpenCV functions to query a local image database and return results (see below). I now need to connect it to a web-scraping module (built with Scrapy) that outputs about 1000 image links. These images are scattered across the web and should be fed into the first OpenCV module. Is it possible to run the computation on this online image set without downloading it?

These are the steps I followed for the OpenCV module:

1) Define a region-based color image descriptor
2) Extract features from the dataset (indexing); the dataset is passed as a command-line argument
3) Define the similarity metric
# import the necessary packages
import numpy as np
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
import csv

class Searcher:
    def __init__(self, indexPath):
        # store our index path
        self.indexPath = indexPath

    def search(self, queryFeatures, limit = 5):
        # initialize our dictionary of results
        results = {}
        # open the index file for reading; the with block closes it for us
        with open(self.indexPath) as f:
            # initialize the CSV reader
            reader = csv.reader(f)
            # loop over the rows in the index
            for row in reader:
                # parse out the image ID and features, then compute the
                # chi-squared distance between the features in our index
                # and our query features
                features = [float(x) for x in row[1:]]
                d = self.chi2_distance(features, queryFeatures)
                # now that we have the distance between the two feature
                # vectors, we can update the results dictionary -- the
                # key is the current image ID in the index and the
                # value is the distance we just computed, representing
                # how 'similar' the image in the index is to our query
                results[row[0]] = d
        # sort our results, so that the smaller distances (i.e. the
        # more relevant images) are at the front of the list
        results = sorted([(v, k) for (k, v) in results.items()])
        # return our (limited) results
        return results[:limit]

    def chi2_distance(self, histA, histB, eps = 1e-10):
        # compute the chi-squared distance
        d = 0.5 * np.sum([((a - b) ** 2) / (a + b + eps)
            for (a, b) in zip(histA, histB)])
        # return the chi-squared distance
        return d
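As a side note on the similarity metric: the chi-squared distance above is 0.5 * sum((a - b)^2 / (a + b + eps)) over the histogram bins. The following is only a sketch of an equivalent vectorized form using NumPy arrays; the standalone function and the names hist_a and hist_b are illustrative and not part of the Searcher class above.

```python
import numpy as np


def chi2_distance(hist_a, hist_b, eps=1e-10):
    """Chi-squared distance between two equal-length histograms.

    eps keeps the denominator away from zero when a bin is empty in both.
    """
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    # element-wise (a - b)^2 / (a + b + eps), summed over all bins
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))
```

Identical histograms give a distance of 0, and the distance grows as the bins diverge.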
4) Perform the actual search
# import the necessary packages
from colordescriptor import ColorDescriptor
from searcher import Searcher
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--index", required = True,
    help = "Path to where the computed index will be stored")
ap.add_argument("-q", "--query", required = True,
    help = "Path to the query image")
ap.add_argument("-r", "--result-path", required = True,
    help = "Path to the result path")
args = vars(ap.parse_args())

# initialize the image descriptor
cd = ColorDescriptor((8, 12, 3))

# load the query image and describe it
query = cv2.imread(args["query"])
features = cd.describe(query)

# perform the search
searcher = Searcher(args["index"])
results = searcher.search(features)

# display the query
cv2.imshow("Query", query)

# loop over the results
for (score, resultID) in results:
    # load the result image and display it
    result = cv2.imread(args["result_path"] + "/" + resultID)
    cv2.imshow("Result", result)
    cv2.waitKey(0)
The final command line is:
python search.py --index index.csv --query query.png --result-path dataset
where index.csv is the file generated from the image database after step 2, query.png is my query image, and dataset is a folder containing about 100 images.
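So, is it possible to modify the indexing so that I don't need a local dataset and can query and index directly from a list of URLs?

To be clear: you want to search websites for images and then run OpenCV without downloading all of the images at once?

Yes. My code crawls a few websites and returns the URLs of about 1000 images.

Isn't storing to temp a download process? How would you handle ownerless files?

That definitely helps. But in any case, I've decided not to run the OpenCV module on URLs, because my dataset is very large. The latency and download time are too high, so I'm willing to trade disk space for faster searches.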
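For reference, the URL-based indexing discussed in the comments could be sketched roughly as follows. This is a minimal sketch, not the code from the question: it assumes Python 3, and the helper names fetch_bytes, decode_image and index_urls are hypothetical. Each image is still transferred over the network, but it is decoded from memory with cv2.imdecode rather than saved to disk. The describe argument stands in for something like cd.describe from the search script.

```python
import csv
import urllib.request


def fetch_bytes(url, timeout=10):
    """Read the raw bytes of one image into memory (nothing is written to disk)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def decode_image(data):
    """Decode encoded image bytes (JPEG, PNG, ...) into a BGR array, like cv2.imread."""
    import cv2
    import numpy as np
    buf = np.frombuffer(data, dtype=np.uint8)
    # cv2.imdecode returns None when the bytes are not a readable image
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)


def index_urls(urls, describe, out, fetch=fetch_bytes, decode=decode_image):
    """Write one CSV row per URL: the URL as the image ID, then its features.

    Returns the list of URLs that could not be fetched or decoded.
    """
    writer = csv.writer(out)
    skipped = []
    for url in urls:
        try:
            image = decode(fetch(url))
        except OSError:  # urllib.error.URLError is a subclass of OSError
            image = None
        if image is None:
            skipped.append(url)
            continue
        writer.writerow([url] + [str(f) for f in describe(image)])
    return skipped
```

With an index built this way, the image IDs in index.csv are URLs, so the display loop in the search script would have to fetch and decode each result again instead of calling cv2.imread on a local path. Every image is still downloaded once during indexing, which is the latency cost mentioned in the last comment.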