Python OpenCV and content-based image retrieval: is there a way to use an online image database without downloading the images?


python image opencv scrapy


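I am trying to build a CBIR system. I recently wrote a Python program that uses OpenCV functions to query a local image database and return results (see below). I now need to connect it to a separate web-scraping module (built with Scrapy) that outputs about 1000 image links found online. These images are scattered across the web and should be fed into the first (OpenCV) module. Is it possible to run the computations on this online image set without downloading it?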
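Strictly speaking, every image still has to travel over the network before OpenCV can compute anything from it; what can be avoided is saving it to disk. A minimal sketch, assuming Python 3's standard `urllib.request` and NumPy, reads the response bytes into memory and decodes them with `cv2.imdecode`, which produces the same kind of BGR array `cv2.imread` returns for a file. The helper names `fetch_bytes` and `url_to_image` are hypothetical.

```python
import urllib.request

import numpy as np


def fetch_bytes(url, timeout=10):
    """Download the raw bytes at `url` into memory; nothing is written to disk."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def url_to_image(url, timeout=10):
    """Fetch an image URL and decode it into a BGR array, like cv2.imread does for files."""
    import cv2  # imported here so fetch_bytes can be used without OpenCV installed

    buf = np.frombuffer(fetch_bytes(url, timeout), dtype=np.uint8)
    # cv2.imdecode usually returns None when the bytes are not a decodable image
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)
```

With this, `cd.describe(url_to_image(url))` could replace `cd.describe(cv2.imread(path))`, at the cost of one network round trip per image every time it is needed.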

Here are the steps I followed for the OpenCV module:

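1) Define a region-based color image descriptor

2) Extract features from the dataset to build the index (the dataset path is passed as a command-line argument)

3) Define a similarity metric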

# import the necessary packages
import numpy as np
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
import csv

class Searcher:
    def __init__(self, indexPath):
        # store our index path
        self.indexPath = indexPath

    def search(self, queryFeatures, limit = 5):
        # initialize our dictionary of results
        results = {}

        # open the index file for reading
        with open(self.indexPath) as f:
            # initialize the CSV reader
            reader = csv.reader(f)

            # loop over the rows in the index
            for row in reader:
                # parse out the image ID and features, then compute the
                # chi-squared distance between the features in our index
                # and our query features
                features = [float(x) for x in row[1:]]
                d = self.chi2_distance(features, queryFeatures)

                # now that we have the distance between the two feature
                # vectors, we can update the results dictionary -- the
                # key is the current image ID in the index and the
                # value is the distance we just computed, representing
                # how 'similar' the image in the index is to our query
                results[row[0]] = d

        # the "with" block has already closed the file, so no f.close() is needed;
        # sort our results so that the smaller distances (i.e. the
        # more relevant images) are at the front of the list
        results = sorted([(v, k) for (k, v) in results.items()])

        # return our (limited) results
        return results[:limit]

    def chi2_distance(self, histA, histB, eps = 1e-10):
        # compute the chi-squared distance
        d = 0.5 * np.sum([((a - b) ** 2) / (a + b + eps)
            for (a, b) in zip(histA, histB)])

        # return the chi-squared distance
        return d
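`chi2_distance` computes d(A, B) = ½ Σᵢ (aᵢ − bᵢ)² / (aᵢ + bᵢ + ε) with a Python-level loop over histogram bins. A sketch of the same formula vectorized with NumPy follows; the second function, a hypothetical helper, scores one query against every indexed vector at once, which helps once the index grows toward the ~1000 scraped images. The standalone names `chi2_distance` and `chi2_distances` are ours, not part of the original code.

```python
import numpy as np


def chi2_distance(hist_a, hist_b, eps=1e-10):
    """Chi-squared distance between two histograms (same formula as Searcher.chi2_distance)."""
    a = np.asarray(hist_a, dtype=float)
    b = np.asarray(hist_b, dtype=float)
    return 0.5 * np.sum((a - b) ** 2 / (a + b + eps))


def chi2_distances(index_features, query, eps=1e-10):
    """Distances from one query histogram to every row of an (N, D) feature matrix."""
    X = np.asarray(index_features, dtype=float)
    q = np.asarray(query, dtype=float)  # broadcast against each row of X
    return 0.5 * np.sum((X - q) ** 2 / (X + q + eps), axis=1)
```

Identical histograms give 0; for example `chi2_distance([1, 0], [0, 1])` is approximately 1.0, since each of the two bins contributes 1 / (1 + ε) before halving.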

4) Perform the actual search

# import the necessary packages
from colordescriptor import ColorDescriptor
from searcher import Searcher
import sys
sys.path.append('/usr/local/lib/python2.7/site-packages')
import argparse
import cv2

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--index", required = True,
    help = "Path to where the computed index will be stored")
ap.add_argument("-q", "--query", required = True,
    help = "Path to the query image")
ap.add_argument("-r", "--result-path", required = True,
    help = "Path to the result path")
args = vars(ap.parse_args())

# initialize the image descriptor
cd = ColorDescriptor((8, 12, 3))

# load the query image and describe it
query = cv2.imread(args["query"])
features = cd.describe(query)

# perform the search
searcher = Searcher(args["index"])
results = searcher.search(features)

# display the query
cv2.imshow("Query", query)

# loop over the results
for (score, resultID) in results:
    # load the result image and display it
    result = cv2.imread(args["result_path"] + "/" + resultID)
    cv2.imshow("Result", result)
    cv2.waitKey(0)
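`search.py` imports `ColorDescriptor` from `colordescriptor`, whose source is not shown. As a rough illustration of step 1 only, here is a simplified region-based HSV histogram descriptor. It is not the author's class: it assumes a plain 2×2 grid of regions (the original's region layout is unknown) and takes an image that has already been converted to HSV, for example with `cv2.cvtColor(image, cv2.COLOR_BGR2HSV)`. The names `describe_hsv` and `grid` are hypothetical; the default bins mirror the `(8, 12, 3)` passed to `ColorDescriptor` above.

```python
import numpy as np


def describe_hsv(hsv, bins=(8, 12, 3), grid=(2, 2)):
    """Concatenate one normalized 3D HSV histogram per grid cell.

    `hsv` is an (H, W, 3) uint8 array in OpenCV's 8-bit HSV ranges
    (hue 0-179, saturation and value 0-255).
    """
    h, w = hsv.shape[:2]
    rows, cols = grid
    features = []
    for r in range(rows):
        for c in range(cols):
            # slice out one rectangular region of the image
            cell = hsv[r * h // rows:(r + 1) * h // rows,
                       c * w // cols:(c + 1) * w // cols]
            hist, _ = np.histogramdd(
                cell.reshape(-1, 3).astype(float),
                bins=bins,
                range=((0, 180), (0, 256), (0, 256)),
            )
            total = hist.sum()
            if total > 0:
                hist /= total  # each region's histogram sums to 1
            features.extend(hist.flatten())
    return features
```

The result has `rows * cols * 8 * 12 * 3` values (1152 with the defaults), which is the flat list of floats that `Searcher` expects in each CSV row.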

The final command is:

python search.py --index index.csv --query query.png --result-path dataset

where index.csv is the file generated from the image database in step 2, query.png is my query image, and dataset is the folder containing ~100 images.

So, is it possible to modify the indexing so that I don't need a local dataset and can index and query directly from a list of URLs?
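To a degree, yes: `Searcher` only uses the first CSV column as an opaque image ID, so the index can store each image's URL instead of a local filename. A minimal sketch, assuming a `load_image(url)` callable (for example one that reads the bytes with `urllib` and decodes them with `cv2.imdecode`) and a `describe(image)` callable such as `ColorDescriptor.describe`; `index_urls` is a hypothetical name. Each image is still downloaded once during indexing, but only the feature vectors are kept.

```python
import csv


def index_urls(urls, load_image, describe, index_path):
    """Write an index CSV whose first column is the image URL instead of a filename.

    Returns the list of URLs that could not be fetched or decoded.
    """
    failed = []
    with open(index_path, "w", newline="") as f:
        writer = csv.writer(f)
        for url in urls:
            try:
                image = load_image(url)
            except (OSError, ValueError):
                # urllib's URLError/HTTPError and socket timeouts subclass OSError;
                # urlopen raises ValueError for malformed URLs
                failed.append(url)
                continue
            if image is None:  # e.g. the bytes were not a decodable image
                failed.append(url)
                continue
            features = describe(image)
            writer.writerow([url] + [float(x) for x in features])
    return failed
```

In step 4, `resultID` would then be a URL, so the result image would have to be loaded with the same `load_image` rather than `cv2.imread(args["result_path"] + "/" + resultID)`, and every displayed result costs another download.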

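Just to be clear: you want to search websites for images and then run OpenCV on them without downloading them all at once? – Yes. My code crawls some websites and returns the URLs of ~1000 images.

Isn't storing them to temp a download process anyway? How would you handle orphaned files? – That definitely helps. But in any case, I've decided not to run the OpenCV module on the URLs, because my dataset is very large. The latency and download times are too long, so I'm willing to trade disk space for faster search.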
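Trading disk space for speed can be automated: download each URL once into the local dataset folder, skip files that are already there, and overlap network latency with a thread pool, so the existing `--result-path dataset` workflow keeps working unchanged. A minimal sketch, assuming Python 3's `urllib` and `concurrent.futures`; `cache_image`, `cache_all`, and the SHA-1-of-URL filename scheme are hypothetical choices. Files are saved without an extension because `cv2.imread` identifies the format from the file content rather than the name. Scrapy's built-in `ImagesPipeline` is an alternative that also names stored files by a SHA-1 hash of the URL.

```python
import hashlib
import os
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def cache_image(url, cache_dir, timeout=10):
    """Download `url` into `cache_dir` unless it is already there; return the local path."""
    name = hashlib.sha1(url.encode("utf-8")).hexdigest()
    path = os.path.join(cache_dir, name)
    if not os.path.exists(path):
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
        tmp = path + ".part"
        with open(tmp, "wb") as f:
            f.write(data)
        # rename into place so an interrupted download never leaves a truncated file
        os.replace(tmp, path)
    return path


def cache_all(urls, cache_dir, workers=8):
    """Fetch many URLs concurrently; return {url: local path, or None on failure}."""
    os.makedirs(cache_dir, exist_ok=True)

    def safe(url):
        try:
            return cache_image(url, cache_dir)
        except (OSError, ValueError):  # network errors, timeouts, malformed URLs
            return None

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(urls, pool.map(safe, urls)))
```

Running `cache_all(scraped_urls, "dataset")` once after the Scrapy crawl fills the folder that step 2 indexes; later runs only fetch URLs that are new.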