Compare two images the python/linux way
I'm trying to solve the problem of preventing duplicate images from being uploaded. I have two JPGs. Looking at them, I can see that they are in fact identical. But for some reason they have different file sizes (one was pulled from a backup, the other is another upload), and so they have different md5 checksums.

How can I efficiently and confidently compare two images in the same sense that a human would see that they are clearly identical?

Update

I wrote this script:
import math, operator
from functools import reduce  # reduce is not a builtin in Python 3
from PIL import Image

def compare(file1, file2):
    image1 = Image.open(file1)
    image2 = Image.open(file2)
    h1 = image1.histogram()
    h2 = image2.histogram()
    # RMS of the bin-by-bin difference between the two histograms
    rms = math.sqrt(reduce(operator.add,
                           map(lambda a, b: (a - b)**2, h1, h2)) / len(h1))
    return rms

if __name__ == '__main__':
    import sys
    file1, file2 = sys.argv[1:]
    print(compare(file1, file2))
Then I downloaded the two visually identical images and ran the script. Output:
58.9830484122
Can anybody tell me what a suitable cutoff should be?
Update II
The difference between a.jpg and b.jpg is that the second one has been saved with PIL:
b=Image.open('a.jpg')
b.save(open('b.jpg','wb'))
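This apparently applies some very light quality modifications. I've now solved my problem by applying the same PIL save to the file being uploaded, without doing anything else to it, and it now works.

There is an OSS project that uses WebDriver to take screenshots and then compares the images to see if there are any issues. It does this by opening the files into streams and then comparing every bit. You may be able to do something similar.

Edit: after more research, I found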
import math, operator
from functools import reduce
from PIL import Image
h1 = Image.open("image1").histogram()
h2 = Image.open("image2").histogram()
rms = math.sqrt(reduce(operator.add,
                       map(lambda a, b: (a - b)**2, h1, h2)) / len(h1))
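I think you should decode the images and do a pixel-by-pixel comparison to see whether they are reasonably similar. With PIL and Numpy you can do it quite easily: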
import sys
import numpy
from PIL import Image

def main():
    img1 = Image.open(sys.argv[1])
    img2 = Image.open(sys.argv[2])
    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1
    # Use a signed integer type so the subtraction cannot wrap around in uint8.
    m1 = numpy.asarray(img1, dtype=numpy.int64)
    m2 = numpy.asarray(img2, dtype=numpy.int64)
    # Sum of absolute differences over every pixel and every band.
    s = numpy.sum(numpy.abs(m1 - m2))
    print(s)

if __name__ == "__main__":
    sys.exit(main())
This will give you a numeric value that should be very close to 0 if the images are practically the same.
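Note that images that are shifted or rotated will be reported as very different, because the pixels won't match one by one.

You can either compare them by iterating over the pixels or segments of the pictures, or, if you are looking for a completely identical copy, compare the MD5 hashes of both files.

First, I should note that they are not identical; b has been recompressed and has lost quality. You can see this if you look carefully on a good monitor.

To determine that they are subjectively "the same", you would have to do something like what fortran suggested, although you will have to arbitrarily establish a threshold for "sameness". To make s independent of the image size, and to handle the channels a bit more sensibly, I would consider computing the RMS (root mean square) Euclidean distance in color space between the pixels of the two images. I don't have time to write the code right now, but basically for each pixel you compute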
(R_2 - R_1) ** 2 + (G_2 - G_1) ** 2 + (B_2 - B_1) ** 2
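, adding in
(A_2 - A_1) ** 2
if the image has an alpha channel, and so on. The result is the square of the color-space distance between the two images at that pixel. Take the mean over all pixels, then take the square root of the resulting scalar. Then decide on a reasonable threshold for this value.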
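Since the answer above describes the computation without code, here is a minimal sketch of it, assuming NumPy and Pillow are available. The function name `rms_color_distance` is made up for this illustration, and converting both images to RGBA (so the channels always line up) is an assumption of the sketch, not something the answer specifies.

```python
import numpy as np
from PIL import Image


def rms_color_distance(img1, img2):
    """RMS Euclidean color-space distance between two same-sized images."""
    if img1.size != img2.size:
        raise ValueError("images must have the same dimensions")
    # Assumption: convert both to RGBA so the channels match. An image with
    # no alpha channel becomes fully opaque, which adds zero to the distance.
    a = np.asarray(img1.convert("RGBA"), dtype=np.float64)
    b = np.asarray(img2.convert("RGBA"), dtype=np.float64)
    # Squared color-space distance per pixel: sum over the channel axis.
    sq_dist = np.sum((a - b) ** 2, axis=-1)
    # Mean over all pixels, then the square root of that scalar.
    return float(np.sqrt(sq_dist.mean()))


# Synthetic images: every pixel differs by (3, 4, 0), a distance of 5.
black = Image.new("RGB", (4, 4), (0, 0, 0))
other = Image.new("RGB", (4, 4), (3, 4, 0))
print(rms_color_distance(black, black))  # 0.0
print(rms_color_distance(black, other))  # 5.0
```

Because the value is averaged per pixel, it does not grow with image size, but the threshold for "same" still has to be chosen empirically.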
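Alternatively, you might decide that copies of the same original image made with different lossy compression are not truly "the same", and stick with the file hash.

Understanding why some features of an image matter more than others is a whole scientific program. I would suggest some alternatives, depending on the solution you want: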
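- If your problem is to see whether some bits were flipped in your JPEGs, try looking at the difference image (perhaps it was edited locally?)
- To see whether the images are globally the same, compare their histograms using the Kullback-Leibler distance
- To see whether there is some qualitative change, filter your images to raise the importance of high-level frequencies before applying the other answers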
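The histogram suggestion in the list above could be sketched roughly as follows. The names `kl_divergence` and `symmetric_kl`, the smoothing constant `eps`, the conversion to RGB, and the choice to symmetrize the divergence are all assumptions made for this illustration; the answer only names the Kullback-Leibler distance.

```python
import math
from PIL import Image


def kl_divergence(hist_p, hist_q, eps=1e-9):
    """Kullback-Leibler divergence D(P || Q) between two histograms."""
    n = len(hist_p)
    if n != len(hist_q):
        raise ValueError("histograms must have the same number of bins")
    # Assumed smoothing: add eps to every bin so that empty bins do not
    # cause log(0) or a division by zero, then normalize to probabilities.
    norm_p = sum(hist_p) + eps * n
    norm_q = sum(hist_q) + eps * n
    d = 0.0
    for count_p, count_q in zip(hist_p, hist_q):
        p = (count_p + eps) / norm_p
        q = (count_q + eps) / norm_q
        d += p * math.log(p / q)
    return d


def symmetric_kl(img1, img2):
    """Symmetrized KL divergence between the RGB histograms of two images."""
    h1 = img1.convert("RGB").histogram()
    h2 = img2.convert("RGB").histogram()
    return kl_divergence(h1, h2) + kl_divergence(h2, h1)


black = Image.new("RGB", (8, 8), (0, 0, 0))
white = Image.new("RGB", (8, 8), (255, 255, 255))
print(symmetric_kl(black, black))  # 0.0 for identical histograms
print(symmetric_kl(black, white))  # large positive value
```

Like any histogram measure, this ignores where the pixels are, so it can only tell you that two images are globally similar in tone.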
With ImageMagick, you can simply run this in your shell [or call it from a program via the OS library]:
compare image1 image2 output
This will create an output image with the differences marked.
compare -metric AE -fuzz 5% image1 image2 output
will give you a fuzz factor of 5%, so that minor pixel differences are ignored.
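For the "call it from a program" route, a minimal Python sketch might look like this. The helper names `build_compare_command` and `run_compare` are hypothetical, and the sketch assumes that ImageMagick's `compare` executable is on the PATH (on ImageMagick 7 installations the command may be `magick compare` instead). As far as I know, `compare` writes the metric to stderr and exits with 0 for similar images, 1 for dissimilar ones and 2 on error, but check this against your installed version.

```python
import subprocess


def build_compare_command(image1, image2, output, fuzz="5%"):
    """Build the argument list for ImageMagick's `compare` tool."""
    return ["compare", "-metric", "AE", "-fuzz", fuzz, image1, image2, output]


def run_compare(image1, image2, output, fuzz="5%"):
    """Run `compare` and return (exit_code, metric_text)."""
    result = subprocess.run(
        build_compare_command(image1, image2, output, fuzz),
        capture_output=True,
        text=True,
    )
    # Assumption: the metric is reported on stderr, not stdout.
    return result.returncode, result.stderr.strip()


print(build_compare_command("a.jpg", "b.jpg", "diff.png"))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.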
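More information is available in the ImageMagick documentation.

I tested this one and it works the best of all the methods, and it is extremely fast: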
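import math, operator
from functools import reduce
from PIL import ImageChops

def rmsdiff_1997(im1, im2):
    "Calculate the root-mean-square difference between two images"
    h = ImageChops.difference(im1, im2).histogram()
    # calculate rms
    # range(256) assumes a single-band histogram; with Python 3's map,
    # any bins past the first 256 (the other bands) are ignored.
    return math.sqrt(reduce(operator.add,
                            map(lambda h, i: h*(i**2), h, range(256))
                            ) / (float(im1.size[0]) * im1.size[1]))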
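For reference, I tried the 3 methods mentioned above and others. There seem to be two main types of image comparison: pixel-by-pixel and histogram.

I have tried both. The pixel one fails 100%, as it actually should: if we shift the second image by 1 pixel, all the pixels will fail to match and we get a 100% mismatch.

Histogram comparison should work very well in theory, but it does not. Here are two images with a slightly shifted viewpoint; the histograms look 99% similar, but the algorithm produces a result that says "very different".

Results of the 4 different algorithms: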
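- Perfect match: False
- Pixel difference: 115816402
- Histogram Comparison: 83.69564286668303
- HistComparison: 1744.8160719686186

- Perfect match: False
- Pixel difference: 207893096
- Histogram Comparison: 104.30194643642095
- HistComparison: 6875.766716148522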
from PIL import Image
from PIL import ImageChops
from functools import reduce
import numpy
import math
import operator

# Just checking if images are 100% the same
def equal(im1, im2):
    img1 = Image.open(im1)
    img2 = Image.open(im2)
    return ImageChops.difference(img1, img2).getbbox() is None

def histCompare(im1, im2):
    h1 = Image.open(im1).histogram()
    h2 = Image.open(im2).histogram()
    rms = math.sqrt(reduce(operator.add, map(lambda a, b: (a - b)**2, h1, h2)) / len(h1))
    return rms

# To get a measure of how similar two images are, we calculate the root-mean-square (RMS)
# value of the difference between the images. If the images are exactly identical,
# this value is zero. The following function uses the difference function,
# and then calculates the RMS value from the histogram of the resulting image.
def rmsdiff_1997(im1, im2):
    #"Calculate the root-mean-square difference between two images"
    img1 = Image.open(im1)
    img2 = Image.open(im2)
    h = ImageChops.difference(img1, img2).histogram()
    # calculate rms
    # Note: map stops at the shorter input, so range(256) only covers the
    # first 256 histogram bins (the first band of a multi-band image).
    return math.sqrt(reduce(operator.add,
                            map(lambda h, i: h * (i**2), h, range(256))
                            ) / (float(img1.size[0]) * img1.size[1]))

# Pixel by pixel comparison to see if images are reasonably similar.
def countDiff(im1, im2):
    s = 0
    img1 = Image.open(im1)
    img2 = Image.open(im2)
    if img1.size != img2.size or img1.getbands() != img2.getbands():
        return -1
    for band_index, band in enumerate(img1.getbands()):
        m1 = numpy.array([p[band_index] for p in img1.getdata()]).reshape(*img1.size)
        m2 = numpy.array([p[band_index] for p in img2.getdata()]).reshape(*img2.size)
        s += numpy.sum(numpy.abs(m1 - m2))
    return s

# Run all four comparisons of data/start.jpg against each test image.
cases = [
    ("[Same Image]", "data/start.jpg"),
    ("[Same Position]", "data/end.jpg"),
    ("[~5º off]", "data/end2.jpg"),
    ("[~15º off]", "data/end3.jpg"),
    ("[100% different]", "data/end4.jpg"),
]
for n, (label, other) in enumerate(cases):
    print(("\n" if n else "") + label)
    print("Perfect match:", equal("data/start.jpg", other))
    print("Pixel difference:", countDiff("data/start.jpg", other))
    print("Histogram Comparison:", rmsdiff_1997("data/start.jpg", other))
    print("HistComparison:", histCompare("data/start.jpg", other))
Using only PIL:
import math
from PIL import Image
from PIL import ImageChops

def images_are_similar(img1, img2, error=90):
    diff = ImageChops.difference(img1, img2).histogram()
    # i % 256 maps each bin of a multi-band histogram back to its 0-255 value
    sq = (value * (i % 256) ** 2 for i, value in enumerate(diff))
    sum_squares = sum(sq)
    rms = math.sqrt(sum_squares / float(img1.size[0] * img1.size[1]))
    # Error is an arbitrary value, based on values when
    # comparing 2 rotated images & 2 different images.
    return rms < error

img1 = Image.open(img1_path)
img2 = Image.open(img2_path)

# Byte-for-byte comparison of the two files
with open('pic1.png', 'rb') as f:
    content1 = f.read()
with open('pic2.png', 'rb') as f:
    content2 = f.read()
if content1 == content2:
    print("same")
else:
    print("not same")
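For large files, or when you want to store a fingerprint of each upload instead of re-reading old files, the same exact-match check can be done with the MD5 hash mentioned earlier in the thread. This is only a sketch using the standard `hashlib` module; the function names `file_md5` and `same_file_content` and the 64 KiB chunk size are arbitrary choices for the illustration.

```python
import hashlib


def file_md5(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_file_content(path1, path2):
    """True if both files have identical bytes (up to MD5 collisions)."""
    return file_md5(path1) == file_md5(path2)
```

As with the byte comparison, this only detects exact copies; a re-saved JPEG such as b.jpg in the question will still produce a different hash.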