Python: Increase the spacing between lines of text in an image

Tags: python, opencv, image-processing, text, computer-vision

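I have an input image of a paragraph of text with single line spacing. I'm trying to implement something like the line-spacing option in Microsoft Word to increase or decrease the space between lines of text. The image is currently single-spaced; how can I convert the text to double spacing? Or, say, .5 spacing? Essentially, I'm trying to dynamically restructure the spacing between the lines of text, preferably with an adjustable parameter. Something like this:

Input image

Desired result

My current attempt looks like this. I've been able to increase the spacing slightly, but the text detail seems to be eroded and there is random noise between the lines.

Any ideas on how to improve the code, or on a better approach?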

import numpy as np 
import cv2

img = cv2.imread('text.png')
H, W = img.shape[:2]
grey = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
threshed = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

hist = cv2.reduce(threshed, 1, cv2.REDUCE_AVG).reshape(-1)
spacing = 2
delimeter = [y for y in range(H - 1) if hist[y] <= spacing < hist[y + 1]]
arr = []
y_prev, y_curr = 0, 0
for y in delimeter:
    y_prev = y_curr
    y_curr = y
    arr.append(threshed[y_prev:y_curr, 0:W])

arr.append(threshed[y_curr:H, 0:W])
space_array = np.zeros((10, W))
result = np.zeros((1, W))

for im in arr:
    v = np.concatenate((space_array, im), axis=0)
    result = np.concatenate((result, v), axis=0)

result = (255 - result).astype(np.uint8)
cv2.imshow('result', result)
cv2.waitKey()
Approach #1: Pixel analysis
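  • Obtain a binary image. Load the image, convert it to grayscale, and apply Otsu's threshold.

  • Sum the pixels in each row. The idea is that the pixel sum of a row can be used to determine whether that row corresponds to text or to white space.

  • Create a new image and add extra white space. We iterate through the pixel array and insert additional white space.


  • Binary image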

    # Load image, grayscale, Otsu's threshold
    image = cv2.imread('1.png')
    h, w = image.shape[:2]
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    

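Now we iterate through each row and sum the white pixels to generate a pixel array. We can profile the column of data generated from the sum of all the pixels in each row to determine which rows correspond to text. Sections of the data that equal 0 represent image rows made up entirely of white space. Here's a visualization of the data array: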

    # Sum white pixels in each row
    pixels = np.sum(thresh, axis=1).tolist()
    # Create the blank-space array and an empty final image
    # (zero rows to start with, so no stray black row ends up at the top)
    space = np.ones((2, w), dtype=np.uint8) * 255
    result = np.zeros((0, w), dtype=np.uint8)
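The row sums can also be turned into explicit line boundaries, which is handy for checking that the threshold separates the lines cleanly. The following is a minimal sketch using NumPy only; the helper name `find_text_bands` is hypothetical and not part of the answer's code, and it assumes that any row with a non-zero sum belongs to a text line.

```python
import numpy as np

def find_text_bands(row_sums):
    """Return half-open (start, end) row ranges where the row sum is non-zero."""
    has_ink = np.asarray(row_sums) > 0
    # Pad with False on both sides so bands touching the image border are closed.
    padded = np.concatenate(([False], has_ink, [False]))
    # Indices where the state flips: even entries are starts, odd entries are ends.
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    return [(int(s), int(e)) for s, e in zip(changes[0::2], changes[1::2])]

bands = find_text_bands([0, 510, 255, 0, 0, 765, 0])
print(bands)
```

With the `pixels` list computed above, `find_text_bands(pixels)` would give one range per detected text line, assuming the lines are separated by at least one fully blank row.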
    

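We convert the data to a list and iterate through it to build the final image. If a row is determined to be white space, we concatenate a blank-space array to the final image. By adjusting the size of the blank array, we can change the amount of space added to the image.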

    # Iterate through each row and add space if entire row is empty
    # otherwise add original section of image to final image
    for index, value in enumerate(pixels):
        if value == 0:
            result = np.concatenate((result, space), axis=0)
        row = gray[index:index+1, 0:w]
        result = np.concatenate((result, row), axis=0)
    
Here's the result

Code
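The steps of this approach can be put together in a single function. This is a minimal sketch rather than the answer's original listing: the name `expand_blank_rows` and the parameter `extra_rows` are assumptions introduced for illustration, and the function expects `gray` and `thresh` arrays like the ones computed in the snippets above. It collects the pieces in a list and concatenates once, instead of growing the result row by row.

```python
import numpy as np

def expand_blank_rows(gray, thresh, extra_rows=2):
    """Insert `extra_rows` white rows before every row that is blank in `thresh`.

    gray   : 2D uint8 grayscale image (dark text on a light background)
    thresh : 2D inverse binary image of the same shape (text pixels are non-zero)
    """
    blank = thresh.sum(axis=1) == 0            # True where a row has no text pixels
    space = np.full((extra_rows, gray.shape[1]), 255, dtype=gray.dtype)
    pieces = []
    for row, is_blank in zip(gray, blank):
        if is_blank:
            pieces.append(space)               # extra white space for blank rows
        pieces.append(row[np.newaxis, :])      # the original row is always kept
    return np.concatenate(pieces, axis=0)

# Tiny synthetic example: a 6x5 white image with two "text" rows in the middle.
gray = np.full((6, 5), 255, dtype=np.uint8)
gray[2:4, 1:4] = 0
thresh = 255 - gray
result = expand_blank_rows(gray, thresh, extra_rows=2)
print(result.shape)
```

Because every blank row is expanded by the same amount, existing gaps grow in proportion to their original height; `extra_rows` is the adjustable parameter.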

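Approach #2: Individual line extraction

For a more dynamic approach, we can find the contours of each line and then add space between each contour. We append the extra white space using the same method as in the first approach.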

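  • Obtain a binary image. Load the image, convert it to grayscale, apply a Gaussian blur, and apply Otsu's threshold.

  • Connect the text contours. We create a horizontally shaped kernel and dilate, joining the words of each line into a single contour.

  • Extract each line contour. We find the contours, sort them from top to bottom using imutils.contours.sort_contours(), and then extract each line.

  • Add white space between each line. We create an empty array and build the new image by appending white space between each line contour.


  • Binary image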
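The sorting step can also be done without imutils by working directly on the bounding boxes. The sketch below is an illustration only: `group_boxes_into_lines` is a hypothetical helper, and it assumes the boxes are `(x, y, w, h)` tuples such as those returned by `cv2.boundingRect`. It additionally merges boxes whose vertical extents overlap, which may help when the dilation kernel is too narrow and one text line ends up split into several contours.

```python
def group_boxes_into_lines(boxes):
    """Sort (x, y, w, h) boxes top to bottom and merge vertically overlapping ones.

    Returns a list of half-open (top, bottom) row ranges, one per text line.
    """
    lines = []
    for x, y, w, h in sorted(boxes, key=lambda b: b[1]):
        if lines and y < lines[-1][1]:
            # Overlaps the previous line vertically: extend that line.
            lines[-1][1] = max(lines[-1][1], y + h)
        else:
            lines.append([y, y + h])
    return [tuple(line) for line in lines]

# Four boxes in arbitrary order that belong to two lines of text.
boxes = [(50, 40, 30, 12), (0, 10, 80, 12), (90, 12, 20, 9), (0, 42, 40, 10)]
print(group_boxes_into_lines(boxes))
```

Each returned range can then be used to slice the original image, for example `original[top:bottom, 0:width]`.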

    # Load image, grayscale, blur, Otsu's threshold
    image = cv2.imread('1.png')
    original = image.copy()
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    height, width = image.shape[:2]
    

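Create a horizontal kernel and dilate

The extracted individual line contours, highlighted in green

Add white space between each line. Here's the result using a space array 1 pixel high

Result with a space array 5 pixels high

Full code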

    import cv2
    import numpy as np 
    from imutils import contours
    
    # Load image, grayscale, blur, Otsu's threshold
    image = cv2.imread('1.png')
    original = image.copy()
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (3,3), 0)
    thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    invert = 255 - thresh  
    height, width = image.shape[:2]
    
    # Dilate with a horizontal kernel to connect text contours
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10,2))
    dilate = cv2.dilate(thresh, kernel, iterations=2)
    
    # Extract each line contour
    lines = []
    cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    cnts = cnts[0] if len(cnts) == 2 else cnts[1]
    (cnts, _) = contours.sort_contours(cnts, method="top-to-bottom")
    for c in cnts:
        x,y,w,h = cv2.boundingRect(c)
        cv2.rectangle(image, (0, y), (width, y+h), (36,255,12), 2)
        line = original[y:y+h, 0:width]
        line = cv2.cvtColor(line, cv2.COLOR_BGR2GRAY)
        lines.append(line)
    
    # Append white space in between each line
    space = np.ones((1, width), dtype=np.uint8) * 255
    result = np.zeros((0, width), dtype=np.uint8)
    result = np.concatenate((result, space), axis=0)
    for line in lines:
        result = np.concatenate((result, line), axis=0)
        result = np.concatenate((result, space), axis=0)
    
    cv2.imshow('result', result)
    cv2.imshow('image', image)
    cv2.imshow('dilate', dilate)
    cv2.waitKey()
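The `space` array in the listing above has a fixed height. To get closer to the adjustable, Word-like parameter the question asks for, the gap can be derived from a spacing factor. This is a rough sketch under stated assumptions: `stack_lines` and `factor` are hypothetical names, and defining the gap as `(factor - 1)` times the median cropped line height is only one possible approximation of word-processor line spacing.

```python
import numpy as np

def stack_lines(lines, factor=1.5, background=255):
    """Stack equally wide grayscale line images with a gap that scales with `factor`.

    factor=1.0 adds no gap; factor=2.0 adds a gap of about one line height.
    """
    heights = [line.shape[0] for line in lines]
    gap = max(0, int(round((factor - 1.0) * float(np.median(heights)))))
    width = lines[0].shape[1]
    space = np.full((gap, width), background, dtype=np.uint8)
    pieces = [space]                      # leading margin, as in the listing above
    for line in lines:
        pieces.extend([line, space])      # each line is followed by one gap
    return np.concatenate(pieces, axis=0)

# Three synthetic "lines", each 10 rows high and 20 columns wide.
demo_lines = [np.zeros((10, 20), dtype=np.uint8) for _ in range(3)]
print(stack_lines(demo_lines, factor=2.0).shape)
```

With the `lines` list built in the full code, `stack_lines(lines, factor=2.0)` would replace the final concatenation loop.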
    

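You could blur horizontally or use morphology with a horizontal kernel, then threshold. Then get the contours of each region, which should correspond to the lines of text. Extract them and write them onto a new, clean background image with larger spacing.

A trivial misnomer: what you call a histogram isn't really a histogram. It is a column of data generated from the sum of all the pixels in each row. (The sum is similar to averaging all the columns down to one column, i.e. a block resize.) A histogram is a probability distribution; it shows counts versus possible values. You are computing counts versus position in the column. Nevertheless, a very nice solution!

Thanks, you're right, it's not a histogram. It's the sum of the pixel row data. I've updated the post.

Great explanation!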