Python: Reconstructing 3D points from 2D images


python image-processing computer-vision


I am trying to understand the basics of reconstructing 3D points from 2D stereo images. My understanding so far can be summarized as follows:

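For 3D point (depth map) reconstruction, we need two images of the same object taken from two different views. Given such an image pair, we also need the camera matrices (say P1, P2).

We find corresponding points in the two images using methods such as SIFT or SURF.
After obtaining the corresponding keypoints, we find the essential matrix (say K) using a minimum of 8 keypoints (for the 8-point algorithm).
Given that we are at camera 1, we compute the parameters of camera 2 from the essential matrix, which returns 4 possible camera parameters.
Finally, we use the corresponding points and both camera parameters to estimate the 3D points by triangulation.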
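To make the essential-matrix step above concrete, here is a minimal NumPy sketch of the linear 8-point algorithm on synthetic data. It is not the code from the repository used below; the helper names (`skew`, `essential_eight_point`) and the synthetic camera motion are assumptions made for illustration, and the matrix is called E here rather than K. The points are assumed to be already normalized by the inverse intrinsic matrix.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix of a 3-vector."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def essential_eight_point(x1, x2):
    """Estimate E from >= 8 normalized homogeneous correspondences (3xN each),
    so that x2[:, i] @ E @ x1[:, i] is close to zero for every i."""
    n = x1.shape[1]
    if n < 8:
        raise ValueError("need at least 8 correspondences")
    # Each correspondence contributes one row of the system A @ vec(E) = 0,
    # with vec(E) in row-major order.
    A = np.stack([np.kron(x2[:, i], x1[:, i]) for i in range(n)])
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Project onto the set of essential matrices: singular values (s, s, 0).
    u, s, vt = np.linalg.svd(E)
    s_mean = (s[0] + s[1]) / 2.0
    return u @ np.diag([s_mean, s_mean, 0.0]) @ vt

# Synthetic scene (an assumption for this sketch): 20 points in front of camera 1.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (3, 20)) + np.array([[0.0], [0.0], [5.0]])

# Hypothetical motion of camera 2: small rotation about y plus a sideways shift.
c, s = np.cos(0.1), np.sin(0.1)
R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t = np.array([1.0, 0.0, 0.0])
X2 = R @ X + t[:, None]

# Normalized image coordinates (homogeneous, third row equal to 1).
x1 = X / X[2]
x2 = X2 / X2[2]

E = essential_eight_point(x1, x2)

# Largest epipolar residual |x2^T E x1| over all correspondences.
residual = np.abs(np.sum(x2 * (E @ x1), axis=0)).max()

# Compare with the ground-truth E = [t]_x R, up to scale and sign.
E_true = skew(t) @ R
E_true /= np.linalg.norm(E_true)
E_unit = E / np.linalg.norm(E)
scale_free_error = min(np.linalg.norm(E_unit - E_true),
                       np.linalg.norm(E_unit + E_true))
```

The result is only defined up to scale and sign, which is why the comparison normalizes both matrices first. With real, noisy matches one would normally wrap such an estimate in an outlier-rejection scheme such as RANSAC.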

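After going through the theory, as my first experiment I tried running the available code, which worked as expected. With a few modifications to the code, I tried to run this example on all consecutive image pairs and merge the 3D point clouds into a 3D reconstruction of the object (dino), as shown below: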

import glob

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import cv2

from camera import Camera
import structure
import processor
import features

def dino(file1, file2):
    # Load one image pair of the dino sequence
    img1 = cv2.imread(file1)
    img2 = cv2.imread(file2)
    pts1, pts2 = features.find_correspondence_points(img1, img2)
    points1 = processor.cart2hom(pts1)
    points2 = processor.cart2hom(pts2)

    fig, ax = plt.subplots(1, 2)
    ax[0].autoscale_view('tight')
    ax[0].imshow(cv2.cvtColor(img1, cv2.COLOR_BGR2RGB))
    ax[0].plot(points1[0], points1[1], 'r.')
    ax[1].autoscale_view('tight')
    ax[1].imshow(cv2.cvtColor(img2, cv2.COLOR_BGR2RGB))
    ax[1].plot(points2[0], points2[1], 'r.')
    fig.show()

    height, width, ch = img1.shape
    intrinsic = np.array([  # for dino
        [2360, 0, width / 2],
        [0, 2360, height / 2],
        [0, 0, 1]])

    return points1, points2, intrinsic


points3d = np.empty((0,0))
# Sort the paths so that neighbouring list entries are consecutive views
files = sorted(glob.glob("imgs/dinos/*.ppm"))
num_files = len(files)  # do not shadow the built-in len()

for item in range(num_files-1):
    print(files[item], files[(item+1)%num_files])
    #dino() function takes 2 images as input
    #and outputs the keypoint matches (corresponding points in two different views) along with the camera intrinsic parameters.
    points1, points2, intrinsic = dino(files[item], files[(item+1)%num_files])
    #print(('Length', len(points1))
    # Calculate essential matrix with 2d points.
    # Result will be up to a scale
    # First, normalize points
    points1n = np.dot(np.linalg.inv(intrinsic), points1)
    points2n = np.dot(np.linalg.inv(intrinsic), points2)
    E = structure.compute_essential_normalized(points1n, points2n)
    print('Computed essential matrix:', (-E / E[0][1]))

    # Given we are at camera 1, calculate the parameters for camera 2
    # Using the essential matrix returns 4 possible camera parameters
    P1 = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
    P2s = structure.compute_P_from_essential(E)

    ind = -1
    for i, P2 in enumerate(P2s):
        # Find the correct camera parameters
        d1 = structure.reconstruct_one_point(
            points1n[:, 0], points2n[:, 0], P1, P2)

        # Convert P2 from camera view to world view
        P2_homogenous = np.linalg.inv(np.vstack([P2, [0, 0, 0, 1]]))
        d2 = np.dot(P2_homogenous[:3, :4], d1)

        if d1[2] > 0 and d2[2] > 0:
            ind = i

    P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))[:3, :4]
    #tripoints3d = structure.reconstruct_points(points1n, points2n, P1, P2)
    tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2)

    if not points3d.size:
        points3d = tripoints3d
    else:
        points3d = np.concatenate((points3d, tripoints3d), 1)


fig = plt.figure()
fig.suptitle('3D reconstructed', fontsize=16)
ax = fig.add_subplot(projection='3d')  # fig.gca(projection=...) no longer works in recent Matplotlib
ax.plot(points3d[0], points3d[1], points3d[2], 'b.')
ax.set_xlabel('x axis')
ax.set_ylabel('y axis')
ax.set_zlabel('z axis')
ax.view_init(elev=135, azim=90)
plt.show()

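But I am getting unexpected results. Please tell me whether the above approach is correct, or how I can merge multiple 3D point clouds to build a single 3D structure.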

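The general idea is as follows.

In each iteration of your code, you compute the relative pose of the right camera with respect to the left camera. You then triangulate the 2D points and concatenate the resulting 3D points into one big array. But the concatenated points are not in the same coordinate frame.

What you need to do is accumulate the estimated relative poses in order to maintain an absolute pose estimate. You can then triangulate the 2D points as before, but before concatenating the resulting points you need to map them into the coordinate frame of the first camera.

Here is how to do that.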

First, before the loop, initialize the accumulation matrix absolute_P1:

points3d = np.empty((0,0))
files = glob.glob("imgs/dinos/*.ppm")
num_files = len(files)  # do not shadow the built-in len()
absolute_P1 = np.eye(4)  # 4x4 identity: the first camera defines the reference frame

for item in range(num_files-1):
    # ...
    # ...

Then, after triangulating the features, map the 3D points into the coordinate frame of the first camera and update the accumulated pose:

# ...
P2 = np.linalg.inv(np.vstack([P2s[ind], [0, 0, 0, 1]]))
tripoints3d = structure.linear_triangulation(points1n, points2n, P1, P2[:3, :4])

abs_tripoints3d = np.matmul(absolute_P1, np.vstack([tripoints3d, np.ones(np.shape(tripoints3d)[1])]))
absolute_P1 = np.matmul(absolute_P1, np.linalg.inv(P2)) # P2 needs to be 4x4 here!

if not points3d.size:
    points3d = abs_tripoints3d
else:
    points3d = np.concatenate((points3d, abs_tripoints3d), 1)

# ...
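The composition order is easy to get wrong, so it can help to check it on synthetic data where the answer is known. The following is only a sketch under stated assumptions: `make_pose`, the example relative poses and the convention that each relative transform maps camera-k coordinates to camera-(k+1) coordinates are all made up for illustration, and the triangulation is simulated rather than taken from the structure module. It also ignores the fact that each two-view reconstruction from an essential matrix is only defined up to scale, so real pairs would additionally need a consistent scale.

```python
import numpy as np

def make_pose(angle_y, t):
    """Build a 4x4 rigid transform: rotation about the y axis, then translation t."""
    c, s = np.cos(angle_y), np.sin(angle_y)
    T = np.eye(4)
    T[:3, :3] = [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]
    T[:3, 3] = t
    return T

rng = np.random.default_rng(1)
# Five homogeneous points (4xN) expressed in the frame of the first camera.
points_first = np.vstack([rng.uniform(-1.0, 1.0, (3, 5)), np.ones((1, 5))])

# Hypothetical relative poses: relative[k] maps camera-k coordinates
# to camera-(k+1) coordinates.
relative = [make_pose(0.1 * (k + 1), [0.5, 0.0, 0.1 * k]) for k in range(4)]

first_to_cam = np.eye(4)  # ground truth, used only to simulate triangulation
cam_to_first = np.eye(4)  # the accumulator that the merging loop maintains
chunks = []
for rel in relative:
    # What a pairwise triangulation would return: points in the frame
    # of the current left camera.
    points_current = first_to_cam @ points_first
    # Map them back into the frame of the first camera before merging.
    chunks.append(cam_to_first @ points_current)
    # Move on to the next camera: ground truth and accumulator are updated
    # on opposite sides, so that cam_to_first stays the inverse of first_to_cam.
    first_to_cam = rel @ first_to_cam
    cam_to_first = cam_to_first @ np.linalg.inv(rel)

merged = np.concatenate(chunks, axis=1)
```

If the accumulation is right, every chunk reproduces the original points, so the merged cloud is just the same five points repeated once per pair. Swapping the multiplication order in the accumulator update breaks this check, which makes the sketch a cheap way to validate the convention before running it on real images.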

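TL;DR: You probably can't get the full 3D reconstruction you want just by combining all of the 2-image reconstructions. I tried many different ways of doing so, and none of them worked. Basically, the failures all seem to come down to noise in the 2-image pose estimation algorithm, which often produces unreasonable results. Any attempt to track the absolute pose by simply combining all of the 2-image poses propagates that noise through the entire reconstruction.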
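A toy model can illustrate how chaining spreads pairwise noise. The sketch below is an assumption-laden simplification, not a model of the actual estimator: it only perturbs the rotation angle about a single axis with Gaussian noise, whereas real 2-image estimates also have errors in translation direction, unknown scale and occasional gross outliers. The function names and the noise level are hypothetical.

```python
import numpy as np

def rot_y(angle):
    """Rotation matrix about the y axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rotation_angle(R):
    """Angle in radians of the rotation represented by R."""
    return np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))

def chained_drift(num_pairs, step, noise_std, seed=0):
    """Chain noisy relative rotations and return the absolute-pose error
    (in radians) after each pair."""
    rng = np.random.default_rng(seed)
    true_abs = np.eye(3)
    est_abs = np.eye(3)
    drift = []
    for _ in range(num_pairs):
        true_rel = rot_y(step)
        # Noisy pairwise estimate of the same relative rotation.
        est_rel = rot_y(step + rng.normal(0.0, noise_std))
        true_abs = true_rel @ true_abs
        est_abs = est_rel @ est_abs
        # Residual rotation between estimated and true absolute pose.
        drift.append(rotation_angle(est_abs @ true_abs.T))
    return np.array(drift)

# 35 pairs, 10 degrees apart; first without noise, then with 1 degree of noise per pair.
drift_clean = chained_drift(35, np.deg2rad(10.0), 0.0)
drift_noisy = chained_drift(35, np.deg2rad(10.0), np.deg2rad(1.0))
```

In this model the absolute error is the running sum of the per-pair errors, so it behaves like a random walk: nothing pulls later cameras back towards the truth, and one bad pair shifts every pose after it.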

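The code the OP is using is based on a textbook. Chapter 19 cites an example, and their approach is somewhat more complicated. In addition to 2-image reconstructions, they also use 3-image reconstructions and (probably most importantly) a fitting step at the end, which helps ensure that no single spurious result ruins the reconstruction.

Code

…work in progress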

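Another possible route to understanding is to look at an open-source implementation of structure from motion or SLAM. Note that these systems can become quite complex. However, OpenSfM is written in Python, and I think it is easy to navigate and understand. I often use it as a reference for my own work.

Just to give you a bit more information to get started (if you choose to go down this road): structure from motion is an algorithm that takes a collection of 2D images and creates a 3D model (point cloud) from them. It also solves for the position of each camera relative to that point cloud (i.e., all returned camera poses are in the world frame, and so is the point cloud).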

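The steps of OpenSfM at a high level:

Read the image EXIF data for any prior information you can use (e.g. focal length)

Extract feature points (e.g. SIFT)

Match feature points

Turn these feature point matches into tracks (e.g. if a feature point is seen in images 1, 2 and 3, you can connect it into one track instead of match(1,2), match(2,3), etc.)

Incremental reconstruction (note that there is also a global approach). This process uses the tracks to incrementally add images to the reconstruction, triangulate new points, and refine the poses and point positions using a process called bundle adjustment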
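The track-building step in the list above can be sketched with a small union-find. This is not OpenSfM's implementation; `build_tracks` and the `(image, feature)` tuples are hypothetical names chosen for the example.

```python
from collections import defaultdict

def build_tracks(pairwise_matches):
    """Merge pairwise feature matches into tracks with union-find.

    pairwise_matches: iterable of ((image_a, feature_a), (image_b, feature_b)).
    Returns a sorted list of tracks, each a sorted list of (image, feature)
    observations that were linked directly or through other matches.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Union the two observations of every match.
    for a, b in pairwise_matches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group all observations by their root.
    groups = defaultdict(list)
    for node in list(parent):
        groups[find(node)].append(node)
    return sorted(sorted(track) for track in groups.values())

matches = [
    ((1, 10), (2, 7)),  # feature 10 in image 1 matches feature 7 in image 2
    ((2, 7), (3, 4)),   # which in turn matches feature 4 in image 3
    ((1, 11), (2, 9)),  # an unrelated second feature
]
tracks = build_tracks(matches)
# tracks == [[(1, 10), (2, 7), (3, 4)], [(1, 11), (2, 9)]]
```

A track that ends up containing two different features from the same image points to a wrong match somewhere along the chain, so real pipelines typically discard such tracks before triangulating.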

Hope this helps.

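If you do it this way, the 3D points reconstructed from each pair will be in different coordinate frames, so simply concatenating them will not produce anything meaningful. Suppose you wanted to build a panorama from a series of photos taken while gradually rotating the camera. If you just stacked the photos on top of each other, you would not see a panorama. For that, you need to shift each image according to the rotation. It is the same for point clouds: you need to align the separate point clouds consistently with each other.

Thanks @aldurdisciple, yes, I understood your point a few days ago. That is why I updated the question to ask how to merge multiple point clouds from different views.

Your code doesn't contain the definition of the dino function, and neither does the code you linked to. Can you add it?

@AlexanderReynolds could you add a link to a good resource describing the actual bundle adjustment algorithm/implementation?

Not sure about the best resource for 3D scenes, but for 2D/panoramas, the one by Richard Szeliski is a good resource. It gives a nice high-level overview and provides very good references for digging deeper. Hope that helps.

This answer isn't right. Consider the image it produces for just the first two sets of dino images. It only gets worse from there.

Thanks for the feedback @tel, I must have misunderstood the camera poses used by the OP. I updated the second code snippet, it should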