PyTorch for object detection - image augmentation

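I am using PyTorch for object detection and fine-tuning an existing model (transfer learning), following a tutorial.

Although image augmentation can use various transforms (a horizontal flip in this tutorial), the tutorial does not mention transforming the bounding boxes/annotations so that they stay consistent with the transformed image. Am I missing something basic?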

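During training, the transforms are indeed applied to both the image and the target when the data is loaded. The PennFudanDataset class contains the following two lines: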

if self.transforms is not None:  
    img, target = self.transforms(img, target)

where target is a dictionary containing:

target = {}
target["boxes"] = boxes
target["labels"] = labels
target["masks"] = masks
target["image_id"] = image_id
target["area"] = area
target["iscrowd"] = iscrowd

In the PennFudanDataset class, self.transforms is set to the transforms [transforms.ToTensor(), transforms.Compose()], which is the return value of get_transform() passed in when the dataset is instantiated:

dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
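Because self.transforms is called with both img and target, every transform in the chain has to accept and return the pair. The following is a minimal sketch of that pattern, not the tutorial's actual code: the Compose class here is a simplified stand-in, ScalePixels and PadLeft are hypothetical transforms invented for illustration, and plain Python lists replace tensors so the example runs without PyTorch.

```python
class Compose:
    """Chain transforms that each accept and return an (image, target) pair."""

    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target):
        for transform in self.transforms:
            image, target = transform(image, target)
        return image, target


class ScalePixels:
    """Hypothetical transform that changes only the image, never the target."""

    def __call__(self, image, target):
        image = [[value / 255.0 for value in row] for row in image]
        return image, target


class PadLeft:
    """Hypothetical geometric transform: pad columns on the left and shift the boxes."""

    def __init__(self, columns):
        self.columns = columns

    def __call__(self, image, target):
        image = [[0.0] * self.columns + row for row in image]
        target = dict(target)  # shallow copy so the caller's dict is not mutated
        target["boxes"] = [
            [x1 + self.columns, y1, x2 + self.columns, y2]
            for x1, y1, x2, y2 in target["boxes"]
        ]
        return image, target


transform = Compose([ScalePixels(), PadLeft(columns=2)])
image = [[0, 255, 255], [0, 255, 255]]             # 2 rows x 3 columns
target = {"boxes": [[1, 0, 3, 2]], "labels": [1]}  # boxes as [x1, y1, x2, y2]
new_image, new_target = transform(image, target)
print(new_target["boxes"])
```

A transform that only changes pixel values can pass the target through untouched, while any transform that moves pixels must update the boxes in the same call.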

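The transforms used in the tutorial are written specifically for the object detection task. In particular, in the __call__ method of the horizontal-flip transform, both the image and the target (e.g., masks, keypoints) are processed.

For completeness, I have borrowed the code from the GitHub repo: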

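def __call__(self, image, target):
    # Flip with probability self.prob; otherwise return the inputs unchanged.
    if random.random() < self.prob:
        height, width = image.shape[-2:]
        # Mirror the image along its last (width) dimension.
        image = image.flip(-1)
        bbox = target["boxes"]
        # Boxes are [x1, y1, x2, y2]: new x1 = width - old x2, new x2 = width - old x1.
        bbox[:, [0, 2]] = width - bbox[:, [2, 0]]
        target["boxes"] = bbox
        if "masks" in target:
            # Masks are flipped along the same dimension as the image.
            target["masks"] = target["masks"].flip(-1)
        if "keypoints" in target:
            keypoints = target["keypoints"]
            # Helper defined elsewhere in the same repo.
            keypoints = _flip_coco_person_keypoints(keypoints, width)
            target["keypoints"] = keypoints
    return image, target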

Here we can see how the bounding boxes, masks, and keypoints are flipped consistently with the image.
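The box update in that code swaps the two x-coordinates as it mirrors them: the new x1 is width minus the old x2, and the new x2 is width minus the old x1, so x1 stays the smaller value. The arithmetic can be checked on a tiny example. This is only a sketch under stated assumptions: flip_horizontal and tight_box are hypothetical helpers written for this illustration, and plain lists are used in place of tensors.

```python
def flip_horizontal(image, boxes):
    """Flip a list-of-rows image and its [x1, y1, x2, y2] boxes left to right."""
    width = len(image[0])
    flipped_image = [row[::-1] for row in image]
    # Same rule as the tensor code: x1' = width - x2, x2' = width - x1.
    flipped_boxes = [[width - x2, y1, width - x1, y2] for x1, y1, x2, y2 in boxes]
    return flipped_image, flipped_boxes


def tight_box(image):
    """Return the [x1, y1, x2, y2] box around all nonzero pixels (max edges exclusive)."""
    xs = [x for row in image for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(image) if any(row)]
    return [min(xs), min(ys), max(xs) + 1, max(ys) + 1]


image = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
boxes = [tight_box(image)]  # the object occupies columns 1-2 and rows 0-1
flipped_image, flipped_boxes = flip_horizontal(image, boxes)

# The transformed box still tightly encloses the object in the flipped image.
assert flipped_boxes == [tight_box(flipped_image)]
print(boxes, flipped_boxes)
```

If only the image were flipped, the original box would point at empty background on the left while the object now sits on the right, which is why the image and target have to be transformed together.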