Python 难以确定2D特征矩阵结构以输入机器学习算法
我正在训练一个情绪识别系统,它通过面部运动来检测情绪。结果,我形成了一个4维矩阵,我试图将其简化为2维 构成4D矩阵的功能:Python 难以确定2D特征矩阵结构以输入机器学习算法,python,matrix,machine-learning,computer-vision,scikit-learn,Python,Matrix,Machine Learning,Computer Vision,Scikit Learn,我正在训练一个情绪识别系统,它通过面部运动来检测情绪。结果,我形成了一个4维矩阵,我试图将其简化为2维 构成4D矩阵的功能: 视频数量(每个视频将被分配情感标签) 每个视频的帧数 每帧面部标志的方向 每帧人脸标记的速度 我尝试训练的重要功能: 左侧是速度(每帧相同面部标志之间的斜边) 右侧是方向(每帧相同面部地标的x和y值的arctan) 我一直使用的4D矩阵,并试图将其简化为2D >> main.shape (60, 17, 68, 2) # 60 videos, 17 f
视频数量(每个视频将被分配情感标签)
每个视频的帧数
每帧面部标志的方向
每帧人脸标记的速度 我尝试训练的重要功能:
左侧是速度(每帧相同面部标志之间的斜边)
右侧是方向(每帧相同面部地标的x和y值的arctan) 我一直使用的4D矩阵,并试图将其简化为2D
>> main.shape
(60, 17, 68, 2)
# 60 videos, 17 frames per video, 68 facial landmarks, 2 features (direction and speed)
>> main
array([[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 1. , 1. ],
[ 1.41421356, 0.78539816],
[ 1.41421356, 0.78539816],
...,
[ 3. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
[[ 0. , 0. ],
[ -1.41421356, 0.78539816],
[ -1.41421356, 0.78539816],
...,
[ 2. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
...,
[[ 1. , 1. ],
[ 1.41421356, -0.78539816],
[ 1.41421356, -0.78539816],
...,
[ -1.41421356, 0.78539816],
[ 1. , 1. ],
[ -1.41421356, 0.78539816]],
[[ 2.23606798, -0.46364761],
[ 2.82842712, -0.78539816],
[ 2.23606798, -0.46364761],
...,
[ 1. , 0. ],
[ 0. , 0. ],
[ 1. , 1. ]],
[[ -1.41421356, -0.78539816],
[ -2.23606798, -0.46364761],
[ -2.23606798, -0.46364761],
...,
[ 1.41421356, -0.78539816],
[ 1.41421356, -0.78539816],
[ 2.23606798, -1.10714872]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 2. , 1. ],
[ 2.23606798, -1.10714872],
[ 1.41421356, -0.78539816],
...,
[ -2. , -0. ],
[ -1. , -0. ],
[ -1.41421356, -0.78539816]],
[[ 2. , 1. ],
[ -2.23606798, 1.10714872],
[ -1.41421356, 0.78539816],
...,
[ 1. , 1. ],
[ -1. , -0. ],
[ -1. , -0. ]],
...,
[[ -2. , -0. ],
[ -3. , -0. ],
[ -4.12310563, -0.24497866],
...,
[ 0. , 0. ],
[ -1. , -0. ],
[ -2.23606798, 1.10714872]],
[[ -2.23606798, 1.10714872],
[ -1.41421356, 0.78539816],
[ -2.23606798, 1.10714872],
...,
[ -2.23606798, 0.46364761],
[ -1.41421356, 0.78539816],
[ -1.41421356, 0.78539816]],
[[ 2. , 1. ],
[ 1.41421356, 0.78539816],
[ 2.82842712, 0.78539816],
...,
[ 1. , 1. ],
[ 1. , 1. ],
[ -2.23606798, -1.10714872]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 1. , 1. ],
[ 0. , 0. ],
[ 1. , 1. ],
...,
[ -3. , -0. ],
[ -2. , -0. ],
[ 0. , 0. ]],
[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 1.41421356, 0.78539816],
[ 1. , 0. ],
[ 0. , 0. ]],
...,
[[ 1. , 0. ],
[ 1. , 1. ],
[ 0. , 0. ],
...,
[ 2. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
[[ -7.28010989, 1.29249667],
[ -7.28010989, 1.29249667],
[ -8.54400375, 1.21202566],
...,
[-22.02271555, 1.52537305],
[ 22.09072203, -1.48013644],
[ 22.36067977, -1.39094283]],
[[ 1. , 0. ],
[ 1.41421356, -0.78539816],
[ 1. , 0. ],
...,
[ -1.41421356, -0.78539816],
[ 1. , 1. ],
[ 1.41421356, 0.78539816]]],
...,
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 5.38516481, 0.38050638],
[ 5.09901951, 0.19739556],
[ 4.47213595, -0.46364761],
...,
[ -1.41421356, 0.78539816],
[ -2.82842712, 0.78539816],
[ -5. , 0.64350111]],
[[ -6.32455532, 0.32175055],
[ -6.08276253, -0.16514868],
[ -5.65685425, -0.78539816],
...,
[ 3.60555128, 0.98279372],
[ 5. , 0.92729522],
[ 5.65685425, 0.78539816]],
...,
[[ -3.16227766, -0.32175055],
[ -3.60555128, -0.98279372],
[ 5. , 1. ],
...,
[ 12.08304597, 1.14416883],
[ 13.15294644, 1.418147 ],
[ 14.31782106, 1.35970299]],
[[ 3.60555128, -0.5880026 ],
[ 4.47213595, -1.10714872],
[ 6. , 1. ],
...,
[-20.39607805, 1.37340077],
[-21.02379604, 1.52321322],
[-22.09072203, 1.48013644]],
[[ 1. , 1. ],
[ -1.41421356, 0.78539816],
[ 1. , 1. ],
...,
[ 4.12310563, 1.32581766],
[ 4. , 1. ],
[ 4.12310563, 1.32581766]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 0. , 0. ],
[ 1. , 1. ],
[ -2.23606798, 1.10714872],
...,
[ -3.16227766, 0.32175055],
[ 1. , 1. ],
[ 1.41421356, -0.78539816]],
[[ 1. , 1. ],
[ 1. , 1. ],
[ 1. , 1. ],
...,
[ 3. , 1. ],
[ 2. , 1. ],
[ -1.41421356, 0.78539816]],
...,
[[ 5.38516481, -1.19028995],
[ 4.47213595, -1.10714872],
[ 4.12310563, -1.32581766],
...,
[ 2.23606798, -0.46364761],
[ 1. , 1. ],
[ -1. , -0. ]],
[[ -5.38516481, 1.19028995],
[ -4.12310563, 1.32581766],
[ -3.16227766, 1.24904577],
...,
[ 0. , 0. ],
[ 1. , 0. ],
[ 1.41421356, -0.78539816]],
[[ 8.06225775, 1.44644133],
[ -7.07106781, -1.42889927],
[ 6. , 1. ],
...,
[ -3.16227766, -0.32175055],
[ -3.16227766, -0.32175055],
[ -3.16227766, -0.32175055]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ -2.23606798, 0.46364761],
[ -1.41421356, 0.78539816],
[ -2.23606798, 0.46364761],
...,
[ 1. , 0. ],
[ 1. , 0. ],
[ 1. , 1. ]],
[[ -2.23606798, -0.46364761],
[ -1.41421356, -0.78539816],
[ 2. , 1. ],
...,
[ 0. , 0. ],
[ 1. , 0. ],
[ 1. , 0. ]],
...,
[[ 1. , 0. ],
[ 1. , 1. ],
[ -2.23606798, -1.10714872],
...,
[ 19.02629759, 1.51821327],
[ 19. , 1. ],
[-19.10497317, -1.46591939]],
[[ 3.60555128, 0.98279372],
[ 3.60555128, 0.5880026 ],
[ 5. , 0.64350111],
...,
[ 7.28010989, -1.29249667],
[ 7.61577311, -1.16590454],
[ 8.06225775, -1.05165021]],
[[ -7.28010989, 1.29249667],
[ -5. , 0.92729522],
[ -5.83095189, 0.5404195 ],
...,
[ 20.09975124, 1.47112767],
[ 21.02379604, 1.52321322],
[-20.22374842, -1.42190638]]]])
>> main
array([[ 0. , 0. , 0. , ..., -0.78539816,
2.23606798, -1.10714872],
[ 0. , 0. , 0. , ..., 1. ,
-2.23606798, -1.10714872],
[ 0. , 0. , 0. , ..., 1. ,
1.41421356, 0.78539816],
...,
[ 0. , 0. , 0. , ..., 1. ,
4.12310563, 1.32581766],
[ 0. , 0. , 0. , ..., -0.32175055,
-3.16227766, -0.32175055],
[ 0. , 0. , 0. , ..., 1.52321322,
-20.22374842, -1.42190638]])
>> main.shape
(60, 2312)
方向和速度特征是非常有价值的(最重要的特征),因为它代表了每帧每个面部地标的运动,我正试图让机器学习算法在此基础上进行训练
我试图将三个维度重新塑造成一个长向量(只是将速度、方向和帧混合在一起),最后形成一个2D矩阵,我将其输入sklearn SVM函数,它产生的精度相当低。我预料到了这一点,因为我认为ml算法不可能识别出巨大的单个矩阵中的特征之间的差异,并假设向量中的所有内容都是相同的特征
我被迫制作2D矩阵,通过将速度、方向和每帧视频都强制加入一个向量,将其输入sklearn SVM,并获得低精度:
>> main.shape
(60, 17, 68, 2)
# 60 videos, 17 frames per video, 68 facial landmarks, 2 features (direction and speed)
>> main
array([[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 1. , 1. ],
[ 1.41421356, 0.78539816],
[ 1.41421356, 0.78539816],
...,
[ 3. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
[[ 0. , 0. ],
[ -1.41421356, 0.78539816],
[ -1.41421356, 0.78539816],
...,
[ 2. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
...,
[[ 1. , 1. ],
[ 1.41421356, -0.78539816],
[ 1.41421356, -0.78539816],
...,
[ -1.41421356, 0.78539816],
[ 1. , 1. ],
[ -1.41421356, 0.78539816]],
[[ 2.23606798, -0.46364761],
[ 2.82842712, -0.78539816],
[ 2.23606798, -0.46364761],
...,
[ 1. , 0. ],
[ 0. , 0. ],
[ 1. , 1. ]],
[[ -1.41421356, -0.78539816],
[ -2.23606798, -0.46364761],
[ -2.23606798, -0.46364761],
...,
[ 1.41421356, -0.78539816],
[ 1.41421356, -0.78539816],
[ 2.23606798, -1.10714872]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 2. , 1. ],
[ 2.23606798, -1.10714872],
[ 1.41421356, -0.78539816],
...,
[ -2. , -0. ],
[ -1. , -0. ],
[ -1.41421356, -0.78539816]],
[[ 2. , 1. ],
[ -2.23606798, 1.10714872],
[ -1.41421356, 0.78539816],
...,
[ 1. , 1. ],
[ -1. , -0. ],
[ -1. , -0. ]],
...,
[[ -2. , -0. ],
[ -3. , -0. ],
[ -4.12310563, -0.24497866],
...,
[ 0. , 0. ],
[ -1. , -0. ],
[ -2.23606798, 1.10714872]],
[[ -2.23606798, 1.10714872],
[ -1.41421356, 0.78539816],
[ -2.23606798, 1.10714872],
...,
[ -2.23606798, 0.46364761],
[ -1.41421356, 0.78539816],
[ -1.41421356, 0.78539816]],
[[ 2. , 1. ],
[ 1.41421356, 0.78539816],
[ 2.82842712, 0.78539816],
...,
[ 1. , 1. ],
[ 1. , 1. ],
[ -2.23606798, -1.10714872]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 1. , 1. ],
[ 0. , 0. ],
[ 1. , 1. ],
...,
[ -3. , -0. ],
[ -2. , -0. ],
[ 0. , 0. ]],
[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 1.41421356, 0.78539816],
[ 1. , 0. ],
[ 0. , 0. ]],
...,
[[ 1. , 0. ],
[ 1. , 1. ],
[ 0. , 0. ],
...,
[ 2. , 1. ],
[ 3. , 1. ],
[ 3. , 1. ]],
[[ -7.28010989, 1.29249667],
[ -7.28010989, 1.29249667],
[ -8.54400375, 1.21202566],
...,
[-22.02271555, 1.52537305],
[ 22.09072203, -1.48013644],
[ 22.36067977, -1.39094283]],
[[ 1. , 0. ],
[ 1.41421356, -0.78539816],
[ 1. , 0. ],
...,
[ -1.41421356, -0.78539816],
[ 1. , 1. ],
[ 1.41421356, 0.78539816]]],
...,
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 5.38516481, 0.38050638],
[ 5.09901951, 0.19739556],
[ 4.47213595, -0.46364761],
...,
[ -1.41421356, 0.78539816],
[ -2.82842712, 0.78539816],
[ -5. , 0.64350111]],
[[ -6.32455532, 0.32175055],
[ -6.08276253, -0.16514868],
[ -5.65685425, -0.78539816],
...,
[ 3.60555128, 0.98279372],
[ 5. , 0.92729522],
[ 5.65685425, 0.78539816]],
...,
[[ -3.16227766, -0.32175055],
[ -3.60555128, -0.98279372],
[ 5. , 1. ],
...,
[ 12.08304597, 1.14416883],
[ 13.15294644, 1.418147 ],
[ 14.31782106, 1.35970299]],
[[ 3.60555128, -0.5880026 ],
[ 4.47213595, -1.10714872],
[ 6. , 1. ],
...,
[-20.39607805, 1.37340077],
[-21.02379604, 1.52321322],
[-22.09072203, 1.48013644]],
[[ 1. , 1. ],
[ -1.41421356, 0.78539816],
[ 1. , 1. ],
...,
[ 4.12310563, 1.32581766],
[ 4. , 1. ],
[ 4.12310563, 1.32581766]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ 0. , 0. ],
[ 1. , 1. ],
[ -2.23606798, 1.10714872],
...,
[ -3.16227766, 0.32175055],
[ 1. , 1. ],
[ 1.41421356, -0.78539816]],
[[ 1. , 1. ],
[ 1. , 1. ],
[ 1. , 1. ],
...,
[ 3. , 1. ],
[ 2. , 1. ],
[ -1.41421356, 0.78539816]],
...,
[[ 5.38516481, -1.19028995],
[ 4.47213595, -1.10714872],
[ 4.12310563, -1.32581766],
...,
[ 2.23606798, -0.46364761],
[ 1. , 1. ],
[ -1. , -0. ]],
[[ -5.38516481, 1.19028995],
[ -4.12310563, 1.32581766],
[ -3.16227766, 1.24904577],
...,
[ 0. , 0. ],
[ 1. , 0. ],
[ 1.41421356, -0.78539816]],
[[ 8.06225775, 1.44644133],
[ -7.07106781, -1.42889927],
[ 6. , 1. ],
...,
[ -3.16227766, -0.32175055],
[ -3.16227766, -0.32175055],
[ -3.16227766, -0.32175055]]],
[[[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ],
...,
[ 0. , 0. ],
[ 0. , 0. ],
[ 0. , 0. ]],
[[ -2.23606798, 0.46364761],
[ -1.41421356, 0.78539816],
[ -2.23606798, 0.46364761],
...,
[ 1. , 0. ],
[ 1. , 0. ],
[ 1. , 1. ]],
[[ -2.23606798, -0.46364761],
[ -1.41421356, -0.78539816],
[ 2. , 1. ],
...,
[ 0. , 0. ],
[ 1. , 0. ],
[ 1. , 0. ]],
...,
[[ 1. , 0. ],
[ 1. , 1. ],
[ -2.23606798, -1.10714872],
...,
[ 19.02629759, 1.51821327],
[ 19. , 1. ],
[-19.10497317, -1.46591939]],
[[ 3.60555128, 0.98279372],
[ 3.60555128, 0.5880026 ],
[ 5. , 0.64350111],
...,
[ 7.28010989, -1.29249667],
[ 7.61577311, -1.16590454],
[ 8.06225775, -1.05165021]],
[[ -7.28010989, 1.29249667],
[ -5. , 0.92729522],
[ -5.83095189, 0.5404195 ],
...,
[ 20.09975124, 1.47112767],
[ 21.02379604, 1.52321322],
[-20.22374842, -1.42190638]]]])
>> main
array([[ 0. , 0. , 0. , ..., -0.78539816,
2.23606798, -1.10714872],
[ 0. , 0. , 0. , ..., 1. ,
-2.23606798, -1.10714872],
[ 0. , 0. , 0. , ..., 1. ,
1.41421356, 0.78539816],
...,
[ 0. , 0. , 0. , ..., 1. ,
4.12310563, 1.32581766],
[ 0. , 0. , 0. , ..., -0.32175055,
-3.16227766, -0.32175055],
[ 0. , 0. , 0. , ..., 1.52321322,
-20.22374842, -1.42190638]])
>> main.shape
(60, 2312)
我想保留速度和方向特征,但必须在2D矩阵中表示它们,该矩阵考虑了视频中的帧
情感标签将贴在每个视频的17帧中的每一帧上。(因此,基本上,17帧视频将被标记为情感)
有什么聪明的方法可以重塑和缩小4D矩阵来实现这一点吗?因此,你提出问题的方式绝对会让你看到精确度很低,你几乎无法改变它。将单一情感分配给视频(取决于您的语料库)通常不够准确,以至于任何机器学习算法都无法学习您试图提取的信号 此外,您将该问题定义为一个时间序列问题,这将使您的生活变得令人头痛,特别是如果您使用的是现成的
sklearn
算法,这些算法非常不适合此类任务
如果可能的话,您应该将您的问题定义为计算机视觉问题。你应该尝试预测每一帧的情绪内容。如果您没有具有这种粒度级别的数据集,您就不会看到很高的准确性
这有点偏离了你提问的方式,但你提问的方式是不易处理的。以下是解决问题的方法:
- 用情感内容标记单个框架
- 训练一个基于图像的算法来分类那些被标记的帧
- 卷积神经网络可能会为任何基于图像的问题提供最佳性能,因为您有一个适当大小的数据集
- 如果这不是一个选项,您需要开发图像的一维特征表示。我个人建议使用图像功能API。一旦你有了这个表示,一个典型的算法,比如SVM,将会非常有效
- 如果准确度不太符合您的要求,但越来越接近,我建议使用预处理/数据增强管道,如详细信息所示,该示例用于浮游生物识别,基本方法相同
- 如果准确度仍然达不到标准,并且您需要对整个视频进行预测,那么您将希望汇总结果,以在整个视频中给出准确的结果
- 一种方法是在视频预测图上训练卷积神经网络。这有点奇怪,但可能效果很好
- 一个好的方法是使用贝叶斯方法,假设每个预测都有一定程度的可信度,并结合视频上的预测分布
- 最好的方法是将其视为集成学习问题。幸运的是,集成学习是一个被很好地研究和理解的问题。您可以找到如何以这种格式组合多个预测的详细信息
免责声明:我是Indio的首席执行官,因此在推荐其使用时可能存在偏见。您使用的是scikit learn and
numpy
。这不是一个MATLAB问题。您正在使用Python。因此,我重新标记了你的帖子以反映这一点。如果您的问题不涉及MATLAB环境,请避免使用MATLAB标签。谢谢您的回答!我将重新考虑我解决这个问题的方法,并尝试你的方法。对于功能,您是否仍然认为我在为每个帧添加情感标签以指示帧的上一个移动时仍保留速度和方向功能,或者您是否建议我删除这些功能并创建一组全新的功能?因此,添加速度和方向功能可能有助于提高准确性,但如果我不得不猜测,这种改进是微不足道的,我可能会采用神经网络方法。我想利用这一事实,我可以使用面部地标,并希望创建一个基于面部运动的情感识别系统。您会推荐什么样的功能来优化我的ac