Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/363.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python 如何将嵌套数组馈送到SVM模型中_Python_Machine Learning_Scikit Learn_Supervised Learning - Fatal编程技术网

Python 如何将嵌套数组馈送到SVM模型中

Python 如何将嵌套数组馈送到SVM模型中,python,machine-learning,scikit-learn,supervised-learning,Python,Machine Learning,Scikit Learn,Supervised Learning,我的问题如下: 我有一个数组,它的特征向量对应于几个音频文件。例如,如果有10个音频文件,那么这个数组的长度应该是10 我有一个特性本身就是一个列表(该列表包含音频文件特定特性的信息),对于给定的音频文件,该特性向量如下所示: array([0.03861840871664194, 187.72393405210002, 62.59881268743305, 0.2911392405063291, array([4963.40332031, 3229.98046875

我的问题如下:

我有一个数组,它的特征向量对应于几个音频文件。例如,如果有10个音频文件,那么这个数组的长度应该是10

我有一个特性本身就是一个列表(该列表包含音频文件特定特性的信息),对于给定的音频文件,该特性向量如下所示:

array([0.03861840871664194, 187.72393405210002, 62.59881268743305,
       0.2911392405063291,
       array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
       4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
       5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
       4984.93652344, 2756.25      , 1991.82128906, 2551.68457031,
       2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
       3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
       2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
       4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
       5932.39746094, 6244.62890625, 6072.36328125, 6201.5625    ,
       6158.49609375, 6201.5625    , 6233.86230469, 6061.59667969])],
      dtype=object)
现在,当我尝试将这些数据输入svm模型时:

from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt


X_train, X_val, y_train, y_val = train_test_split(X,y,test_size=0.3)

model  = svm.SVC()
model.fit(X_train,y_train)

yt_p = model.predict(X_train)
yv_p = model.predict(X_val)
我得到了这个错误
ValueError:用序列设置数组元素。

我如何构造我的特征向量,以便能够将其提供给svm

编辑:

这里我提供了一个X的例子

如果我们有5个音频文件,那么X将是:

array([[0.017455393927437918, 227.66237105624407, 32.42076654734572,
        0.3867924528301887,
        array([1851.85546875, 2433.25195312, 3057.71484375, 3079.24804688,
       3079.24804688, 3068.48144531, 3046.94824219, 3359.1796875 ,
       3908.27636719, 4618.87207031, 4618.87207031, 4521.97265625,
       4091.30859375, 3111.54785156, 3100.78125   , 2863.91601562,
       1561.15722656, 1119.7265625 , 1065.89355469,  947.4609375 ,
        979.76074219,  990.52734375,  990.52734375, 1356.59179688,
       2077.95410156, 2993.11523438, 3025.41503906, 3068.48144531,
       3079.24804688, 3090.01464844, 3100.78125   , 3111.54785156,
       2993.11523438, 3100.78125   , 3079.24804688, 2853.14941406,
       1205.859375  , 1281.22558594, 1614.99023438, 2131.78710938,
       2325.5859375 , 2034.88769531, 1916.45507812, 1744.18945312,
       1851.85546875, 2357.88574219, 2368.65234375, 1916.45507812,
       1959.52148438, 1959.52148438, 1754.95605469, 1787.25585938,
       2207.15332031])],
       [0.03861840871664194, 187.72393405210002, 62.59881268743305,
        0.2911392405063291,
        array([4963.40332031, 3229.98046875, 2691.65039062, 3208.44726562,
       4338.94042969, 4220.5078125 , 4166.67480469, 4801.90429688,
       5555.56640625, 5910.86425781, 6115.4296875 , 5706.29882812,
       4984.93652344, 2756.25      , 1991.82128906, 2551.68457031,
       2734.71679688, 2906.98242188, 3143.84765625, 3219.21386719,
       3186.9140625 , 3165.38085938, 3068.48144531, 2465.55175781,
       2110.25390625, 2508.61816406, 2993.11523438, 3843.67675781,
       4715.77148438, 5652.46582031, 5480.20019531, 5792.43164062,
       5932.39746094, 6244.62890625, 6072.36328125, 6201.5625    ,
       6158.49609375, 6201.5625    , 6233.86230469, 6061.59667969])],
       [0.042435441297643324, 128.81225073038124, 20.912528554426807,
        0.313953488372093,
        array([4349.70703125, 4242.04101562, 4274.34082031, 4123.60839844,
       4457.37304688, 4834.20410156, 4661.93847656, 4306.640625  ,
       4231.27441406, 4543.50585938, 4435.83984375, 6201.5625    ,
       8817.84667969, 8817.84667969,  742.89550781,  721.36230469,
        732.12890625,  732.12890625,  710.59570312,  721.36230469,
        925.92773438, 1119.7265625 , 1141.25976562, 1431.95800781,
       7762.71972656, 7934.98535156, 7891.91894531, 7332.05566406,
       3789.84375   , 2799.31640625, 2831.61621094, 2217.91992188,
        581.39648438,  602.9296875 , 2217.91992188, 2228.68652344,
       2368.65234375, 2519.38476562, 2863.91601562, 3682.17773438,
       3649.87792969, 4188.20800781, 4112.84179688])],
       [0.006295381642571726, 130.28309914454434, 5.193614287487564,
        0.2411764705882353,
        array([7978.05175781, 8010.3515625 , 8118.01757812, 8430.24902344,
       8257.98339844, 8451.78222656, 8591.74804688, 8677.88085938,
       8796.31347656, 8850.14648438, 8796.31347656, 8925.51269531,
       6244.62890625,  344.53125   ,  344.53125   , 1614.99023438,
       2325.5859375 , 2971.58203125, 3316.11328125, 3617.578125  ,
       3294.58007812, 2788.54980469, 2637.81738281, 2702.41699219,
       2723.95019531, 3133.08105469, 3413.01269531, 5663.23242188,
       5770.8984375 , 5577.09960938, 2228.68652344, 1604.22363281,
       1690.35644531, 4123.60839844, 5566.33300781, 5803.19824219,
       5749.36523438, 5846.26464844, 6772.19238281, 7073.65722656,
       7622.75390625, 7859.61914062, 8236.45019531, 8441.015625  ,
       8699.4140625 , 8807.08007812, 8742.48046875, 8667.11425781,
       8710.18066406, 8947.04589844, 9140.84472656, 9130.078125  ,
       8936.27929688, 8925.51269531, 8947.04589844, 8925.51269531,
       9097.77832031, 9205.44433594, 9194.67773438, 9140.84472656,
       9162.37792969, 9043.9453125 , 9162.37792969, 9108.54492188,
       9183.91113281, 9280.81054688, 9270.04394531, 9108.54492188,
       9076.24511719, 9356.17675781, 9226.97753906, 9216.2109375 ,
       9248.51074219, 9140.84472656, 9237.74414062, 9334.64355469,
       9259.27734375, 9226.97753906, 9216.2109375 , 9108.54492188,
       9183.91113281, 9216.2109375 , 9248.51074219, 9259.27734375,
       9183.91113281])],
       [0.017070271599460656, 171.91660927761163, 26.854424936811768,
        0.11188811188811189,
        array([4715.77148438, 4629.63867188, 4898.80371094, 5275.63476562,
       4941.87011719, 4532.73925781, 4618.87207031, 4995.703125  ,
       4705.00488281, 4500.43945312, 4188.20800781, 4371.24023438,
       4457.37304688, 4188.20800781, 4909.5703125 , 4877.27050781,
       6761.42578125, 7708.88671875, 7719.65332031, 7956.51855469,
       8484.08203125, 9033.17871094, 9043.9453125 , 9000.87890625,
       9011.64550781, 9011.64550781, 9000.87890625, 9108.54492188,
       8817.84667969, 6686.05957031, 1808.7890625 , 1830.32226562,
       1851.85546875, 1636.5234375 , 1022.82714844, 1281.22558594,
       1927.22167969, 1948.75488281, 1302.75878906, 1399.65820312,
       1873.38867188, 1959.52148438, 7245.92285156, 9011.64550781,
       9420.77636719, 9549.97558594, 9453.07617188, 9431.54296875,
       9410.00976562, 9248.51074219, 9151.61132812, 9194.67773438,
       8968.57910156, 8634.81445312, 8268.75      , 7439.72167969,
       5501.73339844, 5232.56835938, 5103.36914062, 7052.12402344,
       7299.75585938, 7127.49023438, 7192.08984375, 5673.99902344,
       5523.26660156, 5986.23046875, 6729.12597656, 6309.22851562,
       5135.66894531, 5081.8359375 , 5329.46777344, 5404.83398438])]],
      dtype=object)

您可以通过两种方式将包含列表的功能提供给模型:

  • 将列表视为附加功能
  • 使用您认为合适的函数(最小值、中值、平均值、最大值、总和等),将其所有元素映射为单个数字
  • 要尝试第一个选项:

    #将'X'转换为数据帧
    X=pd.DataFrame(X)
    #重命名列
    X.columns=['feature_uu'+str(i+1)表示范围内的i(X.shape[1])]
    #将包含列表的要素转换为长格式
    x=x['feature_5'].分解().到_帧()
    #通过观察创建计数器,以便我们可以旋转
    x['observation_id']=x.groupby(level=0).cumcount()
    #转换为数据集并重命名所有列
    x=x.pivot(columns='observation\u id',values='feature\u 5')。fillna(0)
    x=x.添加前缀(“列表元素”)
    #从X中删除'feature_5'
    X.drop(列='feature_5',轴=1,在位=True)
    #将X和X连接在一起
    X=pd.concat([X,X],轴=1)
    #照旧
    X_列,X_val,y_列,y_val=列测试分割(X,y,测试大小=0.3)
    model=svm.SVC()
    模型拟合(X\U系列、y\U系列)
    
    第二个选项没有正确的答案,只有你才能决定怎么做,因为只有你知道列表的意思。但是,如果您希望获得每个列表的平均值(例如),并将其用作特征:

    #获取每个列表的平均值
    平均值=[X[:,4]中数组的np.平均值(数组)]
    #将名单改为`手段'`
    X[:,4]=平均值
    

    然后继续进行拆分和装配。

    我明白了!谢谢你的回答。我根据您的要求添加了一个X示例。在你提到的代码修改之后,我会将答案标记为正确!令人惊叹的。我看到每个列表都有不同的长度。其中一个长度为85。我认为这将是太多的功能,但仍然可以是一个可行的选择,这取决于您的观察数量和处理能力。我会整天编辑代码。谢谢!事实上,它们可能太多了。我会努力减少它们。非常感谢。嗯,我用两个选项更新了答案。现在让我看看进展如何。