Python: Training an ML model with a column of arrays

python pandas scikit-learn

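I'm trying to train a model using a column that contains serialized lists of values, but I get a data-type error. What kind of preprocessing do I need to do before fitting the model?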

TypeError: float() argument must be a string or a number, not 'list'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "main.py", line 192, in <module>
    regression = train_audio_model()
  File "main.py", line 184, in train_audio_model
    regression.fit(X_train, Y_train)
  File "/Users/colton/code/audio-analysis/env/lib/python3.6/site-packages/sklearn/linear_model/_logistic.py", line 1527, in fit
    accept_large_sparse=solver != 'liblinear')
  File "/Users/colton/code/audio-analysis/env/lib/python3.6/site-packages/sklearn/utils/validation.py", line 755, in check_X_y
    estimator=estimator)
  File "/Users/colton/code/audio-analysis/env/lib/python3.6/site-packages/sklearn/utils/validation.py", line 531, in check_array
    array = np.asarray(array, order=order, dtype=dtype)
  File "/Users/colton/code/audio-analysis/env/lib/python3.6/site-packages/numpy/core/_asarray.py", line 85, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
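
The traceback shows scikit-learn's check_array calling np.asarray on the feature matrix, which fails because each cell of one column holds a whole list rather than a single number. As a rough sketch for locating such columns before fitting, the snippet below scans a DataFrame for list-like cells. The helper name find_sequence_columns and the toy values are hypothetical and not part of the original question.

```python
import numpy as np
import pandas as pd


def find_sequence_columns(df):
    """Return names of columns where at least one cell is a list, tuple or array."""
    return [
        name
        for name in df.columns
        if df[name].map(lambda v: isinstance(v, (list, tuple, np.ndarray))).any()
    ]


# Toy frame shaped like the one in the question (values are made up).
df = pd.DataFrame({
    'filename': ['a.wav', 'b.wav'],
    'spectrogram': [[-1.0, 2.0], [3.0, -4.0]],
    'beep': [0, 1],
})

offending = find_sequence_columns(df)
print(offending)
```

Any column reported here has to be expanded or otherwise converted to plain numbers before it is passed to fit. A string column such as filename is not numeric either and would typically be dropped from the features.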

model.py

Output of data.head(2)

You need to split the lists into separate columns. Here is a simple example to illustrate the idea:

import pandas as pd

# sample DataFrame: every row of 'col' holds a list
df = pd.DataFrame({'col': [[1, 2, 3], [4, 5, 6]], 'target': [0, 1]})

print(df)

         col  target
0  [1, 2, 3]       0
1  [4, 5, 6]       1

# expand the list column into one column per list element
df = pd.concat([df.pop('col').apply(pd.Series), df['target']], axis=1)

print(df)

   0  1  2  target
0  1  2  3       0
1  4  5  6       1

To train the model, you can now do the following:

from sklearn.model_selection import train_test_split

X_train, X_test, Y_train, Y_test = train_test_split(df.drop('target', axis=1), df['target'])
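
For completeness, here is a minimal end-to-end sketch of the same idea, from list column to a fitted model. It assumes every list has the same length and uses LogisticRegression only because the traceback in the question points at sklearn's _logistic.py. The toy data, test_size and random_state are illustrative assumptions.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy data: every list has the same length (3 here).
df = pd.DataFrame({
    'col': [[1, 2, 3], [4, 5, 6], [1, 1, 2], [5, 6, 7], [0, 2, 3], [6, 5, 8]],
    'target': [0, 1, 0, 1, 0, 1],
})

# One column per list element; reuse the index so rows stay aligned with 'target'.
features = pd.DataFrame(df['col'].tolist(), index=df.index)

# Stratify so that both classes appear in the training and the test split.
X_train, X_test, Y_train, Y_test = train_test_split(
    features, df['target'], test_size=2, stratify=df['target'], random_state=0
)

regression = LogisticRegression()
regression.fit(X_train, Y_train)
predictions = regression.predict(X_test)
print(predictions)
```

Building the frame from tolist() is an alternative to apply(pd.Series) that tends to be faster on large frames; both produce one numeric column per list position.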

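You need to pass each value as a separate column. Can you show the output of data.head(2)?

@YOLO Added the output. The length of the spectrogram column can vary with the duration of the audio. Should I still split it into separate columns?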

                 filename                                                                                spectrogram  beep
0  ./samples/nonbeep1.wav  [-315.49462890625, 138.87547302246094, -52.60832977294922, 29.540002822875977, -2.4793...     0
1  ./samples/nonbeep2.wav  [-368.6966552734375, 167.4494171142578, -23.79843521118164, 46.0974006652832, -1.74239...     0
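
When the lists differ in length, splitting them directly gives rows with missing values, which most scikit-learn estimators do not accept. One possible workaround, shown here only as a sketch, is to pad or truncate every list to a fixed length before stacking them into a 2-D array. The helper pad_or_truncate, the constant FIXED_LENGTH, the zero fill value and the toy numbers are all assumptions, not something the thread prescribes.

```python
import numpy as np
import pandas as pd


def pad_or_truncate(values, length, fill=0.0):
    """Return a float array with exactly `length` items."""
    arr = np.asarray(values, dtype=float)[:length]  # cut off anything too long
    return np.pad(arr, (0, length - len(arr)), constant_values=fill)  # pad the rest


# Toy frame with lists of different lengths (values are made up).
df = pd.DataFrame({
    'spectrogram': [[-315.5, 138.9, -52.6], [-368.7, 167.4, -23.8, 46.1, -1.7]],
    'beep': [0, 0],
})

FIXED_LENGTH = 4  # assumption: pick a length that suits the real data

# Each row becomes a fixed-size feature vector, giving a regular 2-D matrix.
X = np.vstack([pad_or_truncate(s, FIXED_LENGTH) for s in df['spectrogram']])
y = df['beep'].to_numpy()
print(X.shape)
```

Whether padding is appropriate depends on the data. Summarising each list with fixed-size statistics, such as the mean and standard deviation, is another option that keeps the number of columns constant.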
