Python: How do I feed a mix of 2D and 1D features (one-hot encoded and regular integers, respectively) into a Keras Sequential model?


python machine-learning keras


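I have a pandas DataFrame with 849743 rows and 13 columns, i.e. a shape of (849743, 13).

Most of these columns contain only integers; however, 3 of them hold a one-hot encoded categorical variable. They were not encoded with the one-hot encoding/embedding features of Keras or sklearn (or any other library); I simply did it by hand in Python.

For example, df['d'] is a column containing a one-hot encoded variable. Here is an excerpt: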

1082077    [0, 1, 0, 0, 0, 0, 0]
995216     [1, 0, 0, 0, 0, 0, 0]
924611     [0, 0, 0, 0, 1, 0, 0]
1171772    [0, 0, 0, 1, 0, 0, 0]
96796      [0, 0, 1, 0, 0, 0, 0]

Please ignore the nonsensical index values on the left.

This is the first row in the column:

array([1, 0, 0, 0, 0, 0, 0])

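As can be seen, the elements of this DataFrame column are all nested numpy arrays.

To give further insight into the structure of the pandas DataFrame, here are all the elements of the first row:

I subsequently converted it to a numpy array with the following command:

x_train = df.values

This preserved the original dimensions of the DataFrame, i.e. (849743, 13).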

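I created a throwaway Keras Sequential model just to test whether the input works, which is how I first ran into the error. The model is as follows:

from keras.models import Sequential
from keras.layers import Dense

# create model
model = Sequential()
model.add(Dense(130, input_dim=13, kernel_initializer='normal',
                activation='relu'))
model.add(Dense(1, kernel_initializer='normal'))
# Compile model
model.compile(loss='mean_squared_error', optimizer='adam')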

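Since the DataFrame/numpy array has 13 columns, input_dim was set to 13. However, I believe the problem is caused by the nested numpy arrays in the 3 one-hot encoded columns.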
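That suspicion can be checked on a tiny stand-in for the real data. The sketch below is only an illustration: the frame, its column names and its values are made up, and the cast to float32 stands in for the conversion a backend has to perform on its input. It shows that `df.values` keeps one cell per column (so the shape still reports 2 columns here, or 13 in the real case) but falls back to `object` dtype, and that such an array cannot be turned into a numeric tensor.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame: one integer column and one column whose
# cells each hold a small one-hot array (names and values are invented).
df = pd.DataFrame({
    "a": [3, 5],
    "d": [np.array([0, 1, 0]), np.array([1, 0, 0])],
})

x_train = df.values
print(x_train.shape)  # one cell per column, nested arrays are not expanded
print(x_train.dtype)  # object, because the cells are not all scalars

# A numeric backend needs a plain float array; this cast is expected to fail
# because one "element" is itself a sequence.
try:
    x_train.astype("float32")
    error_message = None
except (ValueError, TypeError) as exc:
    error_message = str(exc)

print(error_message)
```

If the cast fails as expected, the printed message should resemble the last line of the traceback in the question.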

I fed the original 13-column numpy array (called x_train) into the model.fit function, together with y_train (the observed variable):

model.fit(x_train, y_train,
          epochs=20,
          batch_size=128)

I get the following error:

    Bad input argument to theano function with name "train_function" at index 0 (0-based).  
Backtrace when that variable is created:

  File "C:\Users\Studying\AppData\Local\conda\conda\envs\Tensorflow-gpu\lib\site-packages\spyder\utils\site\sitecustomize.py", line 101, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/Studying/Documents/GitHub/IFN665/Machine Learning/keras_regression_practice.py", line 106, in <module>
    model = baseline_model(input_shape)
  File "C:/Users/Studying/Documents/GitHub/IFN665/Machine Learning/keras_regression_practice.py", line 23, in baseline_model
    model.add(Dense(130, input_dim=1, kernel_initializer='normal', activation='relu'))
  File "C:\Users\Studying\AppData\Local\conda\conda\envs\Tensorflow-gpu\lib\site-packages\keras\models.py", line 432, in add
    dtype=layer.dtype, name=layer.name + '_input')
  File "C:\Users\Studying\AppData\Local\conda\conda\envs\Tensorflow-gpu\lib\site-packages\keras\engine\topology.py", line 1426, in Input
    input_tensor=tensor)
  File "C:\Users\Studying\AppData\Local\conda\conda\envs\Tensorflow-gpu\lib\site-packages\keras\legacy\interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\Studying\AppData\Local\conda\conda\envs\Tensorflow-gpu\lib\site-packages\keras\engine\topology.py", line 1337, in __init__
    name=self.name)
  File "C:\Users\Studying\AppData\Local\conda\conda\envs\Tensorflow-gpu\lib\site-packages\keras\backend\theano_backend.py", line 222, in placeholder
    x = T.TensorType(dtype, broadcast)(name)
setting an array element with a sequence.


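I tried removing all of the one-hot encoded columns and adjusting input_dim accordingly, and that did work (in the sense that it raised no error; the model itself was obviously a garbage predictor).

I don't think it is possible (although I have not searched much) to have a numpy array in which some elements are 2D and others are 1D, e.g. by changing the nested numpy arrays (the one-hot encoded ones) into 2D lists while leaving all the other variables 1D.

I have searched this site for similar questions; however, everything I found about Keras and one-hot encoded variables seems to ask what one-hot encoding is or how to do it, not how to feed in a mix of one-hot encoded and 1D integer inputs.

How can this be done? Am I missing something obvious?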

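The problem is that your data is not uniform: when you convert it to a NumPy array, some entries are themselves arrays (the one-hot encoded ones), which causes a shape/type mismatch. Depending on how you want to process the data, you have 2 options: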

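The first option is to flatten the inner arrays so that the final shape is (samples, >13). By flattening, I mean giving the one-hot encoded data extra columns of its own in the NumPy array. A row would then look like a mixture such as

[0, 0, 1, 0, 0, …, 2.3492, 1.3483, …]

so the shape is consistent. Then you set

input_dim = len(data[0])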
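A minimal sketch of this flattening step, under stated assumptions: the small frame below is a hypothetical stand-in for the real one, and the names `one_hot_cols`, `scalar_cols` and `parts` are introduced purely for illustration. Each one-hot column is stacked into its own 2D block and placed next to the plain numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame: two integer columns plus one column
# whose cells each hold a 7-element one-hot array.
df = pd.DataFrame({
    "a": [3, 5, 8],
    "b": [10, 20, 30],
    "d": [np.array([0, 1, 0, 0, 0, 0, 0]),
          np.array([1, 0, 0, 0, 0, 0, 0]),
          np.array([0, 0, 0, 0, 1, 0, 0])],
})

one_hot_cols = ["d"]  # assumed: the columns that hold nested arrays
scalar_cols = [c for c in df.columns if c not in one_hot_cols]

# Start with the plain numeric columns as a (rows, n_scalar) float block.
parts = [df[scalar_cols].to_numpy(dtype="float32")]

# Turn each column of nested arrays into a (rows, n_categories) block.
for col in one_hot_cols:
    parts.append(np.stack(df[col].tolist()).astype("float32"))

# Join all blocks side by side: every row is now a flat numeric vector.
x_train = np.hstack(parts)
input_dim = x_train.shape[1]

print(x_train.shape, x_train.dtype, input_dim)
```

Note that this sketch puts the scalar columns first and the one-hot blocks after them, so the column order differs from the original frame. That is harmless for a Dense layer as long as the same order is used for training and prediction.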

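The second option applies if you really need separate inputs, perhaps because you want to process them differently (for example by feeding them into different Dense layers). In that case you will need to move up from the Sequential model to the functional API. This gives you a multi-input model, which the documentation explains well.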
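A rough sketch of what such a multi-input model could look like, not a recommendation for this dataset. The input widths (one 7-way one-hot input and 10 plain numeric features), the layer sizes and all names are assumptions made for illustration; the real model would need one input per group of features that should be treated separately.

```python
from keras.layers import Concatenate, Dense, Input
from keras.models import Model

# Assumed widths: one 7-way one-hot input and 10 plain numeric features.
one_hot_in = Input(shape=(7,), name="one_hot_d")
numeric_in = Input(shape=(10,), name="numeric")

# Each input gets its own Dense layer before the branches are merged.
one_hot_branch = Dense(8, activation="relu")(one_hot_in)
numeric_branch = Dense(32, activation="relu")(numeric_in)

merged = Concatenate()([one_hot_branch, numeric_branch])
output = Dense(1)(merged)  # single regression output

model = Model(inputs=[one_hot_in, numeric_in], outputs=output)
model.compile(loss="mean_squared_error", optimizer="adam")
```

Training would then pass one array per input, in the same order as `inputs`, for example `model.fit([x_one_hot, x_numeric], y_train, epochs=20, batch_size=128)`, where `x_one_hot` and `x_numeric` are hypothetical arrays of shape (samples, 7) and (samples, 10).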

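You're right. I figured it out about 30 minutes after posting, and I have flattened the arrays as in your first suggestion. Thanks for your help! You deserve a reward.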
