Python: Error when checking input: expected embedding_1_input to have 2 dimensions, but got array with shape ()


python tensorflow keras


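I am trying to build a tweet generator with Keras using an RNN. I ran into this problem and do not know where it comes from. I searched online for several hours but found nothing. I am sure it is something small, but I cannot figure it out.

Here is the code (from):

The data looks like this: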

                    id                                               text
0  1204000574099857409  Democrats launch impeachment endgame with risi...
1  1203998807928823809  ***********************#biden2020 #Election202...
2  1203998376376832000  Any congressional representation doing this sh...
3  1203997840718086144  I"m glad to see this. #Booker deserves to be s...
4  1203997705938362368  @realDonaldTrump #AmericaFirst #KAG2020 #Trump...

The result is:

Using TensorFlow backend.
total characters in our dataset: 4786659
unique characters: 186
<MapDataset shapes: ((100,), (100,)), types: (tf.int32, tf.int32)>
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (64, None, 256)           47616     
_________________________________________________________________
gru_1 (GRU)                  (64, None, 256)           393984    
_________________________________________________________________
dropout_1 (Dropout)          (64, None, 256)           0         
_________________________________________________________________
gru_2 (GRU)                  (64, None, 256)           393984    
_________________________________________________________________
dense_1 (Dense)              (64, None, 186)           47802     
=================================================================
Total params: 883,386
Trainable params: 883,386
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
  File ".../src/tweet_generator_2.py", line 97, in <module>
    history = model.fit(np.array(dataset2), validation_data=dataset, validation_steps=30, epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])
  File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training.py", line 1154, in fit
    batch_size=batch_size)
  File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training_utils.py", line 135, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected embedding_1_input to have 2 dimensions, but got array with shape ()

Process finished with exit code 1


Does anyone know how I can solve this? I do not understand where the shape () comes from.

Thanks, everyone!

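I reproduced your error. The problem is the data you pass when fitting the model.
In your code you build the data with tf.data, which gives an object of type tensorflow.python.data.ops.dataset_ops.DatasetV1Adapter. But when you call .fit with np.array(dataset2), the dataset is converted to a numpy.ndarray that does not hold any of the input data. NumPy only wraps the dataset object itself, which is why Keras reports shape ().

When you apply shuffle, which is missing in your code, assign the result back to dataset. If it is not assigned to dataset, the DatasetV1Adapter will have a different shape.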
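To see where the shape () comes from without running TensorFlow, here is a minimal sketch using a hypothetical stand-in class (OpaqueDataset is not a real TensorFlow type). It assumes only that, like the dataset in the traceback, the object is not something NumPy can read as a sequence, so NumPy wraps it whole in a zero-dimensional object array:

```python
import numpy as np


class OpaqueDataset:
    """Hypothetical stand-in for a tf.data dataset: iterable, but not a sequence."""

    def __iter__(self):
        # Yields one (input, target) pair shaped like the ones in the question.
        yield np.zeros(100, dtype=np.int32), np.zeros(100, dtype=np.int32)


wrapped = np.array(OpaqueDataset())

# NumPy cannot index into the object, so it stores the object itself
# as the single element of a 0-d array; no sequence data is copied.
print(wrapped.shape)  # ()
print(wrapped.dtype)  # object
```

Keras then receives one array with no dimensions instead of (batch, sequence) data, which matches the ValueError in the question. Passing the dataset object itself to fit, as the modified code below does, avoids the conversion.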

I modified your code and was able to run it without any problems:

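# Imports the original answer omitted; tf.keras is assumed because the code below uses tf.keras.layers.
import os

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Embedding

data = pd.read_csv('data/election2020.csv', usecols=[0, 4], names=['id', 'text'], encoding="latin-1")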

# all tweets into one string
tweet_txt = data['text'][:].str.cat(sep=' ')
print(f'total characters in our dataset: {len(tweet_txt)}')

# get unique chars and make character mapping
chars = list(set(tweet_txt))
chars.sort()
char_to_index = dict((c,i) for i,c in enumerate(chars))
index_to_char = np.array(chars)
print(f"unique characters: {len(chars)}")
maxlen = 100
tweet_int = np.array([char_to_index[char] for char in tweet_txt])

seq_length = 100
examples_per_epoch = len(tweet_txt)//seq_length
char_dataset = tf.data.Dataset.from_tensor_slices(tweet_int)

sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text


dataset = sequences.map(split_input_target)

BATCH_SIZE = 64
steps_per_epoch = examples_per_epoch//BATCH_SIZE
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
print(dataset)

# Build the model with the Keras Sequential API; the partial below fixes the GRU recurrent activation.
import functools
rnn = functools.partial(tf.keras.layers.GRU, recurrent_activation='sigmoid')


def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]))
    model.add(rnn(rnn_units, return_sequences=True, recurrent_initializer='glorot_uniform', stateful=True))
    model.add(Dropout(rate=0.2, noise_shape=(batch_size, 1, rnn_units)))
    model.add(rnn(rnn_units, return_sequences=True, recurrent_initializer='glorot_uniform', stateful=True))
    model.add(Dense(vocab_size))
    return model


vocab_size = len(chars)
embedding_dim = 256
rnn_units = 256
batch_size = BATCH_SIZE

model = build_model(vocab_size=vocab_size, embedding_dim=embedding_dim, rnn_units=rnn_units, batch_size=batch_size)

model.summary()


def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)


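# tf.train.AdamOptimizer is the TensorFlow 1.x API; on TensorFlow 2.x use tf.keras.optimizers.Adam().
model.compile(optimizer=tf.train.AdamOptimizer(), loss=loss)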

checkpoint_dir = "model_gen/checkpoints"
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.hdf5")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)

EPOCHS = 5

early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

history = model.fit(dataset.repeat(), validation_data=dataset, validation_steps=30, epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])
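
The batching and splitting steps in the code above can be sketched in plain NumPy to check the shapes by hand. This is an illustration only, not how tf.data works internally: the toy text, the short seq_length, and the helper name make_examples are assumptions chosen so the result is easy to read.

```python
import numpy as np


def make_examples(encoded, seq_length):
    """Cut encoded text into chunks of seq_length + 1 and split each one.

    Mirrors batch(seq_length + 1, drop_remainder=True) followed by
    split_input_target: the target is the input shifted by one character.
    """
    chunk = seq_length + 1
    n_chunks = len(encoded) // chunk  # drop the incomplete tail
    chunks = encoded[: n_chunks * chunk].reshape(n_chunks, chunk)
    return chunks[:, :-1], chunks[:, 1:]


text = "hello world"
chars = sorted(set(text))
char_to_index = {c: i for i, c in enumerate(chars)}
encoded = np.array([char_to_index[c] for c in text])

inputs, targets = make_examples(encoded, seq_length=4)
print(inputs.shape, targets.shape)  # (2, 4) (2, 4)

# Each target row is the matching input row shifted left by one position.
print(np.array_equal(inputs[:, 1:], targets[:, :-1]))  # True
```

With the real settings (seq_length = 100, BATCH_SIZE = 64), each element handed to fit is a pair of (64, 100) integer arrays, which is the 2-dimensional input the embedding layer expects.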

Model summary:

Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (64, None, 256)           123648    
_________________________________________________________________
gru (GRU)                    (64, None, 256)           393984    
_________________________________________________________________
dropout (Dropout)            (64, None, 256)           0         
_________________________________________________________________
gru_1 (GRU)                  (64, None, 256)           393984    
_________________________________________________________________
dense (Dense)                (64, None, 483)           124131    
=================================================================
Total params: 1,035,747
Trainable params: 1,035,747
Non-trainable params: 0

Training in progress:

Epoch 1/5
180/710 [======>.......................] - ETA: 19:52 - loss: 3.5081
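
The parameter counts in the model summary can be checked by hand, which also shows that this run used a vocabulary of 483 characters rather than the 186 in the question. The GRU formula below is an assumption that happens to match the printed numbers: three gates, each with an input kernel, a recurrent kernel, and a single bias vector.

```python
vocab_size = 483      # from the Dense output shape in the summary
embedding_dim = 256
rnn_units = 256

# Embedding: one embedding_dim-sized vector per character.
embedding_params = vocab_size * embedding_dim

# GRU (assumed layout): 3 gates x (input kernel + recurrent kernel + bias).
gru_params = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + rnn_units)

# Dense: weight matrix plus one bias per output character.
dense_params = rnn_units * vocab_size + vocab_size

total = embedding_params + 2 * gru_params + dense_params
print(embedding_params, gru_params, dense_params, total)
# 123648 393984 124131 1035747
```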

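Could you provide the election2020.csv you used in the code above? It does not appear to be included at the GitHub link you provided.

@Razor - if you think I have answered your question, please accept and upvote it. Thanks.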