Python error when checking input: expected embedding_1_input to have 2 dimensions, but got array with shape ()


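I'm trying to build a tweet generator with Keras using an RNN. I've run into this problem and I don't know where it comes from. I've searched online for hours but found nothing. I'm sure it's something small, but I can't figure it out.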

Here is the code (from):

The data looks like this:

                    id                                               text
0  1204000574099857409  Democrats launch impeachment endgame with risi...
1  1203998807928823809  ***********************#biden2020 #Election202...
2  1203998376376832000  Any congressional representation doing this sh...
3  1203997840718086144  I"m glad to see this. #Booker deserves to be s...
4  1203997705938362368  @realDonaldTrump #AmericaFirst #KAG2020 #Trump...
The result is:

Using TensorFlow backend.
total characters in our dataset: 4786659
unique characters: 186
<MapDataset shapes: ((100,), (100,)), types: (tf.int32, tf.int32)>
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding_1 (Embedding)      (64, None, 256)           47616     
_________________________________________________________________
gru_1 (GRU)                  (64, None, 256)           393984    
_________________________________________________________________
dropout_1 (Dropout)          (64, None, 256)           0         
_________________________________________________________________
gru_2 (GRU)                  (64, None, 256)           393984    
_________________________________________________________________
dense_1 (Dense)              (64, None, 186)           47802     
=================================================================
Total params: 883,386
Trainable params: 883,386
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
  File ".../src/tweet_generator_2.py", line 97, in <module>
    history = model.fit(np.array(dataset2), validation_data=dataset, validation_steps=30, epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])
  File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training.py", line 1154, in fit
    batch_size=batch_size)
  File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
    exception_prefix='input')
  File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training_utils.py", line 135, in standardize_input_data
    'with shape ' + str(data_shape))
ValueError: Error when checking input: expected embedding_1_input to have 2 dimensions, but got array with shape ()

Process finished with exit code 1
Does anyone know how I can fix this? I don't understand where the shape () comes from.


Thanks, everyone!

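I reproduced your error. The problem is the data you pass to the model when fitting it.
In your code you build the data with tf.data, which gives an object of type tensorflow.python.data.ops.dataset_ops.DatasetV1Adapter. But when you convert it with np.array(dataset2) inside the .fit call, the resulting numpy.ndarray does not contain any of the input data: NumPy cannot unpack the dataset object, so it wraps the whole object in a 0-dimensional array, which is where the shape () comes from.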
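A minimal sketch of why the shape is (), using only NumPy. The class NotASequence is hypothetical, a stand-in assumed to behave like the TF 1.x dataset object: it can be iterated, but it defines neither __len__ nor __getitem__. np.array does not iterate over such an object; it wraps it whole in a 0-dimensional object array.

```python
import numpy as np


class NotASequence:
    """Hypothetical stand-in for a tf.data dataset: iterable, but with no
    __len__ or __getitem__, so NumPy cannot treat it as a sequence."""

    def __iter__(self):
        return iter([[1, 2, 3], [4, 5, 6]])


# np.array does not call __iter__; it stores the object itself in a 0-d array.
wrapped = np.array(NotASequence())
print(wrapped.shape, wrapped.dtype)  # () object

# Materialising the elements first gives a real 2-D array.
materialised = np.array(list(NotASequence()))
print(materialised.shape)  # (2, 3)
```

For Keras, the fix is not to materialise the dataset but to pass the tf.data dataset to fit directly, as the modified code in this answer does.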
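Also, when you call shuffle (this is missing from your code), you need to assign the result to dataset. Transformations such as shuffle and batch return a new dataset rather than modifying the existing one, so if you don't assign the result, the DatasetV1Adapter keeps a different (unbatched) shape.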
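The shapes involved can be sketched with NumPy alone. This mirrors the answer's tf.data steps (chunk into seq_length + 1 characters, split into input and target, shuffle, batch with drop_remainder) on hypothetical toy data. The permutation-based shuffle is a stand-in for tf.data's buffered shuffle, not the same algorithm. A single element is 1-D with shape (100,), while each batch is the 2-D (64, 100) array that an Embedding layer with batch_input_shape=[64, None] expects.

```python
import numpy as np

SEQ_LENGTH = 100
BATCH_SIZE = 64

# Hypothetical integer-encoded text; in the question this would be tweet_int.
tweet_int = np.arange(20_000) % 186

# Like char_dataset.batch(SEQ_LENGTH + 1, drop_remainder=True).
n_chunks = len(tweet_int) // (SEQ_LENGTH + 1)
chunks = tweet_int[: n_chunks * (SEQ_LENGTH + 1)].reshape(n_chunks, SEQ_LENGTH + 1)

# Like sequences.map(split_input_target): the target is the input shifted by one.
inputs, targets = chunks[:, :-1], chunks[:, 1:]
print(inputs[0].shape)  # (100,)  one unbatched element, 1-D

# Like .shuffle(...).batch(BATCH_SIZE, drop_remainder=True), with a full
# permutation standing in for tf.data's buffered shuffle.
rng = np.random.default_rng(0)
order = rng.permutation(n_chunks)
n_batches = n_chunks // BATCH_SIZE
usable = order[: n_batches * BATCH_SIZE]  # drop the incomplete last batch
x_batches = inputs[usable].reshape(n_batches, BATCH_SIZE, SEQ_LENGTH)
y_batches = targets[usable].reshape(n_batches, BATCH_SIZE, SEQ_LENGTH)

print(x_batches[0].shape)  # (64, 100)  one 2-D batch, as the model expects
print(x_batches.shape)     # (3, 64, 100)
```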

I modified your code and was able to run it without any problems:

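import os

import numpy as np
import pandas as pd
import tensorflow as tf
# Use tf.keras throughout: the GRU layer below comes from tf.keras, and mixing
# standalone keras layers with tf.keras layers in one model generally fails.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Dense

data = pd.read_csv('data/election2020.csv', usecols=[0, 4], names=['id', 'text'], encoding="latin-1")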

# all tweets into one string
tweet_txt = data['text'][:].str.cat(sep=' ')
print(f'total characters in our dataset: {len(tweet_txt)}')

# get unique chars and make character mapping
chars = list(set(tweet_txt))
chars.sort()
char_to_index = dict((c,i) for i,c in enumerate(chars))
index_to_char = np.array(chars)
print(f"unique characters: {len(chars)}")
maxlen = 100
tweet_int = np.array([char_to_index[char] for char in tweet_txt])

seq_length = 100
# Each example consumes seq_length + 1 characters (input plus shifted target).
examples_per_epoch = len(tweet_txt)//(seq_length+1)
char_dataset = tf.data.Dataset.from_tensor_slices(tweet_int)

sequences = char_dataset.batch(seq_length+1, drop_remainder=True)

def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text


dataset = sequences.map(split_input_target)

BATCH_SIZE = 64
steps_per_epoch = examples_per_epoch//BATCH_SIZE
BUFFER_SIZE = 10000

dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
print(dataset)

# The model is built with the Sequential API; every GRU layer gets a sigmoid
# recurrent activation via functools.partial.
import functools
rnn = functools.partial(tf.keras.layers.GRU, recurrent_activation='sigmoid')


def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]))
    model.add(rnn(rnn_units, return_sequences=True, recurrent_initializer='glorot_uniform', stateful=True))
    model.add(Dropout(rate=0.2, noise_shape=(batch_size, 1, rnn_units)))
    model.add(rnn(rnn_units, return_sequences=True, recurrent_initializer='glorot_uniform', stateful=True))
    model.add(Dense(vocab_size))
    return model


vocab_size = len(chars)
embedding_dim = 256
rnn_units = 256
batch_size = BATCH_SIZE

model = build_model(vocab_size=vocab_size, embedding_dim=embedding_dim, rnn_units=rnn_units, batch_size=batch_size)

model.summary()


def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)


# tf.train.AdamOptimizer exists only in TF 1.x; tf.keras.optimizers.Adam works in TF 1.x and 2.x.
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=loss)

checkpoint_dir = "model_gen/checkpoints"
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.hdf5")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)

EPOCHS = 5

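# Defined but not passed to fit() below; add it to callbacks to stop on val_loss.
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# repeat() makes the training dataset infinite, so the iterator never runs out across epochs.
# Note: validation_data reuses the training dataset, so val_loss is not a held-out metric.
history = model.fit(dataset.repeat(), validation_data=dataset, validation_steps=30, epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])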
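The steps_per_epoch arithmetic, worked through with the 4,786,659 characters reported in the question's output (the answerer's data set is different, hence the 710 steps shown below). Each example consumes seq_length + 1 characters, so one pass yields 740 full batches of 64, whereas dividing by seq_length alone gives 747 steps. Asking for more steps than one pass provides would presumably exhaust a non-repeated dataset before training finishes, which is likely part of why the answer passes dataset.repeat() to fit.

```python
SEQ_LENGTH = 100
BATCH_SIZE = 64
total_chars = 4_786_659  # "total characters in our dataset" from the question's output

# Each training example consumes SEQ_LENGTH + 1 characters.
examples = total_chars // (SEQ_LENGTH + 1)
full_batches = examples // BATCH_SIZE  # batches per pass with drop_remainder=True

# Dividing by SEQ_LENGTH alone overestimates the number of examples.
steps_original = (total_chars // SEQ_LENGTH) // BATCH_SIZE

print(examples, full_batches, steps_original)  # 47392 740 747
```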
Model summary:

Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (64, None, 256)           123648    
_________________________________________________________________
gru (GRU)                    (64, None, 256)           393984    
_________________________________________________________________
dropout (Dropout)            (64, None, 256)           0         
_________________________________________________________________
gru_1 (GRU)                  (64, None, 256)           393984    
_________________________________________________________________
dense (Dense)                (64, None, 483)           124131    
=================================================================
Total params: 1,035,747
Trainable params: 1,035,747
Non-trainable params: 0  
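The two model summaries differ only in vocabulary size: 186 unique characters in the question versus 483 in the answerer's data. That changes the Embedding and Dense rows, while both GRU rows stay at 393,984. A small sketch of the counts, assuming GRU layers with reset_after=False (one bias vector per gate), which is the setting that reproduces 393,984; param_counts is a hypothetical helper, not part of Keras.

```python
def param_counts(vocab_size, embedding_dim=256, rnn_units=256):
    """Hypothetical helper: parameter counts for Embedding -> GRU -> Dropout -> GRU -> Dense.

    Assumes GRU with reset_after=False: 3 gates, each with an input kernel,
    a recurrent kernel and a single bias vector. Dropout has no parameters.
    """
    embedding = vocab_size * embedding_dim
    gru_1 = 3 * (embedding_dim * rnn_units + rnn_units * rnn_units + rnn_units)
    gru_2 = 3 * (rnn_units * rnn_units + rnn_units * rnn_units + rnn_units)
    dense = rnn_units * vocab_size + vocab_size  # kernel + bias
    total = embedding + gru_1 + gru_2 + dense
    return embedding, gru_1, gru_2, dense, total


print(param_counts(186))  # (47616, 393984, 393984, 47802, 883386)
print(param_counts(483))  # (123648, 393984, 393984, 124131, 1035747)
```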
Training in progress:

Epoch 1/5
180/710 [======>.......................] - ETA: 19:52 - loss: 3.5081

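Could you provide the election2020.csv used in the code above? It doesn't seem to be included in the GitHub link you provided.
@Razor - If you think I've answered your question, please accept and upvote it. Thanks.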