Python: Error when checking input: expected embedding_1_input to have 2 dimensions, but got array with shape ()
I'm trying to build a tweet generator with Keras using an RNN. I ran into this problem and I don't know where it comes from. I've searched online for hours without finding anything. I'm sure it's something small, but I can't figure it out.
Here is the code (from):
The data looks like this:
id text
0 1204000574099857409 Democrats launch impeachment endgame with risi...
1 1203998807928823809 ***********************#biden2020 #Election202...
2 1203998376376832000 Any congressional representation doing this sh...
3 1203997840718086144 I"m glad to see this. #Booker deserves to be s...
4 1203997705938362368 @realDonaldTrump #AmericaFirst #KAG2020 #Trump...
The result is:
Using TensorFlow backend.
total characters in our dataset: 4786659
unique characters: 186
<MapDataset shapes: ((100,), (100,)), types: (tf.int32, tf.int32)>
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_1 (Embedding) (64, None, 256) 47616
_________________________________________________________________
gru_1 (GRU) (64, None, 256) 393984
_________________________________________________________________
dropout_1 (Dropout) (64, None, 256) 0
_________________________________________________________________
gru_2 (GRU) (64, None, 256) 393984
_________________________________________________________________
dense_1 (Dense) (64, None, 186) 47802
=================================================================
Total params: 883,386
Trainable params: 883,386
Non-trainable params: 0
_________________________________________________________________
Traceback (most recent call last):
File ".../src/tweet_generator_2.py", line 97, in <module>
history = model.fit(np.array(dataset2), validation_data=dataset, validation_steps=30, epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])
File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training.py", line 1154, in fit
batch_size=batch_size)
File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training.py", line 579, in _standardize_user_data
exception_prefix='input')
File "...\Anaconda\envs\gputest\lib\site-packages\keras\engine\training_utils.py", line 135, in standardize_input_data
'with shape ' + str(data_shape))
ValueError: Error when checking input: expected embedding_1_input to have 2 dimensions, but got array with shape ()
Process finished with exit code 1
Does anyone know how I can solve this? I don't understand where the shape () comes from.
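Thanks, everyone!
I reproduced your error; the problem is with the data you supply when fitting the model.
In your code you generate the data with tf.data, which gives you an object of type tensorflow.python.data.ops.dataset_ops.DatasetV1Adapter. When you wrap it as np.array(dataset2) in the call to .fit, it becomes a numpy.ndarray that does not hold any of the input data: NumPy cannot unpack the dataset, so the result is a zero-dimensional array, which is the shape () reported in the error. The shuffle step is also missing from your code, and when you add it you need to assign the result back to dataset. If you do not assign it back, the DatasetV1Adapter will have a different shape (the dataset printed in your output still shows the unbatched shapes ((100,), (100,))).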
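To make the shape () part concrete, here is a minimal sketch using only NumPy. The class `FakeDataset` is a hypothetical stand-in for a tf.data dataset (it is not part of TensorFlow); the only assumption is that, like the dataset, it is not a Python sequence, so NumPy wraps the object itself instead of reading its contents.

```python
import numpy as np


class FakeDataset:
    """Hypothetical stand-in for a tf.data dataset (not a real TensorFlow class).

    It stores rows internally but is not a sequence, so NumPy cannot unpack it.
    """

    def __init__(self, rows):
        self._rows = rows


# Wrapping an opaque object produces a 0-d object array: shape is ().
wrapped = np.array(FakeDataset([[1, 2, 3], [4, 5, 6]]))
print(wrapped.shape, wrapped.dtype)  # () object

# Wrapping the actual nested list gives the 2-D array an Embedding input expects.
real = np.array([[1, 2, 3], [4, 5, 6]])
print(real.shape)  # (2, 3)
```

The first array has no dimensions at all, which matches the "got array with shape ()" message in the traceback, so passing the dataset object itself to fit avoids the problem.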
I modified your code and was able to run it without any problems:
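# Imports are not shown in the original answer; these are assumed from the names used below.
import os
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Dropout, Dense

data = pd.read_csv('data/election2020.csv', usecols=[0, 4], names=['id', 'text'], encoding="latin-1")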
# all tweets into one string
tweet_txt = data['text'][:].str.cat(sep=' ')
print(f'total characters in our dataset: {len(tweet_txt)}')
# get unique chars and make character mapping
chars = list(set(tweet_txt))
chars.sort()
char_to_index = dict((c,i) for i,c in enumerate(chars))
index_to_char = np.array(chars)
print(f"unique characters: {len(chars)}")
maxlen = 100
tweet_int = np.array([char_to_index[char] for char in tweet_txt])
seq_length = 100
examples_per_epoch = len(tweet_txt)//seq_length
char_dataset = tf.data.Dataset.from_tensor_slices(tweet_int)
sequences = char_dataset.batch(seq_length+1, drop_remainder=True)
def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text
dataset = sequences.map(split_input_target)
BATCH_SIZE = 64
steps_per_epoch = examples_per_epoch//BATCH_SIZE
BUFFER_SIZE = 10000
dataset = dataset.shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
print(dataset)
# Build the model with the Keras Sequential API; GRU layers are created through functools.partial.
import functools
rnn = functools.partial(tf.keras.layers.GRU, recurrent_activation='sigmoid')
def build_model(vocab_size, embedding_dim, rnn_units, batch_size):
    model = Sequential()
    model.add(Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None]))
    model.add(rnn(rnn_units, return_sequences=True, recurrent_initializer='glorot_uniform', stateful=True))
    model.add(Dropout(rate=0.2, noise_shape=(batch_size, 1, rnn_units)))
    model.add(rnn(rnn_units, return_sequences=True, recurrent_initializer='glorot_uniform', stateful=True))
    model.add(Dense(vocab_size))
    return model
vocab_size = len(chars)
embedding_dim = 256
rnn_units = 256
batch_size = BATCH_SIZE
model = build_model(vocab_size=vocab_size, embedding_dim=embedding_dim, rnn_units=rnn_units, batch_size=batch_size)
model.summary()
def loss(labels, logits):
    return tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
model.compile(optimizer= tf.train.AdamOptimizer(), loss=loss)
checkpoint_dir = "model_gen/checkpoints"
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt_{epoch}.hdf5")
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_prefix, save_weights_only=True)
EPOCHS = 5
early_stopping = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)
history = model.fit(dataset.repeat(), validation_data=dataset, validation_steps=30, epochs=EPOCHS, steps_per_epoch=steps_per_epoch, callbacks=[checkpoint_callback])
Model summary:
Layer (type) Output Shape Param #
=================================================================
embedding (Embedding) (64, None, 256) 123648
_________________________________________________________________
gru (GRU) (64, None, 256) 393984
_________________________________________________________________
dropout (Dropout) (64, None, 256) 0
_________________________________________________________________
gru_1 (GRU) (64, None, 256) 393984
_________________________________________________________________
dense (Dense) (64, None, 483) 124131
=================================================================
Total params: 1,035,747
Trainable params: 1,035,747
Non-trainable params: 0
Training in progress:
Epoch 1/5
180/710 [======>.......................] - ETA: 19:52 - loss: 3.5081
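For readers unfamiliar with the input/target split used in the code above, the following is a small pure-Python sketch of the same idea on a toy string. The sample text, the short `seq_length`, and the list-based chunking are illustrative assumptions; the answer itself does this with tf.data on the full tweet corpus.

```python
# Toy text and sequence length chosen only for illustration.
text = "hello world"
seq_length = 4

# Character-to-index mapping, built the same way as in the answer.
chars = sorted(set(text))
char_to_index = {c: i for i, c in enumerate(chars)}
encoded = [char_to_index[c] for c in text]

# Cut the encoded text into chunks of seq_length + 1, dropping the remainder
# (a list-based analogue of batch(seq_length + 1, drop_remainder=True)).
step = seq_length + 1
chunks = [encoded[i:i + step] for i in range(0, len(encoded) - seq_length, step)]


def split_input_target(chunk):
    """Input is the chunk without its last item; target is shifted by one."""
    return chunk[:-1], chunk[1:]


def decode(indices):
    return "".join(chars[i] for i in indices)


pairs = [split_input_target(chunk) for chunk in chunks]
for input_ids, target_ids in pairs:
    print(repr(decode(input_ids)), "->", repr(decode(target_ids)))
# 'hell' -> 'ello'
# ' wor' -> 'worl'
```

Each target is the input shifted one character to the right, so at every position the model is trained to predict the next character.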
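Could you provide the election2020.csv you used in the code above? It does not seem to be available at the GitHub link you provided.
@Razor - if you feel I have answered your question, please accept and upvote it. Thanks.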