Python: pad_sequences got multiple values for argument 'maxlen' (Keras)
I am trying to use a Keras model inside a genetic algorithm for text classification, but I get an error from pad_sequences that says:
TypeError: pad_sequences() got multiple values for argument 'maxlen'
The actual pad_sequences assignment is:
data = self.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
It appears in the following method:
def get_data(self):
    """Retrieve the dataset and process the data."""
    batch_size = 128
    VALIDATION_SPLIT = 0.2
    MAX_SEQUENCE_LENGTH = 1000
    MAX_NUM_WORDS = 20000
    csv = 'VocabCSV.csv'
    my_df = self.pd.read_csv(csv, index_col=0, encoding='latin-1')
    my_df.dropna(inplace=True)
    my_df.reset_index(drop=True, inplace=True)
    print(my_df.info())
    texts = my_df.Text  # list of text samples
    labellist = my_df.Target  # list of labels
    label_vals = []  # label values list
    labels_index = {}  # dictionary mapping label name to numeric id
    labels = []  # list of label ids
    for label in labellist:
        if label not in label_vals:
            label_vals.append(label)
    for idx, text in enumerate(texts):
        for label in label_vals:
            if label == labellist[idx]:
                label_id = label_vals.index(label)
                labels_index[text] = label_id
                labels.append(label_id)
    print("labels index {}".format(len(labels_index)))
    print("labels size: %s " % len(labels))
    print("found %s texts." % len(texts))
    # finally, vectorize the text samples into a 2D integer tensor
    tokenizer = self.Tokenizer(num_words=MAX_NUM_WORDS)
    tokenizer.fit_on_texts(texts)
    sequences = tokenizer.texts_to_sequences(texts)
    word_index = tokenizer.word_index
    print('Found %s unique tokens.' % len(word_index))
    data = self.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
    print(self.np.asarray(labels).shape)
    labels = self.to_categorical(labels)
    print('Shape of data tensor:', data.shape)
    print('Shape of label tensor:', labels.shape)
    # split the data into a training set and a validation set
    indices = self.np.arange(data.shape[0])
    self.np.random.shuffle(indices)
    data = data[indices]
    labels = labels[indices]
    num_validation_samples = int(VALIDATION_SPLIT * data.shape[0])
    x_train = data[:-num_validation_samples]
    y_train = labels[:-num_validation_samples]
    x_test = data[-num_validation_samples:]
    y_test = labels[-num_validation_samples:]
    print(x_train.shape, y_train.shape)
    print(x_test.shape, y_test.shape)
    print(len(x_test))
    print(len(y_test))
    input_shape = MAX_SEQUENCE_LENGTH
    print(input_shape)
    nb_classes = len(label_vals)
    return (nb_classes, batch_size, input_shape, x_train, x_test, y_train, y_test, word_index)
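The error seems to occur when another function calls get_data, but I cannot determine the actual cause.
The problem is that you call self.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH). pad_sequences is not a method of your class; it comes from keras.preprocessing.sequence. If the function was stored on the class (an assumption, since the class body is not shown), calling it through self passes the instance as the first positional argument, so sequences lands in the maxlen position and collides with the maxlen= keyword.
So, to make it work, import it as follows: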
from keras.preprocessing import sequence
Then call pad_sequences like this, assigning to data so the rest of get_data keeps working:
data = sequence.pad_sequences(sequences, maxlen=MAX_SEQUENCE_LENGTH)
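To see why calling the function through self produces this particular TypeError, here is a minimal sketch that runs without Keras. The pad_sequences below is a simplified stand-in that only mirrors the leading parameters (sequences, maxlen) of the Keras function, and the classes Broken, Fixed, and FixedStatic are hypothetical names. It also assumes the function was attached to the class as an attribute, which the question does not show.

```python
# Stand-in for keras' pad_sequences: same leading parameters, simplified body.
# This is NOT the Keras implementation, only enough to reproduce the error.
def pad_sequences(sequences, maxlen=None, value=0):
    """Left-pad (or left-truncate) every sequence to length maxlen."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = list(seq)[-maxlen:]  # keep the last maxlen items
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded


class Broken:
    # Assumption: the function was stored on the class, so attribute access
    # through an instance turns it into a bound method.
    pad_sequences = pad_sequences

    def get_data(self, sequences):
        # self fills `sequences`, the real sequences fill `maxlen`,
        # and the maxlen= keyword then supplies maxlen a second time.
        return self.pad_sequences(sequences, maxlen=4)


class Fixed:
    def get_data(self, sequences):
        # Call the module-level function directly; nothing is bound to self.
        return pad_sequences(sequences, maxlen=4)


class FixedStatic:
    # Alternative: staticmethod stops Python from inserting self.
    pad_sequences = staticmethod(pad_sequences)

    def get_data(self, sequences):
        return self.pad_sequences(sequences, maxlen=4)


samples = [[1, 2], [3, 4, 5]]

error_message = None
try:
    Broken().get_data(samples)
except TypeError as exc:
    error_message = str(exc)

fixed_result = Fixed().get_data(samples)
static_result = FixedStatic().get_data(samples)
print(error_message)
print(fixed_result)
```

The same binding rule applies to any other plain function stored on the class, so importing such helpers at module level and calling them directly is usually the simpler option.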