Python: prediction with a trained LSTM model

I trained a model with an LSTM on some data I collected. I want to classify texts as canine or feline.

I am trying to predict a string of text like this:

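import numpy as np
from tensorflow.keras.models import model_from_json
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

with open('model.json', 'r') as json_file:
    loaded_model_json = json_file.read()
loaded_model = model_from_json(loaded_model_json)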

# load weights into new model
loaded_model.load_weights("lstm.hd5")
print("Loaded model from disk")


text_to_predict = ['A 2‐year‐old male domestic shorthair cat was presented for a progressive history of abnormal posture, behavior, and mentation. Menace response was absent bilaterally, and generalized tremors were identified on neurological examination. A neuroanatomical diagnosis of diffuse brain dysfunction was made. A neurodegenerative disorder was suspected. Magnetic resonance imaging findings further supported the clinical suspicion. Whole‐genome sequencing of the affected cat with filtering of variants against a database of unaffected cats was performed. Candidate variants were confirmed by Sanger sequencing followed by genotyping of a control population. Two homozygous private (unique to individual or families and therefore absent from the breed‐matched controlled population) protein‐changing variants in the major facilitator superfamily domain 8 (MFSD8) gene, a known candidate gene for neuronal ceroid lipofuscinosis type 7 (CLN7), were identified. The affected cat was homozygous for the alternative allele at both variants. This is the first report of a pathogenic alteration of the MFSD8 gene in a cat strongly suspected to have CLN7.']




MAX_SEQUENCE_LENGTH = 352
MAX_NB_WORDS = 2000

tokenizer = Tokenizer(num_words=MAX_NB_WORDS, split=' ')
seq = tokenizer.texts_to_sequences(text_to_predict)
padded = pad_sequences(seq, maxlen=MAX_SEQUENCE_LENGTH)
pred = loaded_model.predict(padded)
labels = ['canine', 'feline']
print(pred, labels[np.argmax(pred)])
However, no matter which string I try to classify, the prediction is always the same:

[[0.5212073 0.47879276]] canine

I'm also not sure why I have to set MAX_SEQUENCE_LENGTH to 352; my model seems to expect an array of exactly that size. Setting it to any other value returns this error:

ValueError: Error when checking input: expected embedding_1_input to have shape (352,) but got array with shape (250,)
For reference, my model was trained with this code:

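import re

import pandas as pd
from sklearn.model_selection import train_test_split
from tensorflow.keras.layers import LSTM, Dense, Embedding, SpatialDropout1D
from tensorflow.keras.models import Sequential
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.preprocessing.text import Tokenizer

data = pd.read_csv('data.csv')
# keep only letters, digits and whitespace
data['Text'] = data['Text'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x))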

MAX_NB_WORDS = 2000
embed_dim = 128
lstm_out = 196

tokenizer = Tokenizer(num_words=MAX_NB_WORDS, split=' ')
tokenizer.fit_on_texts(data['Text'].values)
X = tokenizer.texts_to_sequences(data['Text'].values)
X = pad_sequences(X)


model = Sequential()
model.add(Embedding(MAX_NB_WORDS, embed_dim, input_length=X.shape[1]))
model.add(SpatialDropout1D(0.4))
model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(2,activation='softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
print(model.summary())

# serialize model to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

print('model string has been saved')

Y =  data[['canine','feline']]
X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.33, random_state = 42)
print(X_train.shape,Y_train.shape)
print(X_test.shape,Y_test.shape)

batch_size = 32
model.fit(X_train, Y_train, epochs = 30, batch_size=batch_size, verbose = 2)

#save model for future use.
model.save('lstm.hd5')

Thanks a lot for your help :D

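From your question, I understand that the model predicts correctly right after training, but after loading the saved model it returns the same prediction for every input.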

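I recently ran into the same problem. The prediction script creates a brand-new Tokenizer, and a Tokenizer that has never been fitted has an empty word index, so texts_to_sequences drops every word and returns an empty sequence for any text. After padding, every input becomes 352 zeros, so the model always sees the same input and returns the same prediction. The solution is to save the fitted tokenizer in a pickle file during training and, after loading the saved model, load that pickle file when you want to make predictions.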
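To make the "always the same prediction" effect concrete, here is a minimal pure-Python sketch. The helpers `fit_word_index`, `texts_to_sequences` and `pad_pre` are hypothetical stand-ins written only for this illustration; they imitate the relevant Keras behaviour (frequency-ranked indices starting at 1, unknown words dropped, zero-padding at the front) but are not the Keras implementation.

```python
from collections import Counter


def fit_word_index(texts):
    """Hypothetical stand-in for fitting: rank words by frequency, indices start at 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    ranked = sorted(counts, key=lambda word: -counts[word])  # stable: ties keep first-seen order
    return {word: i + 1 for i, word in enumerate(ranked)}


def texts_to_sequences(word_index, texts, num_words):
    """Map words to indices; unknown words and indices >= num_words are dropped."""
    return [
        [word_index[w] for w in text.lower().split()
         if w in word_index and word_index[w] < num_words]
        for text in texts
    ]


def pad_pre(sequences, maxlen):
    """Zero-pad at the front and keep the last maxlen tokens (like padding/truncating='pre')."""
    padded = []
    for seq in sequences:
        kept = seq[max(len(seq) - maxlen, 0):]
        padded.append([0] * (maxlen - len(kept)) + kept)
    return padded


MAX_NB_WORDS = 2000
MAX_SEQUENCE_LENGTH = 8  # small value so the output is readable

train_texts = ["the cat purrs", "the dog barks", "a cat and a dog"]
inputs = ["the cat purrs", "the dog barks"]

fresh_index = {}  # what a newly created, never-fitted tokenizer knows
fitted_index = fit_word_index(train_texts)

fresh_rows = pad_pre(texts_to_sequences(fresh_index, inputs, MAX_NB_WORDS), MAX_SEQUENCE_LENGTH)
fitted_rows = pad_pre(texts_to_sequences(fitted_index, inputs, MAX_NB_WORDS), MAX_SEQUENCE_LENGTH)

print(fresh_rows)   # both rows are all zeros: the model sees identical inputs
print(fitted_rows)  # rows differ, so predictions can differ
```

With the empty index every text collapses to the same all-zero row, which is exactly the symptom in the question; only the index learned at training time produces distinct inputs.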

Code for saving the tokenizer in a pickle file:

import pickle

# saving
with open('tokenizer.pickle', 'wb') as handle:
    pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)
Code for loading the pickle file:

with open('tokenizer.pickle', 'rb') as handle:
    tokenizer2 = pickle.load(handle)
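Because MAX_SEQUENCE_LENGTH and MAX_NB_WORDS must also match at prediction time, one option is to pickle them together with the tokenizer so they cannot drift apart. This is a sketch under assumptions: a plain dict stands in for the fitted tokenizer (in the real script you would store the Keras `Tokenizer` object itself), and the helpers `save_preprocessing` / `load_preprocessing` and the file name `preprocessing.pickle` are hypothetical.

```python
import os
import pickle
import tempfile


def save_preprocessing(path, tokenizer, max_sequence_length, max_nb_words):
    """Store the fitted tokenizer together with the settings used during training."""
    bundle = {
        "tokenizer": tokenizer,
        "max_sequence_length": max_sequence_length,
        "max_nb_words": max_nb_words,
    }
    with open(path, "wb") as handle:
        pickle.dump(bundle, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_preprocessing(path):
    """Restore exactly what save_preprocessing wrote."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for a fitted tokenizer: just a word -> index mapping (assumption for this sketch).
word_index = {"the": 1, "cat": 2, "dog": 3}

path = os.path.join(tempfile.mkdtemp(), "preprocessing.pickle")
save_preprocessing(path, word_index, max_sequence_length=352, max_nb_words=2000)

restored = load_preprocessing(path)
print(restored["max_sequence_length"], restored["max_nb_words"])  # prints: 352 2000
```

The prediction script then reads `maxlen` from the bundle instead of hard-coding 352 a second time.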
Apart from the code above, here are a few other observations about your code:

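  • Use the same padding when training the model and when predicting with the loaded model. Without maxlen, pad_sequences(X) pads every sequence to the length of the longest tokenized training text, which in your data was 352, and Embedding(..., input_length=X.shape[1]) fixes the model's input shape to that length. That is why any other MAX_SEQUENCE_LENGTH raises the shape error.
  • So you can change

    X = pad_sequences(X)

    to X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH), with MAX_SEQUENCE_LENGTH defined explicitly.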

  • The values of MAX_SEQUENCE_LENGTH and MAX_NB_WORDS should be the same before and after loading the model.

  • Apply the same data preprocessing steps before and after loading the model, so after loading the model you should also apply the function
    lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x)

  • The code that works correctly is below:

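    import pickle  # IMPORTANT STEP
    import re

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.layers import LSTM, Dense, Embedding, SpatialDropout1D
    from tensorflow.keras.models import Sequential
    from tensorflow.keras.preprocessing.sequence import pad_sequences
    from tensorflow.keras.preprocessing.text import Tokenizer

    data = pd.read_csv('data.csv')
    data['Text'] = data['Text'].apply(lambda x: re.sub(r'[^a-zA-Z0-9\s]', '', x))

    MAX_SEQUENCE_LENGTH = 352  # must be the same value at prediction time
    MAX_NB_WORDS = 2000
    embed_dim = 128
    lstm_out = 196

    tokenizer = Tokenizer(num_words=MAX_NB_WORDS, split=' ')
    tokenizer.fit_on_texts(data['Text'].values)

    # saving the fitted tokenizer so prediction uses the same word index
    with open('tokenizer.pickle', 'wb') as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)

    X = tokenizer.texts_to_sequences(data['Text'].values)
    X = pad_sequences(X, maxlen=MAX_SEQUENCE_LENGTH)  # Change Number 2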
    
    model = Sequential()
    model.add(Embedding(MAX_NB_WORDS, embed_dim, input_length=X.shape[1]))
    model.add(SpatialDropout1D(0.4))
    model.add(LSTM(lstm_out, dropout=0.2, recurrent_dropout=0.2))
    model.add(Dense(2,activation='softmax'))
    model.compile(loss = 'categorical_crossentropy', optimizer='adam',metrics = ['accuracy'])
    print(model.summary())
    
    # serialize model to JSON
    model_json = model.to_json()
    with open("model.json", "w") as json_file:
        json_file.write(model_json)
    
    print('model string has been saved')
    
    Y =  data[['canine','feline']]
    X_train, X_test, Y_train, Y_test = train_test_split(X,Y, test_size = 0.33, random_state = 42)
    print(X_train.shape,Y_train.shape)
    print(X_test.shape,Y_test.shape)
    
    batch_size = 32
    model.fit(X_train, Y_train, epochs = 30, batch_size=batch_size, verbose = 2)
    
    #save model for future use.
    model.save('lstm.hd5')
    
    The modified code for loading the model is shown below:

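    import pickle
    import re

    import numpy as np
    from tensorflow.keras.models import model_from_json
    from tensorflow.keras.preprocessing.sequence import pad_sequences

    with open('model.json', 'r') as json_file:
        loaded_model_json = json_file.read()
    loaded_model = model_from_json(loaded_model_json)

    # load weights into new model
    loaded_model.load_weights("lstm.hd5")
    print("Loaded model from disk")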
    
    
    text_to_predict = ['A 2‐year‐old male domestic shorthair cat was presented for a progressive history of abnormal posture, behavior, and mentation. Menace response was absent bilaterally, and generalized tremors were identified on neurological examination. A neuroanatomical diagnosis of diffuse brain dysfunction was made. A neurodegenerative disorder was suspected. Magnetic resonance imaging findings further supported the clinical suspicion. Whole‐genome sequencing of the affected cat with filtering of variants against a database of unaffected cats was performed. Candidate variants were confirmed by Sanger sequencing followed by genotyping of a control population. Two homozygous private (unique to individual or families and therefore absent from the breed‐matched controlled population) protein‐changing variants in the major facilitator superfamily domain 8 (MFSD8) gene, a known candidate gene for neuronal ceroid lipofuscinosis type 7 (CLN7), were identified. The affected cat was homozygous for the alternative allele at both variants. This is the first report of a pathogenic alteration of the MFSD8 gene in a cat strongly suspected to have CLN7.']
    
    # a plain list has no .apply(); clean each string with the training regex
    text_to_predict = [re.sub(r'[^a-zA-Z0-9\s]', '', x) for x in text_to_predict]  # CHANGE 3
    
    MAX_SEQUENCE_LENGTH = 352
    MAX_NB_WORDS = 2000
    
    # Loading the Pickle File ==> IMPORTANT STEP
    with open('tokenizer.pickle', 'rb') as handle:
        tokenizer2 = pickle.load(handle)
    
    # tokenizer = Tokenizer(num_words=MAX_NB_WORDS, split=' ') # THIS IS NOT REQUIRED
    seq = tokenizer2.texts_to_sequences(text_to_predict)
    padded = pad_sequences(seq, maxlen=MAX_SEQUENCE_LENGTH)
    pred = loaded_model.predict(padded)
    labels = ['canine', 'feline']
    print(pred, labels[np.argmax(pred)])
    
    If these changes don't give you the desired output, let me know and I'll be happy to help.


    Hope this helps. Happy learning!
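To keep the cleaning step identical in both scripts, the lambda can become one named function that the training and prediction code both import. A minimal sketch; the name `clean_text` is hypothetical. It uses `A-Za-z` rather than `A-z`, because the ASCII range `A-z` also matches `[`, `\`, `]`, `^`, `_` and the backtick, which sit between `Z` and `a`.

```python
import re

# Anything that is not a letter, a digit or whitespace is removed.
_NON_ALNUM = re.compile(r"[^a-zA-Z0-9\s]")


def clean_text(text):
    """Apply the same cleaning at training time and at prediction time."""
    return _NON_ALNUM.sub("", text)


# Training side (pandas Series): data["Text"] = data["Text"].apply(clean_text)
# Prediction side (a plain Python list, which has no .apply method):
text_to_predict = ["A 2-year-old male domestic shorthair cat (MFSD8 gene)."]
text_to_predict = [clean_text(t) for t in text_to_predict]
print(text_to_predict)
```

Defining the function once means a later change to the regex cannot silently apply to only one side.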

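    You need the tokenizer you used to train your model. That way you will pass in data of the form
    np.array([tokenizer.encode('which string input')]).
    It says the tokenizer object has no attribute encode.
    That may be because you are not using the TensorFlow tokenizer:
    import tensorflow_datasets as tfds; tokenizer = tfds.features.text.SubwordTextEncoder.build_from_corpus(clean_data, target_vocab_size)
    I'll try your approach on PCI, but I still don't understand your response. I am using the same tokenizer that I used when building the LSTM model, but it doesn't produce something with the same shape. When I evaluate the model, I get an int32 array of length 352. When I try it on a single string, it becomes an array the length of the string instead of being padded to 352?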
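On the shape question in the last comment: Keras' `texts_to_sequences` loops over its argument, and a bare Python string is itself iterable, so passing a string instead of a one-element list processes it character by character and yields one row per character. The exact call isn't shown, so this is an assumed cause, but it matches the "array the length of the string" symptom. The pure-Python sketch below only mimics that behaviour; `to_rows` and the small vocabulary are hypothetical.

```python
MAX_SEQUENCE_LENGTH = 352
word_index = {"cat": 1, "dog": 2, "brain": 3}  # stand-in for a fitted vocabulary (assumption)


def to_rows(word_index, texts):
    """Mimics looping over `texts`: one output row per element."""
    return [[word_index[w] for w in t.lower().split() if w in word_index] for t in texts]


text = "cat brain"

wrong = to_rows(word_index, text)    # iterates characters: 9 rows, mostly empty
right = to_rows(word_index, [text])  # one row for the one text

# Front zero-padding to a fixed width (assumes each row is already <= maxlen).
padded = [[0] * (MAX_SEQUENCE_LENGTH - len(r)) + r for r in right]

print(len(wrong), len(padded[0]))  # prints: 9 352
```

Wrapping the input in a list and padding with `maxlen=MAX_SEQUENCE_LENGTH` gives one row of length 352, i.e. the `(1, 352)` shape the model expects.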