
Python: sparse matrix length is ambiguous

Tags: python, keras, scikit-learn, sklearn-pandas

I'm very new to machine learning, so this question may sound silly. I was following a tutorial, but I ran into an error that I don't know how to fix.

Here is my code (it's essentially the code from the tutorial):

When I reach the last line, I get this error:

"TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]"

I suppose I need to perform some kind of transformation on the data I'm using, or that I should try loading the data in a different way. I've already searched Stack Overflow, but being new to all of this, I couldn't find anything helpful.

How should I proceed? Ideally, I'd like not only the solution, but also a brief explanation of why the error happens and how the solution fixes it.


Thanks

The reason you're running into this is that your
X_train
and
X_test
are scipy sparse matrices, while your model expects NumPy arrays.

Try casting them to dense, and you should be good to go:

X_train = X_train.todense()
X_test = X_test.todense()
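As context for why the cast works (this snippet is not from the original answer, just a minimal reproduction): scipy sparse matrices deliberately refuse `len()`, which is what trips up older Keras versions, while a dense NumPy array supports it.

```python
import numpy as np
from scipy.sparse import csr_matrix

# CountVectorizer.transform returns a scipy sparse matrix; simulate one here
X_train = csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))

# len() on a sparse matrix raises the TypeError from the question
try:
    len(X_train)
except TypeError as e:
    print(e)  # sparse matrix length is ambiguous; use getnnz() or shape[0]

# Densifying resolves it: toarray() returns np.ndarray,
# todense() returns np.matrix -- either is accepted by model.fit
X_dense = X_train.toarray()
print(len(X_dense))      # 2, the number of rows
print(X_train.shape[0])  # 2, the sparse-safe way to get the row count
```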

Not sure why your script produces that error.

The following script runs fine, even with sparse matrices. You can try it on your machine:

sentences = ['i want to test this','let us try this',
             'would this work','how about this',
             'even this','this should not work']
y= [0,0,0,0,0,1]
from sklearn.model_selection import train_test_split
sentences_train, sentences_test, y_train, y_test = train_test_split(sentences, y, test_size=0.25, random_state=1000)


from sklearn.feature_extraction.text import CountVectorizer


vectorizer = CountVectorizer()
vectorizer.fit(sentences_train)

X_train = vectorizer.transform(sentences_train)
X_test  = vectorizer.transform(sentences_test)

from keras.models import Sequential
from keras import layers

input_dim = X_train.shape[1] 

model = Sequential()
model.add(layers.Dense(10, input_dim=input_dim, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', 
            optimizer='adam', 
            metrics=['accuracy'])
model.summary()

model.fit(X_train, y_train,
                        epochs=2,
                        verbose=True,
                        validation_data=(X_test, y_test),
                        batch_size=2)

#
Layer (type)                 Output Shape              Param #   
=================================================================
dense_5 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_6 (Dense)              (None, 1)                 11        
=================================================================
Total params: 121
Trainable params: 121
Non-trainable params: 0
_________________________________________________________________
Train on 4 samples, validate on 2 samples
Epoch 1/2
4/4 [==============================] - 1s 169ms/step - loss: 0.7570 - acc: 0.2500 - val_loss: 0.6358 - val_acc: 1.0000
Epoch 2/2
4/4 [==============================] - 0s 3ms/step - loss: 0.7509 - acc: 0.2500 - val_loss: 0.6328 - val_acc: 1.0000

Which line gives the error? What is the output of type(X_train) and type(y_train)? @SergeyBushmanov type(X_train): ; type(y_train): . Could you try converting the sparse matrix to dense, e.g.
X_train.todense()
, and passing the result to
model.fit()
? @FrancoPiccolo The last one: history = model.fit(X_train, y_train, nb_epoch=100, verbose=False, validation_data=(X_test, y_test), batch_size=10). I thought so as well. todense is a costly operation; updating the packages may be the better solution.