Python: scipy sparse matrix to numpy array in a classification task
Tags: python, numpy, scikit-learn, scipy, logistic-regression

I have X_train data (class 'pandas.core.series.Series') with the following content:

print(X_train)

0       WASHINGTON  —   Congressional Republicans have...
1       After the bullet shells get counted, the blood...
2       When Walt Disney’s “Bambi” opened in 1942, cri...
3       Death may be the great equalizer, but it isn’t...
4       SEOUL, South Korea  —   North Korea’s leader, ...
Then I want to prepare the data for classification:

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

count_vect = CountVectorizer()
X_train_counts = count_vect.fit_transform(X_train)
tfidf_transformer = TfidfTransformer()
X_train_tfidf = tfidf_transformer.fit_transform(X_train_counts)
Now X_train_tfidf and X_train_counts are both of class 'scipy.sparse.csr.csr_matrix'.

But my logistic regression class works with numpy arrays. What can I do to fix this?

import numpy as np

class LogisticRegression2:
    def __init__(self, lr=0.01, num_iter=100000, fit_intercept=True, theta=0, verbose=False):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.theta = theta
        self.verbose = verbose

    def __add_intercept(self, X):
        intercept = np.ones((X.shape[0], 1))
        return np.concatenate((intercept, X), axis=1)

    def __sigmoid(self, z):
        return 1 / (1 + np.exp(-z))
        #return .5 * (1 + np.tanh(.5 * z))

    def __loss(self, h, y):
        return (-y * np.log(h) - (1 - y) * np.log(1 - h)).mean()

    def fit(self, X, y):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        # weights initialization
        self.theta = np.zeros(X.shape[1])

        for i in range(self.num_iter):
            z = np.dot(X, self.theta)
            h = self.__sigmoid(z)
            gradient = np.dot(X.T, (h - y)) / y.size
            self.theta -= self.lr * gradient

            if(self.verbose == True and i % 10000 == 0):
                z = np.dot(X, self.theta)
                h = self.__sigmoid(z)
                print('loss: ', self.__loss(h, y))

    def predict_prob(self, X):
        if self.fit_intercept:
            X = self.__add_intercept(X)

        return self.__sigmoid(np.dot(X, self.theta))

    def predict(self, X, threshold=0.5):
        return self.predict_prob(X) >= threshold
If I use

X_train_dense = X_train_tfidf.toarray()

model = LogisticRegression2(lr=0.1, num_iter=100)
model.fit(X_train_dense, y_train)
preds = model.predict(X_train_dense)
I get TypeError: unsupported operand type(s) for -: 'float' and 'str'
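One likely cause of this TypeError, assuming y_train holds string labels (the question does not show it): fit computes h - y, and a float cannot be subtracted from a string. The sketch below maps labels to 0.0/1.0 before fitting. The name y_train_demo and its values are made up for illustration.

```python
import numpy as np

# Hypothetical string labels standing in for y_train;
# the question does not show the real ones.
y_train_demo = np.array(["fake", "real", "real", "fake"])

# np.unique returns the sorted distinct labels and, for each entry,
# the index of its label in that sorted list.
classes, codes = np.unique(y_train_demo, return_inverse=True)

# Cast to float so that h - y inside fit() is plain numeric arithmetic.
y_numeric = codes.astype(float)

print(classes)
print(y_numeric)
```

With exactly two distinct labels the codes are 0 and 1, which is what the loss and gradient in LogisticRegression2 expect.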

If I try

def __add_intercept(self, X):
    intercept = np.ones((X.shape[0], 1))
    return hstack((intercept, X))

I get a MemoryError.
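A rough way to see why memory runs out is to compare what a CSR matrix stores with what its dense equivalent would need. This is only a sketch: the helper names dense_size_bytes and sparse_size_bytes are hypothetical, and the matrix is a synthetic stand-in because the real shape of X_train_tfidf is not given.

```python
import numpy as np
from scipy import sparse


def dense_size_bytes(X):
    """Bytes that X.toarray() would have to allocate."""
    n_rows, n_cols = X.shape
    return n_rows * n_cols * X.dtype.itemsize


def sparse_size_bytes(X):
    """Bytes held by the three arrays that back a CSR matrix."""
    return X.data.nbytes + X.indices.nbytes + X.indptr.nbytes


# Synthetic stand-in for X_train_tfidf: 1,000 documents, 50,000 terms,
# one nonzero entry per row (assumed sizes, for illustration only).
rows = np.arange(1000)
cols = rows * 50
X_demo = sparse.csr_matrix((np.ones(1000), (rows, cols)), shape=(1000, 50000))

print(f"dense:  {dense_size_bytes(X_demo) / 1e6:.1f} MB")
print(f"sparse: {sparse_size_bytes(X_demo) / 1e6:.3f} MB")
```

The dense size grows with rows times columns regardless of how many entries are zero, so a large vocabulary can exhaust memory even when the sparse matrix itself is small.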

X_train_dense = X_train_tfidf.toarray(). toarray is the sparse-matrix method that creates a dense array; concatenate does not do that, and np.array(X_train_tfidf) is wrong. The scipy.sparse package documents the csr matrix and its methods.

I tried X_train_dense = X_train_tfidf.toarray(); model = LogisticRegression2(lr=0.1, num_iter=100); model.fit(X_train_dense, y_train); preds = model.predict(X_train_dense) and I get a memory error.

Memory errors are common when trying to build a dense array from a sparse one. That is why the code produces a sparse matrix in the first place.

Does LogisticRegression accept sparse matrices, or does it only work with dense arrays?

With dense arrays it works fine. But I also need to classify text, and for that I have sparse matrices.

You can use sparse.hstack to add the intercept array to the sparse matrix. The result will be a sparse matrix.
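# Assumes `import numpy as np` and `from scipy import sparse`.
def __add_intercept(self, X):
    intercept = sparse.csr_matrix(np.ones((X.shape[0], 1)))
    # sparse.hstack keeps the result sparse instead of allocating a dense array.
    return sparse.hstack((intercept, X), format="csr")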
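Adding the intercept with sparse.hstack is only part of the change: np.dot is not aware of scipy sparse matrices, so the products in fit and predict_prob need the @ operator (or the matrix's own .dot method) instead. Below is a minimal sketch of the question's class adapted along those lines. The class name SparseLogisticRegression and the demo data are hypothetical, and this is one possible adaptation, not the answerer's code.

```python
import numpy as np
from scipy import sparse


class SparseLogisticRegression:
    """Gradient-descent logistic regression that keeps X sparse throughout."""

    def __init__(self, lr=0.01, num_iter=1000, fit_intercept=True):
        self.lr = lr
        self.num_iter = num_iter
        self.fit_intercept = fit_intercept
        self.theta = None

    def _add_intercept(self, X):
        # Prepend a column of ones; the result stays sparse (CSR).
        intercept = sparse.csr_matrix(np.ones((X.shape[0], 1)))
        return sparse.hstack((intercept, X), format="csr")

    @staticmethod
    def _sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit(self, X, y):
        X = sparse.csr_matrix(X)
        y = np.asarray(y, dtype=float)  # labels must already be numeric 0/1
        if self.fit_intercept:
            X = self._add_intercept(X)
        self.theta = np.zeros(X.shape[1])
        for _ in range(self.num_iter):
            # sparse matrix @ dense vector gives a dense 1-D array
            h = self._sigmoid(X @ self.theta)
            gradient = X.T @ (h - y) / y.size
            self.theta -= self.lr * gradient
        return self

    def predict_prob(self, X):
        X = sparse.csr_matrix(X)
        if self.fit_intercept:
            X = self._add_intercept(X)
        return self._sigmoid(X @ self.theta)

    def predict(self, X, threshold=0.5):
        return self.predict_prob(X) >= threshold


# Tiny synthetic data standing in for the tf-idf matrix and 0/1 labels.
X_demo = sparse.csr_matrix(
    np.array([[0.0, 1.0], [0.0, 2.0], [3.0, 0.0], [4.0, 0.0]])
)
y_demo = np.array([0, 0, 1, 1])

model = SparseLogisticRegression(lr=0.1, num_iter=2000).fit(X_demo, y_demo)
print(model.predict(X_demo))
```

Only theta, h and the gradient are dense here, and each has one entry per feature or per sample, so no array of size rows times columns is ever created.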