Python - encountering a fit error on X_test, y_test

Tags: python, arrays, pandas, scikit-learn

I have built a neural network that works fine on a smaller dataset of about 300,000 rows with 2 categorical variables and 1 dependent variable, but when I increased it to 6.5 million rows I ran into memory errors. So I decided to modify the code, and I am getting closer, but now I have a problem with a fit error. I have 2 categorical variables and one column for the dependent variable of 1s and 0s (suspicious or not suspicious). To start with, the dataset looks like this:

DBF2
   ParentProcess                   ChildProcess               Suspicious
0  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
1  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
2  C:\Windows\System32\svchost.exe                      ...            1
3  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
4  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
5  C:\Program Files (x86)\Wireless AutoSwitch\wrl...    ...            0
My code, which then runs into the errors:

import pandas as pd
import numpy as np
import hashlib
import matplotlib.pyplot as plt
import timeit

X = DBF2.iloc[:, 0:2].values
y = DBF2.iloc[:, 2].values#.ravel()

from sklearn.preprocessing import LabelEncoder, OneHotEncoder
labelencoder_X_1 = LabelEncoder()
X[:, 0] = labelencoder_X_1.fit_transform(X[:, 0])
labelencoder_X_2 = LabelEncoder()
X[:, 1] = labelencoder_X_2.fit_transform(X[:, 1])

onehotencoder = OneHotEncoder(categorical_features = [0,1])
X = onehotencoder.fit_transform(X)

index_to_drop = [0, 2039]
to_keep = list(set(xrange(X.shape[1]))-set(index_to_drop))
X = X[:,to_keep]

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)

#ERROR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 517, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 590, in fit
    return self.partial_fit(X, y)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 621, in partial_fit
    "Cannot center sparse matrices: pass `with_mean=False` "
ValueError: Cannot center sparse matrices: pass `with_mean=False` instead. See docstring for motivation and alternatives.

X_test = sc.transform(X_test)

#ERROR
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/sklearn/preprocessing/data.py", line 677, in transform
    check_is_fitted(self, 'scale_')
  File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 768, in check_is_fitted
    raise NotFittedError(msg % {'name': type(estimator).__name__})
sklearn.exceptions.NotFittedError: This StandardScaler instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

In case it helps, this is what X_train and y_train look like when printed:

X_train
<5621203x7043 sparse matrix of type '<type 'numpy.float64'>'
with 11242334 stored elements in Compressed Sparse Row format>

y_train
array([0, 0, 0, ..., 0, 0, 0])

X_train is a sparse matrix, which is great when you are working with a large dataset like yours. The problem is that, as the StandardScaler documentation explains:

with_mean : boolean, True by default

If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

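To see why, here is a rough back-of-the-envelope sketch (not part of the original answer, using the shape and stored-element count of the X_train printed above): centering subtracts the column means, which would turn almost every zero into a non-zero value, so the matrix would effectively have to be materialized as a dense array.

# Illustrative arithmetic only: memory needed for the X_train shown above
n_rows, n_cols = 5621203, 7043       # shape of the sparse X_train
n_nonzero = 11242334                 # stored elements in the CSR matrix

dense_bytes = n_rows * n_cols * 8                      # dense float64 matrix
sparse_bytes = n_nonzero * (8 + 4) + (n_rows + 1) * 4  # CSR: data + indices + indptr (int32)

print("dense : about %.0f GB" % (dense_bytes / 1e9))   # ~317 GB, far too large for RAM
print("sparse: about %.2f GB" % (sparse_bytes / 1e9))  # ~0.16 GB, fits comfortably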
You can try passing with_mean=False:

sc = StandardScaler(with_mean=False)
X_train = sc.fit_transform(X_train)
The next line then fails because sc is still an untouched StandardScaler object:

X_test = sc.transform(X_test)
To use the transform method, you first have to fit the StandardScaler to a dataset. If your intention is to fit the StandardScaler on the training set and use it to transform both the training set and the test set into the same space, you can do it as follows:

sc = StandardScaler(with_mean=False)
X_train_sc = sc.fit(X_train)
X_train = X_train_sc.transform(X_train)
X_test = X_train_sc.transform(X_test)
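
Putting the pieces together, here is a minimal, self-contained sketch of the same workflow. It is illustrative only: a randomly generated sparse matrix stands in for the one-hot encoded X from the question, and the sizes are made up.

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in for the one-hot encoded feature matrix: sparse and mostly zeros
X = sparse_random(1000, 50, density=0.02, format='csr', random_state=0)
# Stand-in for the binary Suspicious column
y = np.random.RandomState(0).randint(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# with_mean=False scales by the standard deviation only, so X stays sparse
sc = StandardScaler(with_mean=False)
sc.fit(X_train)                   # fit on the training set only...
X_train = sc.transform(X_train)   # ...then transform both sets with the same scaler
X_test = sc.transform(X_test)

print(X_train.shape)  # (800, 50)
print(X_test.shape)   # (200, 50)

The key points are the same as above: with_mean=False so the sparse matrix is never densified, and a single scaler fitted on the training data and reused on the test data.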

Did you try the fix the error message suggests, sc = StandardScaler(with_mean=False)?

Awesome answer! Thank you so much for taking the time to explain. @Tobsecret Glad it helped!