Python: difference between the imblearn pipeline and the sklearn Pipeline

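I want to use sklearn.pipeline rather than imblearn.pipeline to incorporate RandomUnderSampler(). My original data needs missing-value imputation and scaling. Here I use the breast cancer data as a toy example. However, it gives me the error message shown after the code. I would appreciate your suggestions. Thanks for your time.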

from numpy.random import seed
seed(12)
from sklearn.datasets import load_breast_cancer
import time
from sklearn.metrics import make_scorer
from imblearn.metrics import geometric_mean_score
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import MaxAbsScaler
from imblearn.under_sampling import RandomUnderSampler
gmean = make_scorer(geometric_mean_score, greater_is_better=True)

X, y = load_breast_cancer(return_X_y=True)
start_time1 = time.time()
scoring = {'G-mean': gmean}
LR_pipe = Pipeline([
    ("impute", SimpleImputer(strategy='constant', fill_value=0)),
    ("scale", MaxAbsScaler()),
    ("rus", RandomUnderSampler()),
    ("LR", LogisticRegression(solver='lbfgs', random_state=0, class_weight='balanced', max_iter=100000)),
])
LRscores = cross_validate(LR_pipe,X, y, cv=5,scoring=scoring)
end_time1 = time.time()
print("Computational time in seconds = " + str(end_time1 - start_time1))
sorted(LRscores.keys())
LR_Gmean = LRscores['test_G-mean'].mean()

print("G-mean: %f" % (LR_Gmean))

Error message:

TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'RandomUnderSampler()' (type <class 'imblearn.under_sampling._prototype_selection._random_under_sampler.RandomUnderSampler'>) doesn't

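We should import make_pipeline from imblearn.pipeline, not from sklearn.pipeline. make_pipeline from sklearn requires every intermediate step to be a transformer that implements the fit and transform methods, whereas RandomUnderSampler implements fit_resample instead of transform. Importing Pipeline from sklearn.pipeline conflicts with importing Pipeline from imblearn.pipeline, so use only the imblearn one.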

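This is exactly why imblearn has its own version of the pipeline. Why don't you want to use it?

@BenReiniger, because I couldn't include SimpleImputer and MaxAbsScaler in the imblearn pipeline. I don't know what I'm missing!

There shouldn't be any problem including those; I suggest posting a new question.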