ValueError: could not convert string to float when calling SMOTE fit_sample (Python oversampling)
I have a credit risk analysis dataset that looks like this:
Loan_ID  Age  Income(LPA)  Employed_yr  Education  Loan_status
1        18   2.4          1            12th       1
2        46   43           26           Post Grad  0
3        22   12           4            Grad       0
4        25   17           1            Grad       1
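In Loan_status, 1 means the loan defaulted and 0 means it did not.
There are very few defaults (about 1,000) compared with about 25,000 non-defaults, so I want to apply oversampling or synthetic sampling.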
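To see why synthetic oversampling needs purely numeric input, here is a simplified NumPy sketch of the interpolation idea behind SMOTE. It is not imbalanced-learn's implementation (that library uses a proper nearest-neighbours estimator and many more options); `smote_sketch` is a hypothetical helper, and the example rows are the two defaulters from the sample table (Age, Income(LPA), Employed_yr).

```python
import numpy as np

def smote_sketch(X_min, n_new, k=5, seed=0):
    """Hypothetical, simplified SMOTE: make n_new synthetic minority rows.

    Each synthetic row lies on the line segment between a random minority
    row and one of its k nearest minority neighbours.
    """
    rng = np.random.default_rng(seed)
    # Converting to float is exactly where strings like 'MORTGAGE' fail.
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority rows to interpolate")
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority rows.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                # a row is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]  # indices of the k nearest rows
    new_rows = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                    # random minority row
        b = neighbours[a, rng.integers(k)]     # one of its neighbours
        lam = rng.random()                     # interpolation factor in [0, 1)
        new_rows[i] = X_min[a] + lam * (X_min[b] - X_min[a])
    return new_rows

# The two defaulters from the table: Age, Income(LPA), Employed_yr
synthetic = smote_sketch([[18, 2.4, 1], [25, 17, 1]], n_new=3)
print(synthetic.shape)
```

Because each synthetic row is computed as `x + λ·(neighbour − x)`, every feature must be a number; a string such as 'MORTGAGE' cannot be subtracted or scaled, which is the root of the `could not convert string to float` error.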
Up to this point, the code runs without problems:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE

cred_loan = pd.read_csv("Credit_Risk_Analysis.csv")

# Target column and feature matrix
y = cred_loan.loan_status
X = cred_loan.drop('loan_status', axis=1)

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=27)

# Oversample the minority class to a 1:1 ratio
sm = SMOTE(random_state=27, ratio=1.0)
After that, I run the following and get an error:
[IN] X_train, y_train = sm.fit_sample(X_train, y_train)
[OUT]ValueError Traceback (most recent call last)
<ipython-input-39-0995f82b5705> in <module>
----> 1 X_train, y_train = sm.fit_sample(X_train, y_train)
2

~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
77
78 check_classification_targets(y)
---> 79 X, y, binarize_y = self._check_X_y(X, y)
80
81 self.sampling_strategy_ = check_sampling_strategy(

~\Anaconda3\lib\site-packages\imblearn\base.py in _check_X_y(X, y)
135 def _check_X_y(X, y):
136 y, binarize_y = check_target_type(y, indicate_one_vs_all=True)
--> 137 X, y = check_X_y(X, y, accept_sparse=['csr', 'csc'])
138 return X, y, binarize_y
139

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
754 warnings.warn("A column-vector y was passed when a 1d array was"
755 " expected. Please change the shape of y to "
--> 756 "(n_samples, ), for example using ravel().",
757 DataConversionWarning, stacklevel=2)
758 return np.ravel(y)

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
565 if copy and np.may_share_memory(array, array_orig):
566 array = np.array(array, dtype=dtype, order=order)
--> 567
568 if (warn_on_dtype and dtypes_orig is not None and
569 {array.dtype} != set(dtypes_orig)):

ValueError: could not convert string to float: 'MORTGAGE'
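Can anyone help?

You should preprocess your data first so that only numeric data remains, converting the columns to a numeric (integer) form. Your dataset must contain categorical columns as well (the one holding 'MORTGAGE', for instance). Encode those columns first, then apply SMOTE.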
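A minimal pandas sketch of that preprocessing step, assuming one-hot encoding is an acceptable way to turn the categorical columns into numbers. The rows are hypothetical and modelled on the question's table, and `Home_ownership` is an assumed column name standing in for whichever column actually holds values such as 'MORTGAGE'.

```python
import pandas as pd

# Hypothetical rows modelled on the question's table. "Home_ownership" is an
# assumed column name standing in for the column that holds 'MORTGAGE'.
X = pd.DataFrame({
    "Age": [18, 46, 22, 25],
    "Income(LPA)": [2.4, 43, 12, 17],
    "Employed_yr": [1, 26, 4, 1],
    "Education": ["12th", "Post Grad", "Grad", "Grad"],
    "Home_ownership": ["RENT", "MORTGAGE", "OWN", "RENT"],
})

# 1. Find the columns SMOTE cannot convert to float (any non-numeric dtype).
categorical = X.select_dtypes(exclude="number").columns.tolist()
print(categorical)  # ['Education', 'Home_ownership']

# 2. One-hot encode them: each category becomes its own 0/1 column.
X_encoded = pd.get_dummies(X, columns=categorical, dtype=int)

# 3. Every column is now numeric, so conversion to a float array succeeds.
X_float = X_encoded.to_numpy(dtype=float)
print(X_float.shape)  # (4, 9)
```

The encoded frame can then go through `train_test_split` and SMOTE as in the question. In recent imbalanced-learn releases the resampling method is `fit_resample` (`fit_sample` is the older name) and the `ratio` argument has been replaced by `sampling_strategy`. One caveat: plain SMOTE interpolates the 0/1 dummy columns into fractional values; imbalanced-learn's `SMOTENC`, which takes a `categorical_features` argument, is intended for data that mixes categorical and continuous features.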