ValueError: could not convert string to float when calling SMOTE fit_sample (Python oversampling)


I have a credit risk analysis dataset that looks like this:

Loan_ID   Age   Income(LPA)   Employed_yr   Education   Loan_status
1         18    2.4           1             12th        1
2         46    43            26            Post Grad   0
3         22    12            4             Grad        0
4         25    17            1             Grad        1

In Loan_status, 1 means default and 0 means non-default.

The number of defaults is very small, about 1,000, while there are 25,000 non-defaults. So I want to do oversampling or synthetic sampling.

Up to this point the code runs fine:

import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

cred_loan = pd.read_csv("Credit_Risk_Analysis.csv")

# Separate the target column from the features
y = cred_loan.loan_status
X = cred_loan.drop('loan_status', axis=1)

# Hold out 25% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=27)

sm = SMOTE(random_state=27, ratio=1.0)
After that I run the following and get an error:

[IN] X_train, y_train = sm.fit_sample(X_train, y_train)        

[OUT]ValueError                                Traceback (most recent call last)
<ipython-input-39-0995f82b5705> in <module>
----> 1 X_train, y_train = sm.fit_sample(X_train, y_train)
      2 

~\Anaconda3\lib\site-packages\imblearn\base.py in fit_resample(self, X, y)
     77 
     78         check_classification_targets(y)
---> 79         X, y, binarize_y = self._check_X_y(X, y)
     80 
     81         self.sampling_strategy_ = check_sampling_strategy(

~\Anaconda3\lib\site-packages\imblearn\base.py in _check_X_y(X, y)
    135     def _check_X_y(X, y):
    136         y, binarize_y = check_target_type(y, indicate_one_vs_all=True)
--> 137         X, y = check_X_y(X, y, accept_sparse=['csr', 'csc'])
    138         return X, y, binarize_y
    139 

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, warn_on_dtype, estimator)
    754             warnings.warn("A column-vector y was passed when a 1d array was"
    755                           " expected. Please change the shape of y to "
--> 756                           "(n_samples, ), for example using ravel().",
    757                           DataConversionWarning, stacklevel=2)
    758         return np.ravel(y)

~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    565     if copy and np.may_share_memory(array, array_orig):
    566         array = np.array(array, dtype=dtype, order=order)
--> 567 
    568     if (warn_on_dtype and dtypes_orig is not None and
    569             {array.dtype} != set(dtypes_orig)):

ValueError: could not convert string to float: 'MORTGAGE'

Can anyone help?

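You should preprocess the data first so that only numeric data is left, converting the columns to numeric form. You must also have categorical columns: the string 'MORTGAGE' in the error message comes from one of them. Try preprocessing (encoding) those columns, and then apply SMOTE.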
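As a rough sketch of that preprocessing step, the snippet below one-hot encodes every non-numeric column with pandas and only then hands the data to SMOTE. The helper names `encode_features` and `oversample` and the toy frame are made up for illustration, and one-hot encoding is just one possible choice of encoding, not necessarily the best one for your columns. It also assumes a recent imbalanced-learn release, where `fit_resample` replaces `fit_sample`.

```python
import pandas as pd


def encode_features(X: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode every non-numeric column so the frame is fully numeric."""
    text_cols = X.select_dtypes(exclude="number").columns
    return pd.get_dummies(X, columns=list(text_cols), dtype=float)


def oversample(X_train: pd.DataFrame, y_train: pd.Series, seed: int = 27):
    """Apply SMOTE to a training set that is already fully numeric."""
    # Imported here so the encoding helper works without imbalanced-learn installed.
    from imblearn.over_sampling import SMOTE

    sm = SMOTE(random_state=seed)
    return sm.fit_resample(X_train, y_train)


# Hypothetical toy data shaped like the sample rows in the question.
toy = pd.DataFrame({
    "Age": [18, 46, 22, 25],
    "Income": [2.4, 43.0, 12.0, 17.0],
    "Education": ["12th", "Post Grad", "Grad", "Grad"],
})
encoded = encode_features(toy)
print(encoded.dtypes)  # every column is now numeric
```

In your script this would mean calling `X = encode_features(X)` before `train_test_split`, so that the training and test sets end up with the same columns, and then `X_train, y_train = oversample(X_train, y_train)`. Note that plain SMOTE interpolates between rows, so the one-hot columns of synthetic samples can take fractional values; imbalanced-learn also provides `SMOTENC` for data that mixes numeric and categorical features, which may be a better fit here.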