Python: How can I use a dictionary with the SMOTE algorithm to resample multi-class input data differently?


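I want to perform oversampling with the SMOTE algorithm in Python, using the imblearn.over_sampling module. My input data has four target classes. I do not want to oversample every minority class until it matches the majority class distribution. Instead, I want to resample each minority class to a different size.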

When I use

SMOTE(sampling_strategy=1, k_neighbors=2, random_state=1000)

I get the following error:

ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.

Then, following the error message, I passed a dictionary as sampling_strategy, as shown below:

SMOTE(sampling_strategy={'1.0':70,'3.0':255,'2.0':50,'0.0':150},k_neighbors=2,random_state = 1000)

However, this gives the following error:

ValueError: The {'2.0', '1.0', '0.0', '3.0'} target class is/are not present in the data.

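Does anyone know how to define a dictionary so that SMOTE oversamples each class differently?

You have to specify the desired number of samples for each class and pass this dictionary to the SMOTE object.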

Code:

import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# Toy data: 200 samples with 13 features and four imbalanced classes
x1 = np.random.randint(500, size=(200, 13))
y1 = np.concatenate([np.array([0]*100), np.array([1]*65), np.array([2]*25), np.array([3]*10)])
np.random.shuffle(y1)
Counter(y1)
# Output: Counter({0: 100, 1: 65, 2: 25, 3: 10})

# Keys are the actual class labels; values are the desired sample counts after resampling
sm = SMOTE(sampling_strategy={0: 100, 1: 70, 2: 90, 3: 40})
X_res, y_res = sm.fit_resample(x1, y1)
Counter(y_res)
# Output: Counter({0: 100, 1: 70, 2: 90, 3: 40})


For more information, refer to the documentation.
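If only some classes need a new size, the dictionary can be derived from the current class counts instead of being typed out by hand. The following is a minimal sketch, not part of imblearn: the helper name build_sampling_strategy and its arguments are assumptions made for illustration. It also rejects targets below the current count, since SMOTE only adds samples and, as far as I know, imblearn raises an error in that case.

```python
from collections import Counter


def build_sampling_strategy(y, targets):
    """Hypothetical helper: start from the current class counts and
    override the classes listed in `targets` with their desired counts."""
    labels = y.tolist() if hasattr(y, "tolist") else list(y)
    current = dict(Counter(labels))

    # Every key must be a label that really occurs in y (same type, same value).
    unknown = set(targets) - set(current)
    if unknown:
        raise ValueError(f"labels not present in y: {sorted(unknown, key=repr)}")

    strategy = dict(current)
    for label, n_samples in targets.items():
        # Oversampling can only grow a class, never shrink it.
        if n_samples < current[label]:
            raise ValueError(
                f"class {label!r} has {current[label]} samples, cannot oversample to {n_samples}"
            )
        strategy[label] = n_samples
    return strategy


y = [0] * 100 + [1] * 65 + [2] * 25 + [3] * 10
strategy = build_sampling_strategy(y, {1: 70, 2: 90, 3: 40})
print(strategy)  # {0: 100, 1: 70, 2: 90, 3: 40}
```

The resulting dictionary can then be passed as SMOTE(sampling_strategy=strategy).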

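The error you got occurs because the labels specified in the dictionary do not match the actual labels in your data. Your dictionary keys are the strings '1.0', '3.0', '2.0' and '0.0', while the labels in your target are presumably numeric (1.0, 3.0, 2.0, 0.0), so the keys must be numbers as well.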