Python: How can I use a dictionary with the SMOTE algorithm to resample each class of multi-class input data differently?
I want to use the SMOTE algorithm in Python to perform oversampling, using the imblearn.over_sampling library. My input data has four target classes. I don't want to oversample every minority class up to the size of the majority class; I want to oversample each minority class by a different amount.
When I use SMOTE(sampling_strategy=1, k_neighbors=2, random_state=1000), I get the following error:
ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.
Then, following the error message, I passed a dictionary as sampling_strategy, like this:
SMOTE(sampling_strategy={'1.0':70,'3.0':255,'2.0':50,'0.0':150},k_neighbors=2,random_state = 1000)
However, that gives the following error:
ValueError: The {'2.0', '1.0', '0.0', '3.0'} target class is/are not present in the data.
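Does anyone know how to define a dictionary so that SMOTE oversamples each class differently?
You have to specify the desired number of samples for each class and pass this dictionary to the SMOTE object. Code: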
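import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE
# Toy data: 200 samples with 13 features, four classes of sizes 100/65/25/10
x1 = np.random.randint(500, size=(200, 13))
y1 = np.concatenate([np.array([0]*100), np.array([1]*65), np.array([2]*25), np.array([3]*10)])
np.random.shuffle(y1)
Counter(y1)
# class counts before resampling: {0: 100, 1: 65, 2: 25, 3: 10}
# Keys are the actual class labels (integers here, not strings);
# values are the number of samples each class should have after resampling
sm = SMOTE(sampling_strategy={0: 100, 1: 70, 2: 90, 3: 40})
X_res, y_res = sm.fit_resample(x1, y1)
Counter(y_res)
# class counts after resampling: {0: 100, 1: 70, 2: 90, 3: 40}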
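Each value in the dictionary is the total number of samples the class should have after resampling, not the number of synthetic samples to add, so for over-sampling it should not be smaller than the class's current count. If it is more convenient to think in fractions of the majority class, the counts can be derived first. The following is only a sketch: `targets_from_ratios` is a hypothetical helper, not part of imblearn, and the ratios are made-up values chosen to reproduce the dictionary above.

```python
from collections import Counter


def targets_from_ratios(y, ratios):
    """Hypothetical helper: convert per-class ratios (relative to the
    majority class size) into absolute target counts."""
    counts = Counter(y)
    majority = max(counts.values())
    targets = {}
    for label, current in counts.items():
        if label in ratios:
            wanted = round(ratios[label] * majority)
            # Never ask for fewer samples than the class already has
            targets[label] = max(current, wanted)
        else:
            # Class not listed: keep its current size
            targets[label] = current
    return targets


# Same class sizes as the toy data above
y = [0] * 100 + [1] * 65 + [2] * 25 + [3] * 10
strategy = targets_from_ratios(y, {1: 0.7, 2: 0.9, 3: 0.4})
print(strategy)  # {0: 100, 1: 70, 2: 90, 3: 40}
```

The resulting dictionary can then be passed as `SMOTE(sampling_strategy=strategy)`.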
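For more information, see the documentation.
The error you got occurs because the labels specified in the dictionary do not match the actual labels in the data.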
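In the question, the dictionary keys are strings such as '1.0', while the labels in the data are presumably numbers. The question does not show the label array, so the following is only a sketch under that assumption: labels are taken to be the floats 0.0 to 3.0, the class sizes are invented for illustration, and `match_strategy_keys` is a hypothetical helper, not part of imblearn. It re-keys the dictionary with the labels that actually occur in `y`.

```python
def match_strategy_keys(y, strategy):
    """Hypothetical helper: re-key `strategy` so that every key is one of
    the labels actually present in y."""
    labels = set(y)
    # Map the text form of each real label back to the label itself
    by_text = {str(label): label for label in labels}
    fixed = {}
    for key, n_samples in strategy.items():
        if key in labels:
            fixed[key] = n_samples                 # already a real label
        elif str(key) in by_text:
            fixed[by_text[str(key)]] = n_samples   # e.g. '1.0' -> 1.0
        else:
            raise KeyError(f"class {key!r} is not present in the data")
    return fixed


# Assumed labels: floats, with made-up class sizes
y = [0.0] * 150 + [3.0] * 100 + [1.0] * 40 + [2.0] * 30
wrong = {'1.0': 70, '3.0': 255, '2.0': 50, '0.0': 150}
strategy = match_strategy_keys(y, wrong)
print(strategy)  # {1.0: 70, 3.0: 255, 2.0: 50, 0.0: 150}
```

With keys of the same type as the labels, the dictionary can be passed as `SMOTE(sampling_strategy=strategy, k_neighbors=2, random_state=1000)`. Writing the keys directly as numbers, for example `{1.0: 70, 3.0: 255, 2.0: 50, 0.0: 150}`, has the same effect.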