Python: How can I use a dictionary with the SMOTE algorithm to resample each class of multi-class input data differently? (python, scikit-learn, imblearn, smote)


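I want to perform over-sampling in Python with the SMOTE algorithm from imblearn.over_sampling. My input data has four target classes. I do not want to over-sample every minority class up to the size of the majority class; I want to resample each minority class to a different size.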

When I use SMOTE(sampling_strategy=1, k_neighbors=2, random_state=1000), I get the following error:

ValueError: "sampling_strategy" can be a float only when the type of target is binary. For multi-class, use a dict.

Then, following the error message, I passed a dictionary as sampling_strategy, as follows:

SMOTE(sampling_strategy={'1.0':70,'3.0':255,'2.0':50,'0.0':150},k_neighbors=2,random_state = 1000)

However, it gives the following error:

ValueError: The {'2.0', '1.0', '0.0', '3.0'} target class is/are not present in the data.

Does anyone know how to define a dictionary so that SMOTE over-samples each class differently?

You have to specify the desired number of samples for each class and pass that dictionary to the SMOTE object.

Code:

import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

# Toy data: 200 samples, 13 features, four imbalanced classes
x1 = np.random.randint(500, size=(200, 13))
y1 = np.concatenate([np.array([0]*100), np.array([1]*65), np.array([2]*25), np.array([3]*10)])
np.random.shuffle(y1)
print(Counter(y1))

# Keys are the class labels as they appear in y1;
# values are the desired number of samples per class after resampling
sm = SMOTE(sampling_strategy={0: 100, 1: 70, 2: 90, 3: 40})
X_res, y_res = sm.fit_resample(x1, y1)
print(Counter(y_res))

Output:

Counter({0: 100, 1: 65, 2: 25, 3: 10})
Counter({0: 100, 1: 70, 2: 90, 3: 40})
For more information, see the documentation.

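The error you get occurs because the labels specified in the dictionary do not match the actual labels in the data. The keys in your dictionary are strings such as '1.0', and the error message reports that none of them is present among the target classes.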
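One way to catch this kind of mismatch before calling SMOTE is to compare the dictionary keys with the labels actually found in the target. The sketch below uses only the standard library; check_sampling_strategy is a hypothetical helper, not part of imblearn, and the float labels are an assumption about what the asker's data looks like. It also checks that each requested count is at least the current count, since over-sampling can only add samples.

```python
from collections import Counter

def check_sampling_strategy(y, strategy):
    """Hypothetical helper: validate an over-sampling dict against the labels in y."""
    counts = Counter(y)
    # Keys must compare equal to the labels: the string '1.0' is not the float 1.0.
    missing = [k for k in strategy if k not in counts]
    if missing:
        raise ValueError(f"labels not present in the data: {missing}")
    # Over-sampling only adds samples, so each target must be >= the current count.
    too_small = {k: (counts[k], n) for k, n in strategy.items() if n < counts[k]}
    if too_small:
        raise ValueError(f"targets below current counts (current, requested): {too_small}")
    return dict(strategy)

# Assumed float labels, loosely modelled on the question's four classes
y = [0.0] * 100 + [1.0] * 65 + [2.0] * 25 + [3.0] * 10

# Float keys match the labels, so this passes validation
good = check_sampling_strategy(y, {0.0: 150, 1.0: 70, 2.0: 50, 3.0: 255})

# String keys, as in the question, are rejected
try:
    check_sampling_strategy(y, {'0.0': 150, '1.0': 70})
    string_keys_ok = True
except ValueError:
    string_keys_ok = False
```

If the check passes, the returned dictionary can be given to SMOTE as sampling_strategy. Printing Counter(y) on your own target first shows the exact label values and types to use as keys.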