Python: How to generate x and y samples with a conditional probability using NumPy

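I am trying to generate a sample of x values and their labels y for a binary classifier.

I know that x is uniformly distributed in [0,1]. The distribution of y, however, depends on x: if x is in [0.2,0.4] or [0.6,0.8], then P[Y=1]=0.1. If x is outside these intervals, then P[Y=1]=0.8.

I think the best way is to use NumPy rather than a for loop with if conditions, but so far I have not succeeded.

Here is my attempt: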

s = np.random.uniform(0,1,100) # 100 x samples in [0,1] uniformly distributed
condition  = (np.logical_or((s>0.2)&(s < 0.4), (s>0.6)&(s < 0.8))) # attempt to mark with True the places of x in bounds.
x_in_bounds = np.select(condlist, s) # this line doesn't work
... # how to generate the y values?

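I tried to find a way to randomly generate the y values conditioned on the sampled x values, but without success. I would like to understand what I am missing.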

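One approach is to generate two random sequences, each filled with 1s and 0s according to one of the two cases above, and then use np.where to pick element-wise from one or the other based on the condition.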
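The code for this np.where approach is not shown here, so the following is only a minimal sketch of how it might look. The variable names (`in_bounds`, `y_low`, `y_high`) and the use of `np.random.default_rng` are assumptions made for illustration, not the answerer's original code.

```python
import numpy as np

# Assumption: the Generator API; the legacy np.random.* functions would work too
rng = np.random.default_rng()

n = 100
x = rng.uniform(0, 1, n)

# True where x falls inside one of the two low-probability intervals
in_bounds = ((x > 0.2) & (x < 0.4)) | ((x > 0.6) & (x < 0.8))

# Two candidate label sequences, one for each case
y_low = (rng.random(n) < 0.1).astype(int)   # P[Y=1] = 0.1
y_high = (rng.random(n) < 0.8).astype(int)  # P[Y=1] = 0.8

# Choose element-wise: y_low where in_bounds is True, y_high elsewhere
y = np.where(in_bounds, y_low, y_high)
```

This draws two random numbers per sample and discards one of them, which is simple but slightly wasteful.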

A solution along the same lines as your attempt, using np.select, is:

import numpy as np

# Return 1 with probability prob, otherwise 0
generate = lambda prob: 1 if np.random.rand() < prob else 0

s = np.random.uniform(0, 1, 100)
low_prob_condition = ((s > 0.2) & (s < 0.4)) | ((s > 0.6) & (s < 0.8))
condlist = [low_prob_condition, np.logical_not(low_prob_condition)]
# One full-length choice list per condition; np.select picks element-wise
labels = np.select(condlist, [[generate(0.1) for _ in range(s.size)], [generate(0.8) for _ in range(s.size)]])

print(labels)

A solution that avoids building two full-length choice lists, and draws only one random number per sample, would be:

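import numpy as np

s = np.random.uniform(0, 1, 100)
# Scalar test: is x inside one of the low-probability intervals?
low_prob_cond = lambda x: ((x > 0.2) and (x < 0.4)) or ((x > 0.6) and (x < 0.8))
# Return 1 with probability prob, otherwise 0
gen = lambda prob: 1 if np.random.rand() < prob else 0
# Lazy generator: one draw per sample, evaluated when consumed
labels = (gen(0.1) if low_prob_cond(x) else gen(0.8) for x in s)

print(list(labels))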

np.select needs, for each condition (two in this case), a choice list of the same size as s, which can clearly be avoided in your problem.
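One way to avoid the per-condition choice lists while staying vectorized is to build an array of per-sample probabilities and compare it against a single uniform draw. This is only a sketch under that assumption; the names `in_bounds` and `p` are illustrative and do not come from the answers.

```python
import numpy as np

rng = np.random.default_rng()  # assumption: Generator API instead of np.random.*

x = rng.uniform(0, 1, 100)
in_bounds = ((x > 0.2) & (x < 0.4)) | ((x > 0.6) & (x < 0.8))

# Per-sample success probability: 0.1 inside the intervals, 0.8 outside
p = np.where(in_bounds, 0.1, 0.8)

# A uniform draw u satisfies u < p with probability p, so y is 1 with probability p
y = (rng.random(x.shape) < p).astype(int)
```

Only one random number is drawn per sample here, and no Python-level loop is needed. `rng.binomial(1, p)` should give labels with the same distribution.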

For a solution that follows your approach, see @adnanmuttaleb's answer.

My approach uses NumPy's advanced (boolean mask) indexing:

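import numpy as np

x = np.random.uniform(0, 1, 100)

# Boolean masks for the low-probability intervals and their complement
cond = ((x > 0.2) & (x < 0.4)) | ((x > 0.6) & (x < 0.8))
not_cond = np.logical_not(cond)

# One uniform draw per sample, thresholded in place per mask
y = np.random.rand(*x.shape)
y[cond] = y[cond] < 0.1          # P[Y=1] = 0.1 inside the intervals
y[not_cond] = y[not_cond] < 0.8  # P[Y=1] = 0.8 outside
y = y.astype(int)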

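Fix the typo -> generate in the first solution. I think it would be best to seed the random number generator to show that these solutions are equivalent and produce the same results. Unless the order of the draws is not the same, in which case it is not relevant…

@idow09 thanks, typo fixed.

Did my answer help? Don't forget that you can upvote and accept answers. Thanks!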
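On the seeding point: the solutions above consume random numbers in different orders, so the same seed would not be expected to give identical labels across them. What can be checked instead is that one seeded method is reproducible and that its empirical frequencies approach the target probabilities. The sketch below assumes a hypothetical helper `sample_xy` and the `np.random.default_rng` API; neither appears in the answers.

```python
import numpy as np

def sample_xy(n, seed):
    """Hypothetical helper: draw x ~ U[0, 1] and labels y with P[Y=1 | x]."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, n)
    in_bounds = ((x > 0.2) & (x < 0.4)) | ((x > 0.6) & (x < 0.8))
    p = np.where(in_bounds, 0.1, 0.8)
    y = (rng.random(n) < p).astype(int)
    return x, y, in_bounds

# Same seed and same code path: the two runs produce identical samples
x1, y1, _ = sample_xy(100_000, seed=0)
x2, y2, in_bounds = sample_xy(100_000, seed=0)
same = np.array_equal(x1, x2) and np.array_equal(y1, y2)

# Empirical conditional frequencies, expected to be close to 0.1 and 0.8
freq_in = y2[in_bounds].mean()
freq_out = y2[~in_bounds].mean()
print(same, round(freq_in, 3), round(freq_out, 3))
```

With a large sample, the marginal frequency of Y=1 should also be near 0.4 * 0.1 + 0.6 * 0.8 = 0.52, since the two intervals cover 40% of [0,1].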

Sample output of the generator-based solution (values differ from run to run):

[0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
