Python: how to randomly select one row according to given probabilities
I have a DataFrame like this:
>>> import pandas as pd
>>> df = pd.DataFrame([['a',0,0.2],['b',0,0.3],
... ['c',0,0.5],
... ['a',1,0.4],['b',1,0.3],['c',1,0.3],
... ['a',2,0.5],['b',2,0.5]]
... ,columns=['place','ID','prob'])
>>> df
place ID prob
0 a 0 0.20
1 b 0 0.30
2 c 0 0.50
3 a 1 0.40
4 b 1 0.30
5 c 1 0.30
6 a 2 0.50
7 b 2 0.50
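I want to randomly select one row within each ID, using the prob column as a probability mass distribution. In other words, exactly one row should be chosen per ID. The desired output looks like this: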
place ID prob choice
0 a 0 0.20 1
1 b 0 0.30 0
2 c 0 0.50 0
3 a 1 0.40 0
4 b 1 0.30 1
5 c 1 0.30 0
6 a 2 0.50 1
7 b 2 0.50 0
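The real DataFrame will have millions of rows, so the more efficient the better. Thanks, everyone!

We can use your prob column as the weights in DataFrame.sample. The only thing we have to do is call it inside a groupby, because we want to draw one row for each ID group: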
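# Draw one row per ID, weighted by that group's prob values
sample = df.groupby("ID").apply(lambda x: x.sample(weights=x["prob"]))
# Drop the ID level of the result so only the original row labels remain
choices = sample.reset_index(drop=True, level=0).index
# Flag the drawn rows with 1 and every other row with 0
df["choice"] = df.index.isin(choices).astype(int)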
place ID prob choice
0 a 0 0.2 0
1 b 0 0.3 1
2 c 0 0.5 0
3 a 1 0.4 1
4 b 1 0.3 0
5 c 1 0.3 0
6 a 2 0.5 0
7 b 2 0.5 1
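The same draw can probably be written without apply. The sketch below assumes pandas 1.1 or later, where groupby objects have their own sample method; the weights are passed for the whole frame and normalized within each group. The names picked and random_state=0 are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 0, 0.2], ["b", 0, 0.3], ["c", 0, 0.5],
     ["a", 1, 0.4], ["b", 1, 0.3], ["c", 1, 0.3],
     ["a", 2, 0.5], ["b", 2, 0.5]],
    columns=["place", "ID", "prob"],
)

# One weighted draw per ID; weights must have the same length as df.
# random_state is fixed here only to make the sketch reproducible.
picked = df.groupby("ID").sample(n=1, weights=df["prob"], random_state=0)

# The sampled rows keep their original index labels, so isin can flag them.
df["choice"] = df.index.isin(picked.index).astype(int)
print(df)
```

This relies on the index of df being unique, because the chosen rows are matched back by index label.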
Thanks, this is exactly what I needed. The only difference is that I would group by ID, but that is a minor point. Thanks again!
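Since the real DataFrame has millions of rows, a fully vectorized variant might be worth benchmarking, because groupby.apply runs a Python function once per group. The sketch below is not from the answer above: it swaps in inverse-CDF sampling, drawing one uniform number per ID and picking the first row whose running total of prob exceeds it. It assumes a unique index and non-negative prob values with a positive sum in every ID; the names rng, cum, total, u, threshold and hit are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["a", 0, 0.2], ["b", 0, 0.3], ["c", 0, 0.5],
     ["a", 1, 0.4], ["b", 1, 0.3], ["c", 1, 0.3],
     ["a", 2, 0.5], ["b", 2, 0.5]],
    columns=["place", "ID", "prob"],
)

rng = np.random.default_rng(42)  # seed chosen only for reproducibility
ids = df["ID"]

# Running total of prob within each ID, and the final total per ID
cum = df.groupby("ID")["prob"].cumsum()
total = cum.groupby(ids).transform("max")

# One uniform draw in [0, 1) per ID, broadcast to that ID's rows and
# scaled by the group total so unnormalized probabilities also work
u = pd.Series(rng.random(ids.nunique()), index=ids.unique())
threshold = ids.map(u) * total

# The chosen row is the first one per ID whose running total exceeds the draw
hit = (cum > threshold).astype("int8")
chosen = hit.groupby(ids).idxmax()

df["choice"] = df.index.isin(chosen).astype(int)
print(df)
```

Whether this is actually faster depends on the number and size of the groups, so it is worth timing against the groupby version on real data.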