Python: how to randomly select a row according to given probabilities

I have a DataFrame like this:

>>> import pandas as pd
>>> df = pd.DataFrame([['a',0,0.2],['b',0,0.3],
...                    ['c',0,0.5],
...                    ['a',1,0.4],['b',1,0.3],['c',1,0.3],
...                    ['a',2,0.5],['b',2,0.5]]
...                    ,columns=['place','ID','prob'])
>>> df
  place  ID  prob
0     a   0  0.20
1     b   0  0.30
2     c   0  0.50
3     a   1  0.40
4     b   1  0.30
5     c   1  0.30
6     a   2  0.50
7     b   2  0.50
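
I want to treat the prob column as a probability mass function and randomly select one row within each ID. In other words, exactly one row should be chosen per "ID". The output should look like this, with choice set to 1 for the selected row: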

  place  ID  prob  choice
0     a   0  0.20       1
1     b   0  0.30       0
2     c   0  0.50       0
3     a   1  0.40       0
4     b   1  0.30       1
5     c   1  0.30       0
6     a   2  0.50       1
7     b   2  0.50       0

The real DataFrame will have millions of rows, so the more efficient the solution, the better. Thanks, everyone!

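We can use your prob column as the weights in DataFrame.sample. The only other thing we have to do is call it inside a groupby, because we want to sample separately within each group (here, each ID):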

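# Draw one row per ID, using prob as the sampling weights
sample = df.groupby("ID").apply(lambda x: x.sample(weights=x["prob"]))
# Drop the ID level of the resulting MultiIndex to recover the original row labels
choices = sample.reset_index(drop=True, level=0).index
# Mark the sampled rows with 1 and all others with 0
df["choice"] = df.index.isin(choices).astype(int)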

  place  ID  prob  choice
0     a   0   0.2       0
1     b   0   0.3       1
2     c   0   0.5       0
3     a   1   0.4       1
4     b   1   0.3       0
5     c   1   0.3       0
6     a   2   0.5       0
7     b   2   0.5       1
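
Because the draw is random, the rows flagged above will differ from run to run. As a possible shorter variant of the same idea, the sketch below calls sample directly on the groupby object instead of going through apply. It assumes pandas 1.1 or later (where GroupBy.sample is available) and a unique index; the fixed random_state is only there to make the example reproducible and is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [["a", 0, 0.2], ["b", 0, 0.3], ["c", 0, 0.5],
     ["a", 1, 0.4], ["b", 1, 0.3], ["c", 1, 0.3],
     ["a", 2, 0.5], ["b", 2, 0.5]],
    columns=["place", "ID", "prob"],
)

# Draw one row per ID; the weights are normalized within each group.
# random_state is an assumption added only for reproducibility.
picked = df.groupby("ID").sample(n=1, weights=df["prob"], random_state=0)

# Flag the sampled rows by their original index labels (assumes a unique index).
df["choice"] = df.index.isin(picked.index).astype(int)
print(df)
```

This keeps the original row labels on the sampled frame, so no reset_index step is needed before building the choice column.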

Thanks, this is exactly what I needed. The only difference is that I will group by ID, but that is a minor point. Thanks again!
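
Since the question mentions millions of rows, a fully vectorized alternative may also be worth trying. The sketch below is not from the answer above; it swaps in inverse-CDF sampling: compute the running total of prob within each ID, draw one uniform number per ID, and pick the first row whose running total exceeds that draw. It assumes a unique index, no missing IDs, and a positive prob total in every group. The names rng, cum, group_code, u, hit and first_hit are illustrative only, and whether this is actually faster on a given dataset would need to be measured.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["a", 0, 0.2], ["b", 0, 0.3], ["c", 0, 0.5],
     ["a", 1, 0.4], ["b", 1, 0.3], ["c", 1, 0.3],
     ["a", 2, 0.5], ["b", 2, 0.5]],
    columns=["place", "ID", "prob"],
)

rng = np.random.default_rng(0)  # seed fixed only for reproducibility

# Running total of prob within each ID, rescaled so every group ends at exactly 1.0.
cum = df.groupby("ID")["prob"].cumsum()
cum = cum / cum.groupby(df["ID"]).transform("last")

# One uniform draw in [0, 1) per ID, broadcast to that ID's rows.
group_code = df.groupby("ID").ngroup().to_numpy()
u = rng.random(group_code.max() + 1)[group_code]

# The chosen row is the first one in its group whose running total exceeds the draw.
hit = cum.to_numpy() > u
first_hit = df.loc[hit].drop_duplicates("ID").index
df["choice"] = df.index.isin(first_hit).astype(int)
print(df)
```

Rescaling by the last running total in each group means the final row of a group always compares as 1.0 against a draw that is strictly below 1, so every ID gets exactly one selected row even when the probabilities do not sum to 1 exactly in floating point.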