Python: Shuffling a column in a pandas DataFrame using groupby


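I want to randomly shuffle the values of one column in a DataFrame, based on a groupby. For example, I have two columns, A and B, and I want to shuffle column B randomly within the groups defined by A.

Say A has three distinct values. For each distinct value of A, I want to shuffle the values in B, but only among the rows that share that value of A.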

Example input:

A       B     
------------
1       1          
1       3    
2       4     
3       6   
1       2  
3       5

Example output:

A       B        
------------
1       3          
1       2    
2       4     
3       6   
1       1  
3       5

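In this case, the values of B are shuffled for A=1. The same applies to A=2, but since that group has only one row, it stays as it is. For A=3, the values of B happen to remain unchanged.

I would like to implement this with pandas.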

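To do this, you can combine np.random.permutation (which returns a shuffled copy of an array) with groupby and transform (which returns a result indexed like the original group). For example: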

>>> df
   col1  col2
0     1     1
1     1     3
2     2     4
3     3     6
4     1     2
5     3     5
>>> df["col3"] = df.groupby("col1")["col2"].transform(np.random.permutation)
>>> df
   col1  col2  col3
0     1     1     2
1     1     3     1
2     2     4     4
3     3     6     5
4     1     2     3
5     3     5     6

Note that the values are shuffled only within their col1 group.
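The session above can be reproduced as a standalone script. This is only a sketch: the seeded generator `rng` is an assumption added here so that repeated runs give the same shuffle, and it is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 1, 3],
                   "col2": [1, 3, 4, 6, 2, 5]})

# Seeded generator: an assumption, used only to make the run repeatable.
rng = np.random.default_rng(0)

# transform calls the function once per col1 group and writes the returned
# values back to that group's original row positions.
df["col3"] = df.groupby("col1")["col2"].transform(
    lambda s: rng.permutation(s.to_numpy())
)

print(df)
```

Because transform keeps the original index, col3 lines up row for row with col1, and a single-row group such as col1 == 2 cannot change.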

You can also use groupby together with sample:

import pandas as pd

df = pd.DataFrame({'col1': [1, 1, 2, 3, 1, 3],
                   'col2': [1, 3, 4, 6, 2, 5]})

# Shuffle the rows of each col1 group, then discard the old index
df_rand = df.groupby('col1').apply(lambda x: x.sample(frac=1)).reset_index(drop=True)

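>>> df.sort_values('col1', kind='mergesort')  # df.sort was removed from pandas; a stable sort keeps row order within each group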
   col1  col2
0     1     1
1     1     3
4     1     2
2     2     4
3     3     6
5     3     5

>>> df_rand
   col1  col2
0     1     2
1     1     3
2     1     1
3     2     4
4     3     6
5     3     5
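To my knowledge, pandas 1.1 and later also offer sample directly on the groupby object, which avoids the apply call. The following is a minimal sketch under that version assumption; `random_state=0` is an arbitrary seed chosen here for repeatability.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 1, 3],
                   "col2": [1, 3, 4, 6, 2, 5]})

# frac=1 draws every row of each group without replacement, i.e. a shuffle.
# random_state=0 is an arbitrary seed, assumed here only for repeatability.
shuffled = df.groupby("col1").sample(frac=1, random_state=0)

# The original index labels are kept, so each row's origin stays visible.
print(shuffled)

# Drop them to get a fresh 0..n-1 index, as in df_rand above.
df_rand = shuffled.reset_index(drop=True)
```

Note that this shuffles whole rows, so the output is a reordered frame instead of a new column aligned with the original rows.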

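Can you provide sample data and the expected output?

Sure, I've added an example.

In fact, I was very close to your solution :) Thanks!

That works if you only want to shuffle one column, but does it also work for multiple columns?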
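On the question about multiple columns, one possible approach (a sketch, not taken from the answers above) is to shuffle row positions within each group and then reorder several columns with the same positions, so that values from one row stay together. The extra column `col3`, the list `cols` and the seeded `rng` are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [1, 1, 2, 3, 1, 3],
                   "col2": [1, 3, 4, 6, 2, 5],
                   "col3": list("abcdef")})  # hypothetical second column

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
cols = ["col2", "col3"]         # columns to shuffle together

# Integer position of every row, shuffled only among rows of the same group.
pos = pd.Series(np.arange(len(df)), index=df.index)
shuffled_pos = pos.groupby(df["col1"]).transform(
    lambda s: rng.permutation(s.to_numpy())
).to_numpy()

out = df.copy()
for c in cols:
    # Same positions for every column, so (col2, col3) pairs move as a unit.
    out[c] = df[c].to_numpy()[shuffled_pos]

print(out)
```

To shuffle each column independently instead, the position shuffle could be recomputed inside the loop for every column.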