Python: setting values in a groupby


python numpy pandas


I have a DataFrame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({
...            'letters' : ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
...            'is_min' : np.zeros(9),
...            'numbers' : np.random.randn(9)
... })

    is_min  letters numbers
0   0       a       0.322499
1   0       a      -0.196617
2   0       a      -1.194251
3   0       b       1.005323
4   0       b      -0.186364
5   0       b      -1.886273
6   0       c       0.014960
7   0       c      -0.832713
8   0       c       0.689531

I want to set the 'is_min' column to 1 where 'numbers' is the minimum within its 'letters' group. I've tried this and I feel like I'm close:

>>> df.groupby('letters')['numbers'].transform('idxmin')

0    2
1    2
2    2
3    5
4    5
5    5
6    7
7    7
8    7
dtype: int64

I'm having trouble connecting the dots to set the value of 'is_min' to 1.

Pass the row labels to loc and set the column:

In [34]:
df.loc[df.groupby('letters')['numbers'].transform('idxmin'), 'is_min']=1
df

Out[34]:
   is_min letters   numbers
0       1       a -0.374751
1       0       a  1.663334
2       0       a -0.123599
3       1       b -2.156204
4       0       b  0.201493
5       0       b  1.639512
6       0       c -0.447271
7       0       c  0.017204
8       1       c -1.261621

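So what happens here is that by calling loc we select only the rows whose labels are returned by the transform method, and those rows get set to 1 as desired.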
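To make the loc step concrete, here is a minimal sketch that reuses the numbers from the question's sample output instead of random draws, so the result is reproducible. The integer 'is_min' column and the variable name labels are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

# Fixed values copied from the question's sample output (not random here).
df = pd.DataFrame({
    'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'is_min': 0,
    'numbers': [0.322499, -0.196617, -1.194251, 1.005323, -0.186364,
                -1.886273, 0.014960, -0.832713, 0.689531],
})

# One entry per row: the index label of the minimum within that row's group.
labels = df.groupby('letters')['numbers'].transform('idxmin')

# loc selects by label; repeated labels just point at the same row again.
df.loc[labels, 'is_min'] = 1

print(df)
```

With this data the flagged rows should be 2, 5 and 7, matching the idxmin output shown in the question.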

Not sure if it matters, but you can call unique so that you get the row labels without repeats, which may be faster:

df.loc[df.groupby('letters')['numbers'].transform('idxmin').unique(), 'is_min']=1
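As a rough illustration of what unique changes, the sketch below compares the number of labels before and after deduplication. The data is again the question's sample output, and the names repeated and distinct are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'is_min': 0,
    'numbers': [0.322499, -0.196617, -1.194251, 1.005323, -0.186364,
                -1.886273, 0.014960, -0.832713, 0.689531],
})

# transform broadcasts the group result back to every row of the group.
repeated = df.groupby('letters')['numbers'].transform('idxmin')

# unique() returns a NumPy array of distinct labels in order of first appearance.
distinct = repeated.unique()

print(len(repeated), len(distinct))

# Assigning through the deduplicated labels touches each target row once.
df.loc[distinct, 'is_min'] = 1
```

The assignment ends up marking the same rows either way; only the amount of label lookup differs.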

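"I want to set the 'is_min' column to 1 where 'numbers' is the minimum within its 'letters' group"

Perhaps a more intuitive approach is to compute the minimum of each group of letters, then use a group-wise .apply to assign is_min: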
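def set_is_min(m):
    # Side effect: flag every row of df whose value equals the group minimum m.
    df.loc[df.numbers == m, 'is_min'] = 1

# apply is used only for its side effect; the result holds None for each group.
mins = df.groupby('letters').numbers.min().apply(set_is_min)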
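One caveat worth spelling out: the comparison df.numbers == m runs against the whole column, so a value that is the minimum of one group also gets flagged if it appears as a non-minimal value in another group. The sketch below uses a small made-up frame to show this, and contrasts it with a different technique (not the one in this answer): comparing each row against its own group's minimum via transform('min'). The isin line is my shorthand for reproducing the effect of the apply version.

```python
import pandas as pd

# Hypothetical data: 1.0 is the minimum of group 'a' but not of group 'b'.
df = pd.DataFrame({
    'letters': ['a', 'a', 'b', 'b'],
    'numbers': [1.0, 2.0, 0.5, 1.0],
})

# Effect of matching group minima against the whole column:
# the 1.0 in group 'b' is flagged too.
minima = df.groupby('letters')['numbers'].min()
flagged_globally = df['numbers'].isin(minima).astype(int)

# Per-group comparison: each row is checked against its own group's minimum.
group_min = df.groupby('letters')['numbers'].transform('min')
df['is_min'] = (df['numbers'] == group_min).astype(int)

print(flagged_globally.tolist())
print(df['is_min'].tolist())
```

Note that the per-group mask flags every row tied for the minimum within a group, whereas idxmin picks a single label per group.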

On a large DataFrame, the apply-based approach was actually about 20% faster than using transform in my timings:
# timeit with 100'000 rows
# .apply on group minima
100 loops, best of 3: 16.7 ms per loop
# .transform
10 loops, best of 3: 21.9 ms per loop
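The answer does not show the benchmark setup, so the following is only a sketch of how such a comparison could be reproduced. The three-letter grouping, the seed, the repeat count and the helper names via_transform and via_apply are all assumptions of mine; timings will vary with pandas version and hardware.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: 100,000 rows spread over three groups.
rng = np.random.default_rng(0)
n = 100_000
base = pd.DataFrame({
    'letters': rng.choice(list('abc'), size=n),
    'numbers': rng.standard_normal(n),
})


def via_transform(frame):
    # loc with the labels produced by transform('idxmin').
    out = frame.assign(is_min=0)
    labels = out.groupby('letters')['numbers'].transform('idxmin')
    out.loc[labels, 'is_min'] = 1
    return out


def via_apply(frame):
    # apply over the per-group minima, assigning as a side effect.
    out = frame.assign(is_min=0)

    def set_is_min(m):
        out.loc[out['numbers'] == m, 'is_min'] = 1

    out.groupby('letters')['numbers'].min().apply(set_is_min)
    return out


t_transform = timeit.timeit(lambda: via_transform(base), number=10)
t_apply = timeit.timeit(lambda: via_apply(base), number=10)
print(f"transform: {t_transform / 10 * 1e3:.1f} ms per call")
print(f"apply:     {t_apply / 10 * 1e3:.1f} ms per call")
```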

I ran some different approaches using apply and transform.
Thanks. This also seems to work without producing the duplicate values and then calling unique: df.loc[df.groupby('letters')['numbers'].idxmin().values, 'is_min'] = 1
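A small sketch of that last variant, with a deliberately non-default index (an assumption added here) to show that idxmin returns index labels rather than positions, which is why loc is the right accessor. As far as I know, idxmin returns the label of the first occurrence when a group has ties.

```python
import pandas as pd

df = pd.DataFrame(
    {
        'letters': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
        'is_min': 0,
        'numbers': [0.322499, -0.196617, -1.194251, 1.005323, -0.186364,
                    -1.886273, 0.014960, -0.832713, 0.689531],
    },
    index=range(10, 19),  # hypothetical labels that differ from positions
)

# One label per group, indexed by the group key; no transform or unique needed.
min_labels = df.groupby('letters')['numbers'].idxmin()

# .values gives the plain array of labels to pass to loc.
df.loc[min_labels.values, 'is_min'] = 1

print(min_labels)
print(df)
```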