Applying a function groupwise in Python (pandas, NumPy)

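How can I apply a function groupwise to a DataFrame when the function should be applied to child groups, but the child group labels repeat across different parent groups?

For example: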

| Parent Group | Child Group | Value |
--------------------------------------
|  A           | I1          | V1    |
--------------------------------------
|  A           | I1          | V2    |
--------------------------------------
|  A           | I2          | V3    |
--------------------------------------
|  A           | I2          | V4    |
--------------------------------------
|  B           | I1          | V5    |
--------------------------------------
|  B           | I1          | V6    |
--------------------------------------
|  B           | I2          | V7    |
--------------------------------------
|  B           | I2          | V8    |
--------------------------------------

Expected output:

| Parent Group | Child Group | Value     |
------------------------------------------
|  A           | I1          | f(V1, V2) |
------------------------------------------
|  A           | I2          | f(V3, V4) |
------------------------------------------
|  B           | I1          | f(V5, V6) |
------------------------------------------
|  B           | I2          | f(V7, V8) |
------------------------------------------

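I could make the child groups unique by combining the parent group key with the child group key (for example, ['A_I1', 'A_I2']) and then apply the function: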

df.groupby('Unique Child Group').apply(f)

But I would like to know whether there is a more elegant way.
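For reference, the workaround described in the question might look roughly like the sketch below. The numeric values, the `"_"` separator, and the summing `f` are placeholders chosen for illustration, and the function is applied to the `Value` column only rather than to the whole frame.

```python
import pandas as pd

# Hypothetical data mirroring the table above; the numbers stand in for V1..V8.
df = pd.DataFrame({
    "Parent Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Child Group": ["I1", "I1", "I2", "I2", "I1", "I1", "I2", "I2"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Build a combined key such as "A_I1" so that every child group is unique.
df["Unique Child Group"] = df["Parent Group"] + "_" + df["Child Group"]


def f(values):
    # Stand-in for the real function: sum the values of one group.
    return values.sum()


result = df.groupby("Unique Child Group")["Value"].apply(f)
print(result)
```

This works, but it adds a helper column that exists only to drive the grouping.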

You can do this:

df.groupby(['Parent Group', 'Child Group'])['Value'].apply(lambda x: ', '.join(x))

Output:

Parent Group  Child Group
A             I1             V1, V2
              I2             V3, V4
B             I1             V5, V6
              I2             V7, V8

If you want to change the output values with string formatting, you can do it like this:

df.groupby(['Parent Group', 'Child Group'])['Value'].apply(lambda x: "f(%s)" % ', '.join(x))

Output:

Parent Group  Child Group
A             I1             f(V1, V2)
              I2             f(V3, V4)
B             I1             f(V5, V6)
              I2             f(V7, V8)
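The grouped result is indexed by (Parent Group, Child Group), whereas the expected output in the question is a flat table. One possible way to get there is `reset_index()`, sketched below; the function `f` here is only a placeholder that renders its inputs as text.

```python
import pandas as pd

# Same data as the table in the question.
df = pd.DataFrame({
    "Parent Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Child Group": ["I1", "I1", "I2", "I2", "I1", "I1", "I2", "I2"],
    "Value": ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"],
})


def f(values):
    # Placeholder for the real function: formats one group as "f(V1, V2)".
    return "f(%s)" % ", ".join(values)


flat = (
    df.groupby(["Parent Group", "Child Group"])["Value"]
    .apply(f)
    .reset_index()  # turn the two index levels back into ordinary columns
)
print(flat)
```

Any function that reduces a group's values to a single result can replace `f`.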

Assumption: each group always has exactly 2 rows.

Setup

import pandas as pd

df = pd.DataFrame({
    'Child Group': {0: 'I1', 1: 'I1', 2: 'I2', 3: 'I2', 4: 'I1', 5: 'I1', 6: 'I2', 7: 'I2'},
    'Parent Group': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B'},
    'Value': {0: 'V1', 1: 'V2', 2: 'V3', 3: 'V4', 4: 'V5', 5: 'V6', 6: 'V7', 7: 'V8'},
})

Out[1305]: 
  Child Group Parent Group Value
0          I1            A    V1
1          I1            A    V2
2          I2            A    V3
3          I2            A    V4
4          I1            B    V5
5          I1            B    V6
6          I2            B    V7
7          I2            B    V8

Demo

def func(x,y):
    return x+y

# Group by Parent Group and Child Group. Within each group, the first value is
# x.iloc[0]['Value'] and the second (last) value is x.iloc[-1]['Value'].
# The example below calls func to concatenate the two values.
df.groupby(['Parent Group','Child Group']).apply(lambda x: func(x.iloc[0]['Value'],x.iloc[-1]['Value']))
Out[1304]: 
Parent Group  Child Group
A             I1             V1V2
              I2             V3V4
B             I1             V5V6
              I2             V7V8
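If the two-rows-per-group assumption does not hold, one option is to fold the two-argument function over all values of a group. The sketch below uses `functools.reduce` for that; the uneven data is made up for illustration and is not part of the original example.

```python
from functools import reduce

import pandas as pd

# Hypothetical data with uneven group sizes (3 rows, 1 row, 2 rows).
df = pd.DataFrame({
    "Parent Group": ["A", "A", "A", "A", "B", "B"],
    "Child Group": ["I1", "I1", "I1", "I2", "I1", "I1"],
    "Value": ["V1", "V2", "V3", "V4", "V5", "V6"],
})


def func(x, y):
    return x + y


# reduce applies func pairwise from left to right; a group with a single
# row simply yields that row's value.
result = df.groupby(["Parent Group", "Child Group"])["Value"].apply(
    lambda s: reduce(func, s)
)
print(result)
```

This only makes sense when the function can be applied pairwise in sequence, as concatenation or addition can.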

df.groupby(['Parent Group', 'Child Group']).apply(f)

?
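A quick check that passing both column names keeps child groups with the same label apart when their parents differ. This is a minimal sketch; the numeric values are placeholders.

```python
import pandas as pd

df = pd.DataFrame({
    "Parent Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Child Group": ["I1", "I1", "I2", "I2", "I1", "I1", "I2", "I2"],
    "Value": [1, 2, 3, 4, 5, 6, 7, 8],
})

grouped = df.groupby(["Parent Group", "Child Group"])

# Each distinct (parent, child) pair forms its own group, so I1 under A
# and I1 under B are never merged.
n_groups = grouped.ngroups
sizes = grouped.size()
print(n_groups)
print(sizes)
```

No combined key column is needed, because the list of column names already identifies each pair.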