Applying a function groupwise in Python
How can I apply a function groupwise to a DataFrame? The function should be applied to child groups, but the child groups repeat across different parent groups. For example:
| Parent Group | Child Group | Value |
--------------------------------------
| A            | I1          | V1    |
| A            | I1          | V2    |
| A            | I2          | V3    |
| A            | I2          | V4    |
| B            | I1          | V5    |
| B            | I1          | V6    |
| B            | I2          | V7    |
| B            | I2          | V8    |
Expected output:
| Parent Group | Child Group | Value     |
------------------------------------------
| A            | I1          | f(V1, V2) |
| A            | I2          | f(V3, V4) |
| B            | I1          | f(V5, V6) |
| B            | I2          | f(V7, V8) |
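I can make the child groups unique by combining the parent group key with the child group key (e.g., ['A_I1', 'A_I2']) and then applying the function: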
df.groupby('Unique Child Group').apply(f)
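For reference, the workaround described above might look like the following minimal sketch. The column name `Unique Child Group` comes from the snippet above; the sample data mirrors the table in the question, and the placeholder `f` (which just joins the values) is an assumption for illustration. Unlike the one-liner above, the sketch selects the `Value` column before applying `f`.

```python
import pandas as pd

df = pd.DataFrame({
    "Parent Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Child Group": ["I1", "I1", "I2", "I2", "I1", "I1", "I2", "I2"],
    "Value": ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"],
})

def f(values):
    # Placeholder for the real function (assumption): join the group's values.
    return "f(" + ", ".join(values) + ")"

# Build a key that is unique per (parent, child) pair, e.g. "A_I1".
df["Unique Child Group"] = df["Parent Group"] + "_" + df["Child Group"]

# Group on the combined key and apply f to each group's values.
workaround = df.groupby("Unique Child Group")["Value"].apply(f)
print(workaround)
```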
But I would like to know whether there is a more elegant way.
You can do it like this:
df.groupby(['Parent Group', 'Child Group'])['Value'].apply(lambda x: ', '.join(x))
Output:
Parent Group  Child Group
A             I1             V1, V2
              I2             V3, V4
B             I1             V5, V6
              I2             V7, V8
If you want to change the output values using string formatting, you can do it as follows:
df.groupby(['Parent Group', 'Child Group'])['Value'].apply(lambda x: "f(%s)" % ', '.join(x))
Output:
Parent Group  Child Group
A             I1             f(V1, V2)
              I2             f(V3, V4)
B             I1             f(V5, V6)
              I2             f(V7, V8)
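If the result should be a flat table like the expected output in the question, rather than a Series with a two-level index, one option is to call `reset_index()` on the grouped result. The following is a minimal sketch under that assumption; the helper `f` is a hypothetical stand-in for the real function and the data mirrors the question's table.

```python
import pandas as pd

df = pd.DataFrame({
    "Parent Group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "Child Group": ["I1", "I1", "I2", "I2", "I1", "I1", "I2", "I2"],
    "Value": ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"],
})

def f(values):
    # Hypothetical stand-in for the real function: format the joined values.
    return "f(%s)" % ", ".join(values)

# Group on both keys, apply f per group, then turn the index levels
# back into ordinary columns.
result = (
    df.groupby(["Parent Group", "Child Group"])["Value"]
    .apply(f)
    .reset_index()
)
print(result)
```

Grouping on a list of columns treats each (parent, child) pair as its own group, so no combined key column is needed.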
Assumption: each group always has exactly 2 rows.
Setup
import pandas as pd
df = pd.DataFrame({'Child Group': {0: 'I1', 1: 'I1', 2: 'I2', 3: 'I2', 4: 'I1', 5: 'I1', 6: 'I2', 7: 'I2'}, 'Parent Group': {0: 'A', 1: 'A', 2: 'A', 3: 'A', 4: 'B', 5: 'B', 6: 'B', 7: 'B'}, 'Value': {0: 'V1', 1: 'V2', 2: 'V3', 3: 'V4', 4: 'V5', 5: 'V6', 6: 'V7', 7: 'V8'}})
Out[1305]:
  Child Group Parent Group Value
0          I1            A    V1
1          I1            A    V2
2          I2            A    V3
3          I2            A    V4
4          I1            B    V5
5          I1            B    V6
6          I2            B    V7
7          I2            B    V8
Demo
def func(x, y):
    return x + y
# Group by Parent Group and Child Group. Within each group, the first value
# can be referenced by x.iloc[0]['Value'] and the second by x.iloc[-1]['Value'].
# Below is an example that calls a function to concatenate the two values.
df.groupby(['Parent Group', 'Child Group']).apply(lambda x: func(x.iloc[0]['Value'], x.iloc[-1]['Value']))
Out[1304]:
Parent Group  Child Group
A             I1             V1V2
              I2             V3V4
B             I1             V5V6
              I2             V7V8
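If the two-rows-per-group assumption does not hold, one possible variation is to fold the two-argument function over all values in the group with `functools.reduce`. This is only a sketch: the extra row `V9` is made up for illustration and is not part of the original data.

```python
from functools import reduce

import pandas as pd

def func(x, y):
    return x + y

df = pd.DataFrame({
    "Parent Group": ["A", "A", "A", "A", "B", "B", "B", "B", "A"],
    "Child Group": ["I1", "I1", "I2", "I2", "I1", "I1", "I2", "I2", "I1"],
    # "V9" is a hypothetical extra row, so group (A, I1) has three values.
    "Value": ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"],
})

# Apply func pairwise across every value in the group, whatever its size.
result = (
    df.groupby(["Parent Group", "Child Group"])["Value"]
    .apply(lambda s: reduce(func, s))
)
print(result)
```

Selecting the `Value` column before `apply` means the lambda receives a Series of values rather than the whole sub-frame.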
df.groupby(['Parent Group','Child Group']).apply(f)?