Python: add computed per-group columns to a DataFrame while preserving the original row order

python pandas

Python 3.8.5, pandas 1.1.4

import pandas as pd

test_df = pd.DataFrame({
    'key': ['a', 'c', 'a', 'b', 'c', 'b', 'b', 'a', 'c', 'a'],
    'id': [859, 849, 238, 977, 427, 760, 453, 664, 102, 128],
    'order': [92, 32, 60, 4, 18, 19, 43, 69, 14, 88],
    'value': [18, 12, 16, 77, 62, 93, 86, 14, 49, 89]
})

test_df

    key id      order   value
0   a   859     92      18
1   c   849     32      12
2   a   238     60      16
3   b   977     4       77
4   c   427     18      62
5   b   760     19      93
6   b   453     43      86
7   a   664     69      14
8   c   102     14      49
9   a   128     88      89

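For each row, I need to compute some features within its `key` group relative to the `order` column, keep the original row order, and add the computed features to the original DataFrame.

My approach is as follows: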

def add_columns(r, d):
    new_r = r.copy()
    new_r['total'] = d.query('order < @r.order')['value'].sum()
    new_r['check'] = any(r.value > d.query('order < @r.order')['value'])
    return new_r

test_df.groupby('key').apply(lambda df: df.apply(lambda row: add_columns(row, df), axis=1))

    key id  order   value   total   check
0   a   859 92      18      119     True
1   c   849 32      12      111     False
2   a   238 60      16      0       False
3   b   977 4       77      0       False
4   c   427 18      62      49      True
5   b   760 19      93      77      True
6   b   453 43      86      170     True
7   a   664 69      14      16      False
8   c   102 14      49      0       False
9   a   128 88      89      30      True

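Is there a cleaner Python-native or pandas-native way to do this? My code looks messy.

Let's try a self-merge followed by a query; you can then group by the original index: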

test_df.join(test_df.reset_index().merge(test_df, on='key')
   .query('order_x > order_y')    # x plays the role of `r` and y of `d` in your code
   .assign(check=lambda x: x['value_x']>x['value_y'])
   .groupby('index')
   .agg(total=('value_y','sum'), check=('check','any'))
   .reindex(test_df.index, fill_value=0)
)

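Output (rows with no smaller `order` in their group show `0` in `check` rather than `False`, because `fill_value=0` is applied to both new columns):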

  key   id  order  value  total  check
0   a  859     92     18    119   True
1   c  849     32     12    111  False
2   a  238     60     16      0      0
3   b  977      4     77      0      0
4   c  427     18     62     49   True
5   b  760     19     93     77   True
6   b  453     43     86    170   True
7   a  664     69     14     16  False
8   c  102     14     49      0      0
9   a  128     88     89     30   True
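
If a strictly boolean `check` column is preferred, one possible variation is to reindex each aggregated column separately with its own fill value. The following is a minimal sketch of the same self-merge idea, not a definitive implementation; the names `pairs`, `features`, and `result` are introduced here for illustration only.

```python
import pandas as pd

test_df = pd.DataFrame({
    'key': ['a', 'c', 'a', 'b', 'c', 'b', 'b', 'a', 'c', 'a'],
    'id': [859, 849, 238, 977, 427, 760, 453, 664, 102, 128],
    'order': [92, 32, 60, 4, 18, 19, 43, 69, 14, 88],
    'value': [18, 12, 16, 77, 62, 93, 86, 14, 49, 89]
})

# Pair every row (suffix _x) with every row of the same key (suffix _y),
# then keep only the pairs where the _y row has a strictly smaller order.
pairs = (
    test_df.reset_index()
    .merge(test_df, on='key', suffixes=('_x', '_y'))
    .query('order_x > order_y')
    .assign(check=lambda p: p['value_x'] > p['value_y'])
)

# Aggregate per original row label (the 'index' column from reset_index).
features = pairs.groupby('index').agg(
    total=('value_y', 'sum'),
    check=('check', 'any'),
)

# Rows without any smaller order are missing from `features`;
# fill each column with a value of its own type.
result = test_df.assign(
    total=features['total'].reindex(test_df.index, fill_value=0),
    check=features['check'].reindex(test_df.index, fill_value=False),
)
print(result)
```

Here `total` stays an integer column and `check` stays boolean, at the cost of two `reindex` calls instead of one.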

Another version:

g = test_df.groupby("key")

test_df["total"] = g["order"].transform(
    lambda x: [test_df.loc[x[x < v].index, "value"].sum() for v in x]
)
test_df["check"] = g["order"].transform(
    lambda x: [
        (test_df.loc[i, "value"] > test_df.loc[x[x < v].index, "value"]).any()
        for i, v in zip(x.index, x)
    ]
)
print(test_df)
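
Both features can also be expressed with cumulative operations, which avoids the per-row Python loops above. This is only a sketch under one assumption that the question does not state: `order` values are unique within each `key` (true for the sample data). With ties, the strict `order <` comparison would give different results. The names `ordered`, `prev_min`, and `result` are illustrative.

```python
import pandas as pd

test_df = pd.DataFrame({
    'key': ['a', 'c', 'a', 'b', 'c', 'b', 'b', 'a', 'c', 'a'],
    'id': [859, 849, 238, 977, 427, 760, 453, 664, 102, 128],
    'order': [92, 32, 60, 4, 18, 19, 43, 69, 14, 88],
    'value': [18, 12, 16, 77, 62, 93, 86, 14, 49, 89]
})

# Assumption: `order` is unique within each key.
# Sort so that, inside each group, earlier rows have smaller `order`.
ordered = test_df.sort_values(['key', 'order'])
g = ordered.groupby('key')['value']

# Sum of values with a smaller order = running sum minus the row's own value.
total = g.cumsum() - ordered['value']

# Smallest value among rows with a smaller order (NaN for the first row of a group).
prev_min = g.cummin().groupby(ordered['key']).shift()

# A row passes the check if it exceeds at least one earlier value,
# i.e. if it exceeds the smallest one; comparisons with NaN are False.
check = ordered['value'] > prev_min

# assign() aligns on the index, so the original row order is preserved.
result = test_df.assign(total=total, check=check)
print(result)
```

Because `assign` aligns the new Series on the index labels, no explicit re-sorting back to the original order is needed.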