Python: applying multiple functions to a DataFrame

python pandas

I'm looking for a way to combine the results of several apply functions run on my raw data. Here is some simplified code:

import pandas as pd 

df = pd.DataFrame({'name':["alice","bob","charlene","alice","bob","charlene","alice","bob","charlene","edna" ],
                   'date':["2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-01","2020-01-02","2020-01-01"],
                   'contribution': [5,5,10,20,30,1,5,5,10,100],
                   'payment-type': ["cash","transfer","cash","transfer","cash","transfer","cash","transfer","cash","transfer",]})
df['date'] = pd.to_datetime(df['date'])

def myfunction(group):
    # Count how many rows each name has within this date group
    output = group["name"].value_counts()
    output.index.set_names(['name_x'], inplace=True)
    return output

daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).apply(myfunction)

print(daily_count.reset_index())

Output:

        date    name_x  name
0 2020-01-01       bob     3
1 2020-01-01  charlene     2
2 2020-01-01     alice     2
3 2020-01-01      edna     1
4 2020-01-02  charlene     1
5 2020-01-02     alice     1

I would like to integrate the output of the following code into the previous result:

def myfunction(group):
    # Total contribution within this (date, name) group
    output = group["contribution"].sum()
    # output.index.set_names(['name_x'], inplace=True)
    return output
    
daily_count = df.groupby([pd.Grouper(key='date', freq='1D'), "name"]).apply(myfunction)

That would give me something like:

        date      name  num_contributions  total_pp
0 2020-01-01       bob                  3        40
1 2020-01-01  charlene                  2        11
2 2020-01-01     alice                  2        25
3 2020-01-01      edna                  1       100
4 2020-01-02  charlene                  1        10
5 2020-01-02     alice                  1         5

Using apply() matters to me because I plan to make some API calls and database lookups inside the function.

Ta, Andrew

df.groupby(["date","name"])["contribution"].agg(["count","sum"]).reset_index().sort_values(by="count",ascending=False)

Output:

         date      name  count  sum
1  2020-01-01       bob      3   40
0  2020-01-01     alice      2   25
2  2020-01-01  charlene      2   11
3  2020-01-01      edna      1  100
4  2020-01-02     alice      1    5
5  2020-01-02  charlene      1   10

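So, first we group by date and name, then select the column to aggregate. On that column we count each person's contributions and then sum them. After that, to get back to the normal DataFrame shape, we reset the index and sort the values with by="count" in descending order.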
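The same grouping can also produce the column names from the question directly, using named aggregation. This is a minimal sketch rather than the answerer's code; the names num_contributions and total_pp are taken from the desired output in the question, and the data is a trimmed copy of the question's frame.

```python
import pandas as pd

# Trimmed copy of the question's data (payment-type is not needed here)
df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": pd.to_datetime(
        ["2020-01-01"] * 6
        + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]
    ),
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

# Named aggregation: output_column=(input_column, aggregation)
result = (
    df.groupby(["date", "name"])
      .agg(
          num_contributions=("contribution", "count"),
          total_pp=("contribution", "sum"),
      )
      .reset_index()
)
print(result)
```

This avoids renaming the columns afterwards, at the cost of spelling out one keyword per output column.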

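groupby agg is very powerful when you need to compute several single-column aggregation functions in one groupby. The syntax is flexible and simple, although not the most economical to type.

Limitation: an aggregation function cannot take multiple columns as input. If that is what you need, you have to fall back to .apply().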
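To illustrate the limitation, here is a small sketch of a per-group calculation that needs two columns at once, which is why it goes through .apply() instead of agg. The helper cash_total is hypothetical (it is not part of the original answer): it sums only the contributions paid in cash.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": pd.to_datetime(
        ["2020-01-01"] * 6
        + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]
    ),
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    "payment-type": ["cash", "transfer"] * 5,
})

def cash_total(group):
    """Hypothetical helper: needs both 'payment-type' and 'contribution'."""
    is_cash = group["payment-type"] == "cash"
    return group.loc[is_cash, "contribution"].sum()

# Select only the columns the function reads, then apply it per group
cash = (
    df.groupby(["date", "name"])[["contribution", "payment-type"]]
      .apply(cash_total)
      .rename("cash_total")
      .reset_index()
)
print(cash)
```

Each group is passed to the function as a DataFrame, so any combination of its columns is available inside the function body.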

Demo and result

For a broader discussion of further possibilities, see (e.g.).

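By "integrate", do you mean joining them together, or getting these results directly with a single apply?

I'd love to see both approaches if possible, but a single apply is what I had in mind.

Could you check whether my answer meets your requirements?

I'm afraid this doesn't work. As I said in the question, I need to use apply because I need to merge in data from other sources.

@Andrewowway Hmm, I'm not sure I understand, but I'll leave the answer up anyway.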
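The "joining" option mentioned in these comments could look roughly like the sketch below: two separate apply calls on the same groups, combined on their shared (date, name) index. It assumes both results use the same group keys, which differs from the question's first snippet (that one groups by date only), and the column names come from the desired output in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": pd.to_datetime(
        ["2020-01-01"] * 6
        + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]
    ),
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
})

grouped = df.groupby(["date", "name"])["contribution"]

# Two independent apply calls, each returning one value per group
counts = grouped.apply(len).rename("num_contributions")
totals = grouped.apply(lambda s: s.sum()).rename("total_pp")

# Both Series share the (date, name) index, so concat aligns them row by row
combined = pd.concat([counts, totals], axis=1).reset_index()
print(combined)
```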

import numpy as np

def myfunc(sr):
    """Just a customized function for demo purpose"""
    # N.B. cannot write sr.sum() somehow
    return np.sum(sr) / (np.std(sr) + 1)

df_out = df.groupby([pd.Grouper(key='date', freq='D'), "name"]).agg({
    # column: [func1, func2, ...]
    "contribution": [np.size,  # accepts 1) a function
                     "sum",    # or 2) a built-in function name
                     myfunc    # or 3) an externally defined function
                     ],
    "payment-type": [
        lambda sr: len(np.unique(sr))  # or 4) a lambda function
    ]
})

# postprocess columns and indexes
df_out.columns = ["num_contributions", "total_pp", "myfunc", "type_count"]
df_out.reset_index(inplace=True)

                                                     # extra demo columns
        date      name  num_contributions  total_pp      myfunc  type_count
0 2020-01-01     alice                  2        25    2.941176           2
1 2020-01-01       bob                  3        40    3.128639           2
2 2020-01-01  charlene                  2        11    2.000000           2
3 2020-01-01      edna                  1       100  100.000000           1
4 2020-01-02     alice                  1         5    5.000000           1
5 2020-01-02  charlene                  1        10   10.000000           1
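Since the question asks for a single apply, one possible sketch is to have the applied function return a labeled pd.Series, so that each label becomes a column of the result. The FEE_RATES dictionary and the total_fees column are hypothetical and only stand in for the API call or database lookup mentioned in the question; they are not part of the original thread.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": pd.to_datetime(
        ["2020-01-01"] * 6
        + ["2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"]
    ),
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    "payment-type": ["cash", "transfer"] * 5,
})

# Hypothetical stand-in for an external lookup (API call, database query)
FEE_RATES = {"cash": 0.0, "transfer": 0.01}

def summarize(group):
    """Return several values for one (date, name) group as a labeled Series."""
    fees = group["payment-type"].map(FEE_RATES) * group["contribution"]
    return pd.Series({
        "num_contributions": len(group),
        "total_pp": group["contribution"].sum(),
        "total_fees": fees.sum(),
    })

result = (
    df.groupby(["date", "name"])[["contribution", "payment-type"]]
      .apply(summarize)
      .reset_index()
      # The mixed Series is upcast to float, so restore the integer columns
      .astype({"num_contributions": int, "total_pp": int})
)
print(result)
```

The group keys here are plain column names because the dates are already whole days; the question's pd.Grouper(key='date', freq='1D') could be used in their place. Note that apply calls the function once per group in Python, so slow lookups inside it will dominate the run time.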