Python: applying multiple functions to a DataFrame
I'm looking for a way to combine the results of several apply functions on my original data. Here is some simplified code:
import pandas as pd
df = pd.DataFrame({'name':["alice","bob","charlene","alice","bob","charlene","alice","bob","charlene","edna" ],
'date':["2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-01","2020-01-02","2020-01-01","2020-01-02","2020-01-01"],
'contribution': [5,5,10,20,30,1,5,5,10,100],
'payment-type': ["cash","transfer","cash","transfer","cash","transfer","cash","transfer","cash","transfer",]})
df['date'] = pd.to_datetime(df['date'])
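def myfunction(group):
    # Count how often each name appears within one day's rows
    # (pandas >= 2.0 names the counts Series 'count' instead of 'name')
    output = group["name"].value_counts()
    output.index.set_names(['name_x'], inplace=True)
    return output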
daily_count = df.groupby(pd.Grouper(key='date', freq='1D')).apply(myfunction)
print(daily_count.reset_index())
Output:
date name_x name
0 2020-01-01 bob 3
1 2020-01-01 charlene 2
2 2020-01-01 alice 2
3 2020-01-01 edna 1
4 2020-01-02 charlene 1
5 2020-01-02 alice 1
I'd like to integrate the output of this code into the previous result:
def myfunction(group):
    # Total contribution of one (day, name) group
    output = group["contribution"].sum()
    return output
daily_count = df.groupby([pd.Grouper(key='date', freq='1D'), "name"]).apply(myfunction)
so that I end up with:
date name num_contrubutions total_pp
0 2020-01-01 bob 3 40
1 2020-01-01 charlene 2 11
2 2020-01-01 alice 2 25
3 2020-01-01 edna 1 100
4 2020-01-02 charlene 1 10
5 2020-01-02 alice 1 5
Using apply() is important to me, because I plan to make some API calls and database lookups inside the function.
Thanks in advance, Andrew
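A minimal sketch of the "join them together" route, assuming the question's sample data: run each per-group function through its own apply on one grouped object, name the resulting Series, and concatenate them column-wise on their shared (date, name) index. The functions count_contributions and total_contribution are hypothetical stand-ins; any API calls or database lookups would go inside them.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"],
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    "payment-type": ["cash", "transfer", "cash", "transfer", "cash",
                     "transfer", "cash", "transfer", "cash", "transfer"],
})
df["date"] = pd.to_datetime(df["date"])

# One grouped object reused for both apply calls; selecting the value columns
# keeps the grouping columns out of the frames passed to each function.
grouped = df.groupby([pd.Grouper(key="date", freq="1D"), "name"])[["contribution", "payment-type"]]


def count_contributions(group):
    # Hypothetical per-group function: number of rows in the group
    return len(group)


def total_contribution(group):
    # Hypothetical per-group function: sum of the contributions
    return group["contribution"].sum()


counts = grouped.apply(count_contributions).rename("num_contributions")
totals = grouped.apply(total_contribution).rename("total_pp")

# Both Series share the same (date, name) MultiIndex, so concat aligns them row by row
result = pd.concat([counts, totals], axis=1).reset_index()
print(result)
```

Because both Series come from the same grouped object, their indexes match exactly, so concat needs no explicit merge keys.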
df.groupby(["date","name"])["contribution"].agg(["count","sum"]).reset_index().sort_values(by="count",ascending=False)
#output
date name count sum
1 2020-01-01 bob 3 40
0 2020-01-01 alice 2 25
2 2020-01-01 charlene 2 11
3 2020-01-01 edna 1 100
4 2020-01-02 alice 1 5
5 2020-01-02 charlene 1 10
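So, first we group by date and name, then select the column we want to aggregate. We first count each person's contributions, then sum them. After that, to get back a normal DataFrame shape, we reset the index and sort the values by="count" in descending order.

groupby.agg is very powerful when you need to compute several single-column aggregation functions in one groupby. The syntax is flexible and simple, though not the most economical to type.

Limitation: an aggregation function cannot take more than one column as input. If you need that, you have to fall back to .apply().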
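For that .apply() fallback, one common pattern is to have the per-group function return a pd.Series; pandas then stacks those Series into a single DataFrame with one column per Series key. A minimal sketch on the question's data; summarize and the cash_total column are hypothetical, the latter only to show a computation that reads two columns at once.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"],
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    "payment-type": ["cash", "transfer", "cash", "transfer", "cash",
                     "transfer", "cash", "transfer", "cash", "transfer"],
})
df["date"] = pd.to_datetime(df["date"])


def summarize(group):
    # Hypothetical per-group function. API calls or database lookups could go
    # here; this sketch only uses the group's own rows.
    cash = group["payment-type"] == "cash"
    return pd.Series({
        "num_contributions": len(group),
        "total_pp": group["contribution"].sum(),
        # Reads two columns at once, which a single-column agg function cannot do
        "cash_total": group.loc[cash, "contribution"].sum(),
    })


result = (
    df.groupby([pd.Grouper(key="date", freq="1D"), "name"])[["contribution", "payment-type"]]
    .apply(summarize)
    .reset_index()
)
print(result)
```

Every call returns a Series with the same keys, so the result has one row per (date, name) group and one column per key.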
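For an extensive discussion of more possibilities, see (e.g.) the linked reference.

Comments:
By "integrate", do you mean joining the results together, or getting them directly with a single apply?
If possible I'd love to see both, but I had a single apply in mind.
Could you check whether my answer meets your requirements?
I'm afraid this doesn't work: as I said in the question, I need to use apply because I need to merge in data from other sources.
@Andrewowway Hmm, I'm not sure I understand, but I'll keep the answer anyway.

Demo: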
import numpy as np

def myfunc(sr):
    """Just a customized function for demo purpose"""
    # N.B. cannot write sr.sum() somehow
    return np.sum(sr) / (np.std(sr) + 1)

df_out = df.groupby([pd.Grouper(key='date', freq='D'), "name"]).agg({
    # column: [func1, func2, ...]
    "contribution": [np.size,  # accepts 1) a function
                     "sum",    # or 2) a built-in function name
                     myfunc    # or 3) an externally defined function
                     ],
    "payment-type": [
        lambda sr: len(np.unique(sr))  # or 4) a lambda function
    ]
})
# postprocess columns and indexes
df_out.columns = ["num_contrubutions", "total_pp", "myfunc", "type_count"]
df_out.reset_index(inplace=True)
Result (myfunc and type_count are extra demo columns):
date name num_contrubutions total_pp myfunc type_count
0 2020-01-01 alice 2 25 2.941176 2
1 2020-01-01 bob 3 40 3.128639 2
2 2020-01-01 charlene 2 11 2.000000 2
3 2020-01-01 edna 1 100 100.000000 1
4 2020-01-02 alice 1 5 5.000000 1
5 2020-01-02 charlene 1 10 10.000000 1
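The positional renaming via df_out.columns = [...] can also be folded into the agg call with named aggregation (keyword arguments of the form new_name=(column, func), available since pandas 0.25). A sketch under that assumption; it swaps np.size for the "size" string and the lambda for the built-in "nunique", which give the same numbers here only because these columns contain no missing values.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "name": ["alice", "bob", "charlene", "alice", "bob",
             "charlene", "alice", "bob", "charlene", "edna"],
    "date": ["2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01", "2020-01-01",
             "2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02", "2020-01-01"],
    "contribution": [5, 5, 10, 20, 30, 1, 5, 5, 10, 100],
    "payment-type": ["cash", "transfer", "cash", "transfer", "cash",
                     "transfer", "cash", "transfer", "cash", "transfer"],
})
df["date"] = pd.to_datetime(df["date"])


def myfunc(sr):
    """Same demo function as in the answer: sum / (population std + 1)."""
    return np.sum(sr) / (np.std(sr) + 1)


df_out = (
    df.groupby([pd.Grouper(key="date", freq="D"), "name"])
    .agg(
        # output column name = (input column, aggregation)
        num_contributions=("contribution", "size"),
        total_pp=("contribution", "sum"),
        myfunc=("contribution", myfunc),
        type_count=("payment-type", "nunique"),
    )
    .reset_index()
)
print(df_out)
```

Naming each output column next to its aggregation avoids having to keep a separate list of names in the same order as the functions.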