How to force pandas.DataFrame.apply on a grouped DataFrame

python pandas

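The behavior of pandas.DataFrame.apply(myfunc) is to apply myfunc along the columns. The behavior of pandas.core.groupby.DataFrameGroupBy.apply is more complicated. The difference shows up for functions myfunc such that frame.apply(myfunc) != myfunc(frame).

I would like to group a DataFrame, then apply myfunc along the columns of each individual frame (one per group), and then paste the results together. There are many ways to do this, but I would like to know whether there is some simple kwarg I am missing.

Consider the following example: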

In [22]: df = pd.DataFrame({'a':range(5), 'b': range(5, 10)})

In [23]: df
Out[23]: 
   a  b
0  0  5
1  1  6
2  2  7
3  3  8
4  4  9

In [24]: def myfunc(data):
             # Implements max in a funny way.
             # However, this is just an example of a function such that 
             # myfunc(frame) != frame.apply(myfunc)
             return data.values.ravel().max()

In [25]: df.apply(myfunc)
Out[25]: 
a    4
b    9

In [26]: df.groupby(df.a < 2).apply(myfunc)
Out[26]: 
a
False    9
True     6

You can do this:

In [25]: df.groupby(df.a<2).aggregate(myfunc)
Out[25]: 
       a  b
a          
False  4  9
True   1  6

[2 rows x 2 columns]

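Unfortunately this doesn't work for me: df.groupby(df.a<2).aggregate(myfunc) returns DataFrame({'a': [9, 6], 'b': [9, 6]}, index=[False, True]). In other words, it evaluates myfunc(g) for each group g rather than g.apply(myfunc). The docstring of .aggregate says it will try both approaches, but not in which order. Also note that myfunc is only an example, so although .max() solves this particular case, I am still looking for an answer. My version is 0.12.

Change the return statement of myfunc to: return data.max(). You don't want to ravel the two columns (you want to keep them separate).

max is just a silly example. I know that max is applied column-wise. I am looking for a more general answer that does not rely on the special implementation of max. In other words, I am looking for a canonical way to use a more general function (myfunc) that gives a different result on a frame as a whole than column-wise, i.e. a function such that myfunc(frame) != frame.apply(myfunc). I want to force the behavior of frame.apply(myfunc) on a grouped frame.

Then just iterate (and you may have to handle the concatenation yourself), e.g. concat([myfunc(grp) for g, grp in df.groupby(...)])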
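
The iteration suggested in the last comment could look roughly like the sketch below. It is not taken from the thread. It assumes that what is wanted for each group is grp.apply(myfunc), the column-wise call from the question, and it collects the per-group results in a dict (the name pieces is made up for illustration) before handing them to pd.concat.

```python
import pandas as pd


def myfunc(data):
    # Example function from the question: its result depends on whether it
    # receives a whole frame or a single column.
    return data.values.ravel().max()


df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# Call DataFrame.apply explicitly on every sub-frame, so myfunc always
# sees one column at a time. "pieces" is a hypothetical name.
pieces = {key: grp.apply(myfunc) for key, grp in df.groupby(df["a"] < 2)}

# Each piece is a Series indexed by column name. Concatenating along axis=1
# gives one column per group, and transposing gives one row per group.
result = pd.concat(pieces, axis=1).T
print(result)
```

Doing the loop by hand makes it unambiguous which object myfunc receives, at the cost of assembling the result yourself.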

In [26]: df.groupby(df.a<2).max()
Out[26]: 
       a  b
a          
False  4  9
True   1  6

[2 rows x 2 columns]
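
For a general myfunc, one option that does not depend on how aggregate dispatches is to nest the column-wise apply inside the group-wise apply. The following is a minimal sketch rather than an answer from the thread. The helper name apply_columnwise is made up for illustration, and the behavior should be checked against your own pandas version.

```python
import pandas as pd


def myfunc(data):
    # Example function from the question.
    return data.values.ravel().max()


def apply_columnwise(group):
    # Hypothetical helper: forces DataFrame.apply semantics on one group,
    # so myfunc is called once per column of the sub-frame.
    return group.apply(myfunc)


df = pd.DataFrame({"a": range(5), "b": range(5, 10)})

# GroupBy.apply passes each sub-frame to the helper. Because every call
# returns a Series indexed by column name, the per-group results are
# stacked into one row per group.
result = df.groupby(df["a"] < 2).apply(apply_columnwise)
print(result)
```

Here the grouping key is a separate boolean Series rather than a column label, so each sub-frame still contains both a and b.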