Python: Preserve the DataFrame's index when generating a Series with groupby apply


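When calling a function with groupby + apply, I want to go from a DataFrame to a Series groupby object, apply a function that takes a Series as input and returns a Series as output to each group, and then assign the output of the groupby + apply call as a field in the DataFrame. By default, the output of groupby + apply is indexed by the grouping fields, which prevents me from cleanly assigning it back to the DataFrame. I'd like the function called by apply to take a Series as input and return a Series as output; I think that is a bit cleaner than DataFrame to DataFrame. (This is not the best way to get the result for this example; the real application is quite different.)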

Something like output.index = df.index looks too ugly, and using the group_keys argument doesn't seem to work:

output = df.groupby(['A', 'B'], group_keys = False)['C'].apply(less_than_two)
df['Less_Than_Two'] = output
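A minimal, runnable reproduction of the setup above. The question shows neither the data nor less_than_two, so both are assumptions here: the frame is rebuilt from the output table in the transform answer, and less_than_two is a hypothetical function returning s < 2 (the comparison mentioned in the comments). In recent pandas versions, when the applied function returns a Series with the same index as its group, group_keys=False keeps the concatenated result on df's original row labels, so the assignment lines up. It is worth checking this against the pandas version you have installed.

```python
import pandas as pd

# Sample data reconstructed from the output table in the transform answer (assumption)
df = pd.DataFrame({'A': [999, 999, 111, 111],
                   'B': [1, 2, 3, 4],
                   'C': [1, 3, 1, 3]})

def less_than_two(s):
    """Hypothetical Series-to-Series function: same index in, same index out."""
    return s < 2

# group_keys=False keeps the group keys out of the result index, so the
# like-indexed per-group results are stitched back on df's original row labels
output = df.groupby(['A', 'B'], group_keys=False)['C'].apply(less_than_two)

# Column assignment aligns on the index, so each row receives its own result
df['Less_Than_Two'] = output
print(df)
```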

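Assuming your groupby is necessary (and the resulting groupby object will have fewer rows than your DataFrame, which is not the case with the example data), assigning the Series to the 'Is.Even' column will produce NaN values, because the index of output will be shorter than the index of df.
Instead, given the example data, the simplest approach is to merge output, as a DataFrame, with df: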
# is_even is defined elsewhere (not shown); agg expects it to return a single value per group
output = df.groupby(['A', 'B'])['C'].agg(is_even).reset_index() # reset_index restores 'A' and 'B' from indices to columns
output.columns = ['A', 'B', 'Is_Even'] # rename target column prior to merging
df = df.merge(output, how='left', on=['A', 'B']) # this supports a many-to-one relationship between combinations of 'A' & 'B' and 'Is_Even'
# and thus properly maps aggregated values to unaggregated values
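A self-contained sketch of the merge approach. The sample data and is_even below are hypothetical (the answer's is_even is not shown): the data deliberately repeats each (A, B) pair so the many-to-one mapping is visible, and reset_index(name=...) is used here in place of renaming output.columns afterwards.

```python
import pandas as pd

# Hypothetical data: each (A, B) pair appears twice
df = pd.DataFrame({'A': [999, 999, 111, 111],
                   'B': [1, 1, 3, 3],
                   'C': [1, 3, 1, 2]})

def is_even(s):
    """Hypothetical aggregator: one boolean per group (is the group's sum even?)."""
    return s.sum() % 2 == 0

# One row per (A, B) group; reset_index turns A and B back into columns
# and names the aggregated column Is_Even
output = df.groupby(['A', 'B'])['C'].agg(is_even).reset_index(name='Is_Even')

# Left merge maps each group's single value back onto every row of that group
df = df.merge(output, how='left', on=['A', 'B'])
print(df)
```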

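Also, I should note that underscores are better than dots in variable names: unlike in R, the dot in Python is the operator for accessing an object's attributes, so using it in names can break functionality or cause confusion.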
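A small illustration of the point about dots, using a hypothetical frame that has both spellings of the column name:

```python
import pandas as pd

df = pd.DataFrame({'Is_Even': [True, False], 'Is.Even': [True, False]})

# An underscore name is a valid Python identifier, so attribute access works
print(df.Is_Even.tolist())

# A dotted name is not: df.Is.Even first looks up an attribute "Is",
# which does not exist, so bracket indexing is the only way to reach the column
try:
    df.Is.Even
except AttributeError as err:
    print('AttributeError:', err)
print(df['Is.Even'].tolist())
```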
transform returns a result with the original index, just as you asked. It will broadcast the same result across all elements of a group. Caveat: the dtype may be inferred as something else, so you may have to cast it yourself.
In this case, to add another column, I would use assign:

df.assign(
    Less_Than_Two=df.groupby(['A', 'B'])['C'].transform(less_than_two).astype(bool))

     A  B  C Less_Than_Two
0  999  1  1          True
1  999  2  3         False
2  111  3  1          True
3  111  4  3         False

Thanks. My example wasn't great; I've just updated it. The intent of using dataframe.groupby(field_names).apply is for many-to-many applications: Series to Series with the same index, where the result has the same shape as the input Series.

No problem. It's still unclear why you need a groupby; based on your description and example code, you could create the Less_Than_Two column with df.loc[:, 'Less_Than_Two'] = df.C.apply(less_than_two).

In principle I like to use the simplest data structure, so I want to go from Series to Series, but the index handling gives me a bit of a headache. If this example came up in the real world I would probably just do df.C < 2, but the problem I'm working on is a bit different.

It looks like transform keeps the same dtype as the input field.

I like that transform preserves the original index. I don't necessarily want to broadcast here, but I guess that doesn't matter, since each result has length 1. In my full problem transform converts boolean to datetime and I can't convert it back to boolean, but this answer is the best in spirit. Coming from R, I find the index a double-edged sword and dtypes a bit tricky, but I like a lot of the rest.
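A self-contained version of the transform answer, again assuming the data from the output table and a hypothetical less_than_two that returns s < 2. The astype(bool) call is the cast the answer warns may be needed if transform infers a different dtype.

```python
import pandas as pd

# Sample data reconstructed from the output table (assumption)
df = pd.DataFrame({'A': [999, 999, 111, 111],
                   'B': [1, 2, 3, 4],
                   'C': [1, 3, 1, 3]})

def less_than_two(s):
    """Hypothetical Series-to-Series function (its real definition is not shown)."""
    return s < 2

# transform returns a result indexed like df, so it can go straight into assign;
# astype(bool) pins the dtype in case transform infers something else
result = df.assign(
    Less_Than_Two=df.groupby(['A', 'B'])['C'].transform(less_than_two).astype(bool))
print(result)
```

When no grouping is actually needed, the plain vectorized comparison df['C'] < 2 produces the same column for this data.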