Python: groupby with a conditional agg function

python pandas

I have a DataFrame like this:

node touch1 touch2 touch3 touch4 touch5
A    Best   Mid    Mid     
A           Best   Worst  Worst

I would like to group by node using a conditional rule, so that I get back a single row per node:

node touch1 touch2 touch3 touch4 touch5
A    Best   Best   Mid    Worst

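In other words: if there is a Best, show Best; if not, but there is a Mid, show Mid; if not, but there is a Worst, show Worst.

I am trying something like this: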

group_cols = ["touch1", "touch2", "touch3", "touch4", "touch5"]
output.groupby(group_cols).agg({'Best':lambda val: (val == "Best").any(),'Mid':lambda val: (val == "Mid").any(), 'Worst':lambda val: (val == "Worst").any()}).reset_index()

But I cannot get it to work. I think I am missing something. Any idea how to do this?

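As J_H said in the comments, text labels on their own are usually hard to work with. I suggest converting them to an ordered categorical first, and then selecting the highest-ranked one in the aggregation.

To do that, first build the categories in order from lowest to highest:

categories = ["Worst", "Mid", "Best"]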

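Then convert every column other than node to that categorical type:

df = df.set_index("node")
df = df.apply(lambda x: pd.Categorical(x, categories=categories, ordered=True))

Now, if you group by node, the aggregation can simply take the maximum of each column:

df.groupby("node").max().reset_index()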

This produces the expected result:
node touch1 touch2 touch3 touch4 touch5
A    Best   Best   Mid    Worst  NaN

Note: if you do not want to keep the data as categorical afterwards, you need to convert it back with df = df.astype(str).

Data
df = pd.DataFrame({
    "node": ["A", "A"],
    "touch1": ["Best", None],
    "touch2": ["Mid", "Best"],
    "touch3": ["Mid", "Worst"],
    "touch4": [None, "Worst"],
    "touch5": [None, None],
})
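The steps of this answer can be put together into one runnable sketch. It uses the sample data above; the names cat_type, ranked and result are introduced here for illustration only, and DataFrame.astype with a CategoricalDtype is swapped in for the per-column pd.Categorical call, which should be equivalent for this data.

```python
import pandas as pd

df = pd.DataFrame({
    "node": ["A", "A"],
    "touch1": ["Best", None],
    "touch2": ["Mid", "Best"],
    "touch3": ["Mid", "Worst"],
    "touch4": [None, "Worst"],
    "touch5": [None, None],
})

# Ordered from lowest to highest, so that max() picks the best label.
categories = ["Worst", "Mid", "Best"]
cat_type = pd.CategoricalDtype(categories=categories, ordered=True)

# Move node into the index so that only the touch columns are converted.
ranked = df.set_index("node").astype(cat_type)

# For each node, take the highest-ranked label in every column.
# Columns with no label at all for a node stay missing.
result = ranked.groupby("node").max().reset_index()
print(result)
```

Because missing cells are not part of the category list, they are skipped by the maximum instead of competing with the real labels.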

Using the key option of sort_values, available in pandas 1.1.0+:
d = {'Best': 0, 'Mid': 1, 'Worst': 2, '': 3}
df_final = df.groupby('node').agg(lambda x: x.sort_values(key=lambda x: x.map(d))
                                             .head(1))

Out[600]:
     touch1 touch2 touch3 touch4 touch5
node
A      Best   Best    Mid  Worst
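A self-contained variant of the same idea is sketched below, under two assumptions that differ from the snippet above: empty cells are stored as None rather than as empty strings, and the helper returns a scalar via .iloc[0] instead of a one-row Series. The names priority and pick_best are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "node": ["A", "A"],
    "touch1": ["Best", None],
    "touch2": ["Mid", "Best"],
    "touch3": ["Mid", "Worst"],
    "touch4": [None, "Worst"],
    "touch5": [None, None],
})

# Lower number = higher priority.
priority = {"Best": 0, "Mid": 1, "Worst": 2}


def pick_best(col):
    """Return the highest-priority label in one column of one group."""
    # Labels missing from `priority` (here: None) map to NaN and are
    # sorted last, so they are only returned when nothing else exists.
    ordered = col.sort_values(key=lambda s: s.map(priority))
    return ordered.iloc[0]


# Requires pandas 1.1.0 or later for the `key` argument of sort_values.
result = df.groupby("node").agg(pick_best)
print(result)
```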

Using a mapping dictionary, as suggested, is the best approach:

import pandas as pd
mapping_dict = {'Best': 0, 'Mid': 1, 'Worst': 2, None: 3}
df = pd.DataFrame({
    "node": ["A", "A"],
    "touch1": ["Best", None],
    "touch2": ["Mid", "Best"],
    "touch3": ["Mid", "Worst"],
    "touch4": [None, "Worst"],
    "touch5": [None, None],
})
result = df.groupby('node').agg(lambda x: {value: key for key, value in mapping_dict.items()}[min(x.map(mapping_dict))])
print(result)

Gives:
     touch1 touch2 touch3 touch4 touch5
node                                   
A      Best   Best    Mid  Worst   None

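Note that {value: key for key, value in mapping_dict.items()} is simply the inverse of mapping_dict (key: value becomes value: key), used to recover the original labels.

The text labels {Best, Mid, Worst} are nice enough. But you will be happier if you convert them to integers. Then you can use

Thank you very much, everyone. Both solutions are great, but I have not yet had the courage to upgrade pandas.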
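For readers who, like the asker, are on a pandas version older than 1.1.0, the integer conversion mentioned in the comment can be sketched without the key argument of sort_values. This is only an illustration of that suggestion, not code from the thread; the names rank, label, ranks and best are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "node": ["A", "A"],
    "touch1": ["Best", None],
    "touch2": ["Mid", "Best"],
    "touch3": ["Mid", "Worst"],
    "touch4": [None, "Worst"],
    "touch5": [None, None],
})

# Lower integer = better label; `label` is the inverse lookup.
rank = {"Best": 0, "Mid": 1, "Worst": 2}
label = {value: key for key, value in rank.items()}

# Replace each label by its integer rank; missing cells become NaN.
ranks = df.set_index("node").apply(lambda col: col.map(rank))

# The smallest rank per node and column is the best label.
# min() skips NaN, so a column stays NaN only if the group has no label.
best = ranks.groupby(level="node").min()

# Translate the winning ranks back into text labels.
result = best.apply(lambda col: col.map(label))
print(result)
```

Working on integers keeps the aggregation a plain min(), at the cost of one extra mapping step in each direction.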