Python: return the maximum grouped by N attributes

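I come from a Java background and am learning Python by using it at work wherever I can. I have some working code that I'd really like to improve.

Basically, I have a list of namedtuples, each holding three numeric values and one time value: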

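from collections import namedtuple

complete = []            # every record parsed from the log
uniqueComplete = set()   # latest record per (feedID, partition, screeeningMode)
screenedPartitions = namedtuple('screenedPartitions', ['feedID', 'partition', 'date', 'screeeningMode'])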
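For readers newer to namedtuples, here is a minimal sketch of how records of this shape behave. The sample values are hypothetical (real ones would come from the parsed log); it shows field access by name or position, and that namedtuples are hashable, which is what lets them live in a `set` like `uniqueComplete`.

```python
from collections import namedtuple
from datetime import datetime

# Same field names as in the question (including the 'screeeningMode' spelling)
screenedPartitions = namedtuple(
    'screenedPartitions', ['feedID', 'partition', 'date', 'screeeningMode'])

# Hypothetical sample records; real values would come from parsing the log
complete = [
    screenedPartitions(1, 7, datetime(2015, 3, 1, 10, 0, 0), 2),
    screenedPartitions(1, 7, datetime(2015, 3, 2, 9, 30, 0), 2),
]

# Fields are accessible by name or by position
first = complete[0]
print(first.feedID, first[0])  # → 1 1

# namedtuples are hashable, so identical records collapse inside a set
uniqueComplete = set(complete + [complete[0]])
print(len(uniqueComplete))  # → 2
```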

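I parse a log, and once the list is populated I want to build a reduced set containing, for each combination of feedID, partition and screeningMode, only the most recent record. So far the only way I've managed it is with a nasty nested loop: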

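for a in complete:
    latest = a  # newest record seen so far in a's group (avoids shadowing the built-in max)
    for b in complete:
        if (a.feedID == b.feedID and a.partition == b.partition
                and a.screeeningMode == b.screeeningMode
                and latest.date < b.date):  # compare with the current newest, not with a
            latest = b
    uniqueComplete.add(latest)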
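A common single-pass alternative to the nested loop is to key a dict on the grouping fields and keep only the newest record per key. This is a minimal sketch, assuming `date` holds comparable values such as `datetime` objects; `latest_per_group` and the sample data are hypothetical names for illustration.

```python
from collections import namedtuple
from datetime import datetime

screenedPartitions = namedtuple(
    'screenedPartitions', ['feedID', 'partition', 'date', 'screeeningMode'])


def latest_per_group(records):
    """Keep the record with the newest date for each
    (feedID, partition, screeeningMode) combination, in one pass."""
    latest = {}
    for rec in records:
        key = (rec.feedID, rec.partition, rec.screeeningMode)
        current = latest.get(key)
        # Replace only on a strictly newer date, so ties keep the first record seen
        if current is None or rec.date > current.date:
            latest[key] = rec
    return set(latest.values())


# Hypothetical sample data
complete = [
    screenedPartitions(1, 7, datetime(2015, 3, 1, 10, 0), 2),
    screenedPartitions(1, 7, datetime(2015, 3, 2, 9, 30), 2),
    screenedPartitions(2, 7, datetime(2015, 3, 1, 8, 0), 2),
]
uniqueComplete = latest_per_group(complete)
```

This touches each record once (O(n)) instead of comparing every pair (O(n²)), which matters once the log gets large.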

So after the code runs, row 2 would be dropped, because row 3 is the newer version.

TL;DR: what is the Python equivalent of this SQL?

SELECT feedID, partition, screeeningMode, MAX(date)
FROM Complete
GROUP BY feedID, partition, screeeningMode
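A fairly literal standard-library translation of that query is `sorted` plus `itertools.groupby`, then `max` over each group. This is a sketch only; `group_key`, `max_date_by_group` and the sample data are hypothetical. Note that `itertools.groupby` only groups adjacent items, which is why the records are sorted by the same key first.

```python
from collections import namedtuple
from datetime import datetime
from itertools import groupby
from operator import attrgetter

screenedPartitions = namedtuple(
    'screenedPartitions', ['feedID', 'partition', 'date', 'screeeningMode'])

# Plays the role of GROUP BY feedID, partition, screeeningMode (returns a tuple)
group_key = attrgetter('feedID', 'partition', 'screeeningMode')


def max_date_by_group(records):
    """Return (feedID, partition, screeeningMode, max(date)) rows,
    in the same column order as the SELECT."""
    rows = []
    # groupby only merges adjacent items, so sort by the grouping key first
    for key, group in groupby(sorted(records, key=group_key), key=group_key):
        rows.append(key + (max(rec.date for rec in group),))
    return rows


# Hypothetical sample data
complete = [
    screenedPartitions(1, 7, datetime(2015, 3, 1, 10, 0), 2),
    screenedPartitions(2, 7, datetime(2015, 3, 1, 8, 0), 2),
    screenedPartitions(1, 7, datetime(2015, 3, 2, 9, 30), 2),
]
result = max_date_by_group(complete)
```

Like the SQL, this yields only the grouping columns plus the maximum date rather than the original record objects; here that loses nothing, since those four fields are the whole record.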

Try something like this:

import pandas as pd

# Build the frame from the list of records (complete), not from the namedtuple class
df = pd.DataFrame(complete, columns=screenedPartitions._fields)
df = df.groupby(['feedID', 'partition', 'screeeningMode']).max()

This really depends on how your dates are represented, but if you post some sample data I think we can work something out.

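Cheers for the reply. The dates are datetime objects, created with datetime.datetime.strptime(stringDate, "%d/%m/%Y %H:%M:%S"), and I've added a sample above. I've seen pandas, but I'd rather keep this within the standard library.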
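Because the dates are parsed into `datetime` objects, `max()` and `<` compare them chronologically. Comparing the raw day-first strings would not. A small sketch with hypothetical timestamps, using `%M` for minutes (`%m` is the month):

```python
from datetime import datetime

# Day-first format from the comment; %M is minutes, %m is month
LOG_DATE_FORMAT = "%d/%m/%Y %H:%M:%S"

# Hypothetical timestamps as they might appear in the log
raw = ["31/01/2015 23:59:00", "01/02/2015 00:00:01"]

parsed = [datetime.strptime(s, LOG_DATE_FORMAT) for s in raw]

# As strings, "31/01/..." sorts after "01/02/...", which is chronologically wrong
print(max(raw))     # → 31/01/2015 23:59:00
# As datetime objects, max() picks the genuinely later moment
print(max(parsed))  # → 2015-02-01 00:00:01
```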
