Most Pythonic way to group and filter a DataFrame that already had set_index() performed on the outermost grouping levels?


python pandas


For various reasons, I want to work with a Pandas DataFrame that has the following general structure:

import datetime
import pandas

# Sample rows; some (PersonId, Interest) pairs repeat, and some share the same latest date.
exampledf = pandas.DataFrame([
{'PersonId':'123','Interest':'Basketball','SubmittedDate':datetime.datetime.strptime('2018-04-18 13:00:08','%Y-%m-%d %H:%M:%S'),'Question':'Cake or death?'},
{'PersonId':'123','Interest':'Baseball','SubmittedDate':datetime.datetime.strptime('1999-01-01 09:00:00','%Y-%m-%d %H:%M:%S'),'Question':'Swallow speed?'},
{'PersonId':'456','Interest':'Swimming','SubmittedDate':datetime.datetime.strptime('2011-02-27 23:00:00','%Y-%m-%d %H:%M:%S'),'Question':'Answer to life, universe, everything?'},
{'PersonId':'123','Interest':'Basketball','SubmittedDate':datetime.datetime.strptime('2018-04-18 13:00:00','%Y-%m-%d %H:%M:%S'),'Question':'N/A'},
{'PersonId':'789','Interest':'Racquetball','SubmittedDate':datetime.datetime.strptime('2018-05-02 12:00:00','%Y-%m-%d %H:%M:%S'),'Question':'Will there be food?'},
{'PersonId':'789','Interest':'Racquetball','SubmittedDate':datetime.datetime.strptime('2002-05-28 02:00:00','%Y-%m-%d %H:%M:%S'),'Question':'Swag?'},
{'PersonId':'789','Interest':'Racquetball','SubmittedDate':datetime.datetime.strptime('2018-05-02 12:00:00','%Y-%m-%d %H:%M:%S'),'Question':'Good, thanks.'}
])
# Move the two grouping columns into a MultiIndex.
exampledf.set_index(['PersonId','Interest'], inplace=True)
print(exampledf)

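So it looks like this:

                                                   Question       SubmittedDate
PersonId Interest
123      Basketball                          Cake or death? 2018-04-18 13:00:08
         Baseball                            Swallow speed? 1999-01-01 09:00:00
456      Swimming     Answer to life, universe, everything? 2011-02-27 23:00:00
123      Basketball                                     N/A 2018-04-18 13:00:00
789      Racquetball                    Will there be food? 2018-05-02 12:00:00
         Racquetball                                  Swag? 2002-05-28 02:00:00
         Racquetball                          Good, thanks. 2018-05-02 12:00:00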

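I'd like the output to keep the same structure as the input, minus any rows that do not have the latest SubmittedDate for their PersonId/Interest pair, with ties broken arbitrarily (the first row found is fine).

I've found plenty of ways to do this awkwardly, with various extra strippings and re-addings of the index. For example: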

I can call `exampledf.reset_index()` before doing the `.groupby()` and then `.set_index()` again when I'm done, but that seems awkward.

But I'm struggling to do it elegantly. For example:

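I can `.groupby(level=[0,1])`, which adds redundant 'PersonId' and 'Interest' levels. That causes no trouble for `.max()`, and the result can be brought back to the general look with `.reset_index(level=[0,1], drop=True)`. But when I try to squeeze a `drop_duplicates()` on 'PersonId', 'Interest' and 'SubmittedDate' in somewhere along the way, I can't get it to work without even more grouping and resetting.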

For example, this gives me a KeyError: 'PersonId' error:

lastsubmittedperlookuptiesbrokendf = exampledf.groupby(level=[0,1]).apply(lambda x: x[x['SubmittedDate'] == x['SubmittedDate'].max()]).reset_index(level=[0,1], drop=True, inplace=False).drop_duplicates(subset=['PersonId','Interest','SubmittedDate'])

As does this:

lastsubmittedperlookuptiesbrokendf = exampledf.groupby(level=[0,1]).apply(lambda x: x[x['SubmittedDate'] == x['SubmittedDate'].max()]).drop_duplicates(subset=['PersonId','Interest','SubmittedDate']).reset_index(level=[0,1], drop=True, inplace=False)

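What's the most Pythonic way to get the following output?

                                                   Question       SubmittedDate
PersonId Interest
123      Baseball                            Swallow speed? 1999-01-01 09:00:00
         Basketball                          Cake or death? 2018-04-18 13:00:08
456      Swimming     Answer to life, universe, everything? 2011-02-27 23:00:00
789      Racquetball                    Will there be food? 2018-05-02 12:00:00

(Note that my current clunky implementation reorders the Interests, but I don't care about their sort order.)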

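Since sorting is fast, fast enough that you needn't worry much about the extra work compared with just taking a `max`, one way is to sort by SubmittedDate and then take the last row of each group after the groupby: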

In [11]: exampledf.sort_values("SubmittedDate").groupby(level=[0,1]).last()
Out[11]: 
                                                   Question       SubmittedDate
PersonId Interest                                                              
123      Baseball                            Swallow speed? 1999-01-01 09:00:00
         Basketball                          Cake or death? 2018-04-18 13:00:08
456      Swimming     Answer to life, universe, everything? 2011-02-27 23:00:00
789      Racquetball                          Good, thanks. 2018-05-02 12:00:00
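
For reference, here is a self-contained sketch of the same idea that can be run as a script. The errors in the question most likely arise because `drop_duplicates(subset=...)` looks up column labels, while 'PersonId' and 'Interest' are index levels after `set_index()`. Two things below are assumptions added for illustration rather than part of the answer above: `kind="mergesort"` is passed so the sort is stable, and the second variant, which filters on `index.duplicated()` instead of grouping, is only one possible alternative. Which of two tied rows survives is still treated as arbitrary.

```python
import pandas as pd

# Same sample data as in the question, built from tuples for brevity.
rows = [
    ("123", "Basketball", "2018-04-18 13:00:08", "Cake or death?"),
    ("123", "Baseball", "1999-01-01 09:00:00", "Swallow speed?"),
    ("456", "Swimming", "2011-02-27 23:00:00", "Answer to life, universe, everything?"),
    ("123", "Basketball", "2018-04-18 13:00:00", "N/A"),
    ("789", "Racquetball", "2018-05-02 12:00:00", "Will there be food?"),
    ("789", "Racquetball", "2002-05-28 02:00:00", "Swag?"),
    ("789", "Racquetball", "2018-05-02 12:00:00", "Good, thanks."),
]
df = pd.DataFrame(rows, columns=["PersonId", "Interest", "SubmittedDate", "Question"])
df["SubmittedDate"] = pd.to_datetime(df["SubmittedDate"])
df = df.set_index(["PersonId", "Interest"])

# Variant 1: sort by date, then keep the last row of each (PersonId, Interest) group.
latest_groupby = (
    df.sort_values("SubmittedDate", kind="mergesort")
      .groupby(level=[0, 1])
      .last()
)

# Variant 2 (assumed alternative): sort newest first, then keep only the first
# occurrence of each index pair. The MultiIndex is never reset.
newest_first = df.sort_values("SubmittedDate", ascending=False, kind="mergesort")
latest_dedup = newest_first[~newest_first.index.duplicated(keep="first")].sort_index()

print(latest_groupby)
print(latest_dedup)
```

Both variants leave one row per PersonId/Interest pair carrying that pair's latest SubmittedDate. The second avoids the groupby entirely, which may read more directly when the goal is simply to de-duplicate on the index.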
