Python: select rows based on a dynamic condition (Python, Pandas, Filtering)


python pandas


I am currently working with a dataset like this:

         date   income    account  flag  day  month  year
0  2018-04-13   470.57  1000 0002     8   13      4  2018  
1  2018-04-14   375.54  1000 0002     8   14      4  2018  
2  2018-05-15   375.54  1000 0002     8   15      5  2018  
3  2018-05-16   229.04  1000 0002     7   16      5  2018  
4  2018-06-17   216.62  1000 0002     7   17      6  2018  
5  2018-06-18   161.61  1000 0002     6   18      6  2018  
6  2018-04-19   131.87  0000 0001     6   19      4  2018  
7  2018-04-20   100.57  0000 0001     6   20      4  2018  
8  2018-08-21   100.57  0000 0001     6   21      8  2018  
9  2018-08-22    50.57  0000 0001     5   22      8  2018

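I am working on a decision-tree regression model, comparing Random Forest and Extra Trees and tuning some of their hyperparameters. What I am trying to do is split the dataset so that, for each unique value of account (which could also be set as the index, if that is convenient), the rows holding the maximum value of the month column are kept as test_set and all other rows are kept as train_set. In other words, all available historical data is used for the regression, except the data from the last available month of each account, which is used to validate the MSE.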

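I know how to filter a DataFrame on a static condition, for example df[df['month']<12], but in this case I need to keep, for each distinct account value, all rows that belong to that account's max month.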

From the dataset above, I should get something like the following:

df_test =

         date   income    account  flag  day  month  year 
4  2018-06-17   216.62  1000 0002     7   17      6  2018  
5  2018-06-18   161.61  1000 0002     6   18      6  2018   
8  2018-08-21   100.57  0000 0001     6   21      8  2018  
9  2018-08-22    50.57  0000 0001     5   22      8  2018

and

df_train =

         date   income    account  flag  day  month  year
0  2018-04-13   470.57  1000 0002     8   13      4  2018  
1  2018-04-14   375.54  1000 0002     8   14      4  2018  
2  2018-05-15   375.54  1000 0002     8   15      5  2018  
3  2018-05-16   229.04  1000 0002     7   16      5  2018  
6  2018-04-19   131.87  0000 0001     6   19      4  2018  
7  2018-04-20   100.57  0000 0001     6   20      4  2018

For example, for df['account']==10000002 I could use months 4 and 5 for prediction and month 6 for validation. Thanks.

You can use transform:

test=df[df.month==df.groupby('account').month.transform('max')].copy()
train=df.drop(test.index)
test
Out[643]: 
         date  income   account  flag  day  month  year
4  2018-06-17  216.62  10000002     7   17      6  2018
5  2018-06-18  161.61  10000002     6   18      6  2018
8  2018-08-21  100.57         1     6   21      8  2018
9  2018-08-22   50.57         1     5   22      8  2018
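
To make the mechanics of the answer easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same transform idea. Storing account as a string and the intermediate names max_month and is_last_month are assumptions made for illustration; they are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question. Keeping account as a string is an
# assumption; in the answer's output it appears to have been parsed as an integer.
df = pd.DataFrame({
    "date": ["2018-04-13", "2018-04-14", "2018-05-15", "2018-05-16", "2018-06-17",
             "2018-06-18", "2018-04-19", "2018-04-20", "2018-08-21", "2018-08-22"],
    "income": [470.57, 375.54, 375.54, 229.04, 216.62,
               161.61, 131.87, 100.57, 100.57, 50.57],
    "account": ["10000002"] * 6 + ["00000001"] * 4,
    "flag": [8, 8, 8, 7, 7, 6, 6, 6, 6, 5],
    "day": [13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
    "month": [4, 4, 5, 5, 6, 6, 4, 4, 8, 8],
    "year": [2018] * 10,
})

# transform("max") returns a Series aligned with df's index: every row
# receives the maximum month found within its own account group.
max_month = df.groupby("account")["month"].transform("max")

# The "dynamic condition": a row goes to the test set when its month
# equals the latest month of its account.
is_last_month = df["month"] == max_month

test = df[is_last_month].copy()
train = df[~is_last_month].copy()

print(test.index.tolist())   # [4, 5, 8, 9]
print(train.index.tolist())  # [0, 1, 2, 3, 6, 7]
```

Using the negated mask for train gives the same rows as df.drop(test.index) on this data. Note that comparing month alone ignores year; if the real data spans more than one year, the grouping would probably need to use the full date (or a year and month pair) instead.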

This looks like the solution! I'll try it in a minute and upvote if it works :) Thanks!