Python: returning multiple records from DataFrame column comparisons
I have a DataFrame (df) that looks like:
Column_A Column_B Column_C
0 01/11/2010 01/07/2016 10/07/2001
1 22/04/2014 02/04/2015 04/02/2015
2 08/01/2007 01/06/2015 06/01/2015
3 15/11/2017 04/01/2016 20/01/2014
4 09/10/2000 01/09/2015 09/01/2015
5 04/09/2006 25/03/2016 25/03/2016
6 21/09/2015 01/07/2016 21/09/2015
7 18/02/2003 12/02/2016 02/12/2016
8 15/07/2014 14/12/2015 16/07/2007
9 05/05/2014 01/10/2015 05/06/2014
10 26/11/2013 26/11/2013 26/11/2013
11 03/09/2009 26/03/2015 26/03/2015
12 12/05/2015 12/05/2015 05/12/2015
13 27/10/2018 02/04/2014 04/02/2014
14 15/02/2016 15/02/2016 15/02/2016
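I am trying to return the records where Column_A > Column_B and those where Column_A > Column_C, so a row qualifies if either condition holds. For information, more date fields may eventually need to be compared.
So in this example, I would return: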
Column_A Column_B Column_C
0 01/11/2010 01/07/2016 10/07/2001
1 15/11/2017 04/01/2016 20/01/2014
2 15/07/2014 14/12/2015 16/07/2007
3 27/10/2018 02/04/2014 04/02/2014
To return this output, I have tried:
IncorrectOrder = df[df['Column_A']>df['Column_B'] or df['Column_A']>df['Column_C']]
However, this only returns the records where df['Column_A']>df['Column_B'].
Where might I be going wrong?
Wrap each condition in parentheses () and change or to the bitwise OR operator, |:
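import pandas as pd
# Parse the dd/mm/yyyy strings as dates so that > compares chronologically
df = df.apply(pd.to_datetime, dayfirst=True)
# Parenthesize each comparison and combine them with | (element-wise OR)
IncorrectOrder = df[(df['Column_A']>df['Column_B']) | (df['Column_A']>df['Column_C'])]
print (IncorrectOrder)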
Column_A Column_B Column_C
0 2010-11-01 2016-07-01 2001-07-10
3 2017-11-15 2016-01-04 2014-01-20
8 2014-07-15 2015-12-14 2007-07-16
13 2018-10-27 2014-04-02 2014-02-04
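A short sketch of why both changes are needed, using a three-row frame copied from rows 0, 1 and 3 of the question's data (the variable names a_after_b, a_after_c and or_failed are made up for illustration). Python's or asks for a single truth value for the whole Series, which pandas refuses to give when the Series has more than one element, while | combines the two boolean Series row by row. The parentheses matter because | binds more tightly than >, so without them the expression is not grouped the way it reads.

```python
import pandas as pd

# Illustrative frame: rows 0, 1 and 3 of the question's data, parsed day-first
df = pd.DataFrame({
    "Column_A": ["01/11/2010", "22/04/2014", "15/11/2017"],
    "Column_B": ["01/07/2016", "02/04/2015", "04/01/2016"],
    "Column_C": ["10/07/2001", "04/02/2015", "20/01/2014"],
}).apply(pd.to_datetime, dayfirst=True)

a_after_b = df["Column_A"] > df["Column_B"]
a_after_c = df["Column_A"] > df["Column_C"]

# `or` needs one bool for the whole Series, so pandas raises ValueError
try:
    a_after_b or a_after_c
    or_failed = False
except ValueError:
    or_failed = True

# `|` evaluates the OR separately for every row
incorrect_order = df[a_after_b | a_after_c]

print(or_failed)
print(incorrect_order.index.tolist())
```

Here the first and third rows are kept, since Column_A is later than Column_C in the first and later than both other columns in the third.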
If more columns may need to be compared:
IncorrectOrder = df[(df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0)).any(axis=1)]
# all columns compared with the first
#IncorrectOrder = df[(df.iloc[:, 1:].lt(df['Column_A'], axis=0)).any(axis=1)]
print (IncorrectOrder)
Column_A Column_B Column_C
0 2010-11-01 2016-07-01 2001-07-10
3 2017-11-15 2016-01-04 2014-01-20
8 2014-07-15 2015-12-14 2007-07-16
13 2018-10-27 2014-04-02 2014-02-04
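Since the question mentions that more date fields may be added later, the multi-column form above could be wrapped in a small helper. This is only a sketch: the function name rows_where_reference_is_later and its parameters are hypothetical, and it assumes the columns have already been parsed as datetimes.

```python
import pandas as pd

def rows_where_reference_is_later(frame, reference, others):
    """Return rows in which `reference` is later than at least one column in `others`."""
    # lt(..., axis=0) matches the reference Series against each column row by row
    earlier_than_reference = frame[others].lt(frame[reference], axis=0)
    # keep a row as soon as any compared column is earlier than the reference
    return frame[earlier_than_reference.any(axis=1)]

# Rows 0, 1, 3 and 10 of the question's data, parsed day-first
df = pd.DataFrame({
    "Column_A": ["01/11/2010", "22/04/2014", "15/11/2017", "26/11/2013"],
    "Column_B": ["01/07/2016", "02/04/2015", "04/01/2016", "26/11/2013"],
    "Column_C": ["10/07/2001", "04/02/2015", "20/01/2014", "26/11/2013"],
}).apply(pd.to_datetime, dayfirst=True)

result = rows_where_reference_is_later(df, "Column_A", ["Column_B", "Column_C"])
print(result.index.tolist())
```

Adding another date field then only means extending the list passed as others. Because lt is a strict comparison, a row whose dates are all equal (the last row here) is not returned.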
Details: first compare Column_B and Column_C with Column_A, then check each row for any True:
print (df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0))
Column_B Column_C
0 False True
1 False False
2 False False
3 True True
4 False False
5 False False
6 False False
7 False False
8 False True
9 False False
10 False False
11 False False
12 False False
13 True True
14 False False
print ((df[['Column_B', 'Column_C']].lt(df['Column_A'], axis=0)).any(axis=1))
0 True
1 False
2 False
3 True
4 False
5 False
6 False
7 False
8 True
9 False
10 False
11 False
12 False
13 True
14 False
dtype: bool
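The steps above depend on the columns being real dates. The following sketch rebuilds the question's 15 rows and runs the same mask on both the raw strings and the parsed dates, to show why the pd.to_datetime(..., dayfirst=True) conversion matters. The names rows, raw, parsed and flagged_rows are assumptions made for this example.

```python
import pandas as pd

# The question's 15 rows, kept as dd/mm/yyyy strings
rows = [
    ("01/11/2010", "01/07/2016", "10/07/2001"),
    ("22/04/2014", "02/04/2015", "04/02/2015"),
    ("08/01/2007", "01/06/2015", "06/01/2015"),
    ("15/11/2017", "04/01/2016", "20/01/2014"),
    ("09/10/2000", "01/09/2015", "09/01/2015"),
    ("04/09/2006", "25/03/2016", "25/03/2016"),
    ("21/09/2015", "01/07/2016", "21/09/2015"),
    ("18/02/2003", "12/02/2016", "02/12/2016"),
    ("15/07/2014", "14/12/2015", "16/07/2007"),
    ("05/05/2014", "01/10/2015", "05/06/2014"),
    ("26/11/2013", "26/11/2013", "26/11/2013"),
    ("03/09/2009", "26/03/2015", "26/03/2015"),
    ("12/05/2015", "12/05/2015", "05/12/2015"),
    ("27/10/2018", "02/04/2014", "04/02/2014"),
    ("15/02/2016", "15/02/2016", "15/02/2016"),
]
raw = pd.DataFrame(rows, columns=["Column_A", "Column_B", "Column_C"])
parsed = raw.apply(pd.to_datetime, dayfirst=True)

def flagged_rows(frame):
    # Index labels of rows where Column_B or Column_C sorts before Column_A
    mask = frame[["Column_B", "Column_C"]].lt(frame["Column_A"], axis=0).any(axis=1)
    return frame[mask].index.tolist()

print(flagged_rows(parsed))  # chronological comparison
print(flagged_rows(raw))     # character-by-character string comparison
```

On the parsed frame this selects rows 0, 3, 8 and 13, matching the output shown earlier. On the raw strings the comparison is lexicographic, so row 1 is flagged as well: "02/04/2015" sorts before "22/04/2014" even though the date itself is later.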