Python: how to get the maximum date value from a set of date columns
Here is the sample data:
import pandas as pd
d = {
    'name': ['john', 'tom', 'phill', 'nero', 'bob', 'rob'],
    'date1': ['2015-10-05', '2015-01-05', '2015-07-06', '2015-10-06', '2015-10-06', '2015-12-08'],
    'date2': ['2015-10-05', '2015-01-05', '2015-07-06', '2015-08-06', '2015-09-06', '2015-12-08'],
    'date3': ['2015-07-05', '2015-11-05', '2015-07-06', '2015-11-06', '2015-05-06', '2015-05-08'],
}
df2 = pd.DataFrame(data = d)
df2['date1'] = pd.DatetimeIndex(df2['date1'])
df2['date2'] = pd.DatetimeIndex(df2['date2'])
df2['date3'] = pd.DatetimeIndex(df2['date3'])
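This is the table (df2).
Question 1: I want to create a new column, max_date, that holds the maximum date value for each row. I thought I could build a list of these columns and apply max to it, but that did not work. I also found numpy.amax(), but could not get it to work.
Question 2: I have to specify these columns by name; I cannot use positional column indexing such as df2[,0:2].
Update on question 2: when I said "use column names", I meant that I have a list of column names, and I need to use it like [date1, date2, date3]. Sorry if that was not clear in my post.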
Use max together with filter:
df2['max_date']=df2.filter(like='date',axis=1).max(1)
df2
Out[157]:
        date1       date2       date3   name    max_date
0  2015-10-05  2015-10-05  2015-07-05   john  2015-10-05
1  2015-01-05  2015-01-05  2015-11-05    tom  2015-11-05
2  2015-07-06  2015-07-06  2015-07-06  phill  2015-07-06
3  2015-10-06  2015-08-06  2015-11-06   nero  2015-11-06
4  2015-10-06  2015-09-06  2015-05-06    bob  2015-10-06
5  2015-12-08  2015-12-08  2015-05-08    rob  2015-12-08
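One thing to keep in mind: filter(like='date') matches every column whose name contains "date", so running it a second time would also pick up max_date itself. That is harmless for a maximum, but a stricter selection may be preferable. The following is a minimal sketch, assuming the date columns are named "date" followed by digits; the small frame df is a stand-in for df2, not the original data.

```python
import pandas as pd

# Small stand-in for df2 from the question (two rows only).
df = pd.DataFrame({
    'name': ['john', 'tom'],
    'date1': pd.to_datetime(['2015-10-05', '2015-01-05']),
    'date2': pd.to_datetime(['2015-10-05', '2015-01-05']),
    'date3': pd.to_datetime(['2015-07-05', '2015-11-05']),
})

# Keep only columns named "date" followed by digits (assumed naming scheme).
date_part = df.filter(regex=r'^date\d+$')

# Row-wise maximum across the selected columns.
df['max_date'] = date_part.max(axis=1)
print(df)
```

Because the pattern is anchored at both ends, the new max_date column is not selected if the same filter is applied again.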
You can use boolean indexing with str.startswith:
date_cols = df2.columns[df2.columns.str.startswith('date')]
df2['max_date'] = df2[date_cols].max(1)
print(df2)
        date1       date2       date3   name    max_date
0  2015-10-05  2015-10-05  2015-07-05   john  2015-10-05
1  2015-01-05  2015-01-05  2015-11-05    tom  2015-11-05
2  2015-07-06  2015-07-06  2015-07-06  phill  2015-07-06
3  2015-10-06  2015-08-06  2015-11-06   nero  2015-11-06
4  2015-10-06  2015-09-06  2015-05-06    bob  2015-10-06
5  2015-12-08  2015-12-08  2015-05-08    rob  2015-12-08
Use select_dtypes
This works for all datetime columns, regardless of naming convention:
df2.assign(max_date=df2.select_dtypes('datetime').max(1))
        date1       date2       date3   name    max_date
0  2015-10-05  2015-10-05  2015-07-05   john  2015-10-05
1  2015-01-05  2015-01-05  2015-11-05    tom  2015-11-05
2  2015-07-06  2015-07-06  2015-07-06  phill  2015-07-06
3  2015-10-06  2015-08-06  2015-11-06   nero  2015-11-06
4  2015-10-06  2015-09-06  2015-05-06    bob  2015-10-06
5  2015-12-08  2015-12-08  2015-05-08    rob  2015-12-08
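Two details of this approach are easy to miss: select_dtypes('datetime') only sees columns that have already been converted to a datetime dtype, and assign returns a new DataFrame instead of modifying the original. A minimal sketch of both points follows; the frame raw and its contents are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical frame whose date columns are still plain strings.
raw = pd.DataFrame({
    'name': ['john', 'tom'],
    'date1': ['2015-10-05', '2015-01-05'],
    'date3': ['2015-07-05', '2015-11-05'],
})

# Before conversion there are no datetime columns to select.
n_before = raw.select_dtypes('datetime').shape[1]

# Convert the string columns to datetime64 first.
for col in ['date1', 'date3']:
    raw[col] = pd.to_datetime(raw[col])

# assign returns a new DataFrame and leaves raw itself unchanged.
result = raw.assign(max_date=raw.select_dtypes('datetime').max(axis=1))
print(result)
```

To keep the new column, bind the return value of assign to a name (here result) or assign it back to the original variable.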
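You don't need axis=1 in the filter call.
I was looking for an answer that can use a list of column names, but I can probably use the .str method.
Of course, you can simply write date_cols = ['date1', 'date2', 'date3'].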
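For the explicit list of column names asked about in question 2, the idea from the last comment can be sketched as follows. The frame df is a three-row stand-in for df2. The NumPy variant is included only because the question mentions numpy.amax; it assumes all selected columns share a datetime64 dtype and uses np.max on the underlying array.

```python
import numpy as np
import pandas as pd

# Three-row stand-in for df2 from the question.
df = pd.DataFrame({
    'name': ['john', 'tom', 'phill'],
    'date1': pd.to_datetime(['2015-10-05', '2015-01-05', '2015-07-06']),
    'date2': pd.to_datetime(['2015-10-05', '2015-01-05', '2015-07-06']),
    'date3': pd.to_datetime(['2015-07-05', '2015-11-05', '2015-07-06']),
})

# Explicit list of column names, as requested in question 2.
date_cols = ['date1', 'date2', 'date3']

# Select the columns by name and take the row-wise maximum.
df['max_date'] = df[date_cols].max(axis=1)

# NumPy variant: reduce the 2-D datetime64 array along axis 1 (per row).
np_max = np.max(df[date_cols].to_numpy(), axis=1)
print(df)
```

The pandas version keeps the index and dtype handling inside pandas, so it is usually the simpler choice; the NumPy call returns a bare array without the index.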