Python: replace the values of rows that share an ID with the max date

Here is a script that builds a simplified version of my df:

import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2', '3', '3', '4', '4', '5', '6', '7'],
    'product1_expiry_date': ['-', '-', '2020-11-28', '2020-11-13', '-',
                             '2020-11-13', '2020-12-13', '-', '2020-11-16', '-',
                             '2020-11-28'],
    'product2_expiry_date': ['2020-11-16', '2020-11-19', '-',
                             '-', '2020-11-23', '2020-11-13',
                             '2020-12-13', '-', '2020-12-01', '2020-12-01',
                             '2020-12-14'],
})
df

id  product1_expiry_date    product2_expiry_date
1            -                   2020-11-16
1            -                   2020-11-19
2        2020-11-28                  -
2        2020-11-13                  -
3            -                   2020-11-23
3        2020-11-13              2020-11-13
4        2020-12-13              2020-12-13
4            -                         -
5        2020-11-16              2020-12-01
6            -                   2020-12-01
7        2020-11-28              2020-12-14
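I want a result with no duplicate IDs: for each ID, drop the earlier dates and, where applicable, the "-" values, because I am only interested in the latest date.

Expected DF: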

   id   product1_expiry_date    product2_expiry_date
    1            -                  2020-11-19
    2        2020-11-28                 -
    3        2020-11-13             2020-11-23
    4        2020-12-13             2020-12-13
    5        2020-11-16             2020-12-01
    6            -                  2020-12-01
    7        2020-11-28             2020-12-14

Any help is greatly appreciated.

Convert id to the index, convert every remaining column to datetimes, and then take the max per index value:

f = lambda x: pd.to_datetime(x, errors='coerce')
# groupby(level=0).max() replaces max(level=0), which was removed in pandas 2.0
df1 = df.set_index('id').apply(f).groupby(level=0).max()
print(df1)
   product1_expiry_date product2_expiry_date
id                                          
1                   NaT           2020-11-19
2            2020-11-28                  NaT
3            2020-11-13           2020-11-23
4            2020-12-13           2020-12-13
5            2020-11-16           2020-12-01
6                   NaT           2020-12-01
7            2020-11-28           2020-12-14
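To see why the "-" placeholders never win the comparison, here is a minimal sketch on a single column. The explicit format='%Y-%m-%d' is an assumption (it is not in the answer above) that every real date follows that layout; anything else is coerced to NaT.

```python
import pandas as pd

raw = pd.Series(['-', '2020-11-13', '2020-11-28'])

# errors='coerce' turns values that do not match the format (here '-') into NaT.
# format='%Y-%m-%d' is an assumption about how the dates are written.
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed.isna().tolist())

# max() skips NaT by default, so the placeholder is ignored
latest = parsed.max()
print(latest)

# a group made only of placeholders has no valid date, so its max is NaT
only_dashes = pd.to_datetime(pd.Series(['-', '-']), format='%Y-%m-%d', errors='coerce')
print(only_dashes.max())
```

This is why id 1 and id 6 end up with NaT in product1_expiry_date: neither has a valid date there.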
Replacing NaT with - is possible, but it mixes datetimes with strings in the same column, so any further processing is likely to be problematic:

f = lambda x: pd.to_datetime(x, errors='coerce')
# groupby(level=0).max() replaces max(level=0), which was removed in pandas 2.0
df1 = df.set_index('id').apply(f).groupby(level=0).max().fillna('-')
print(df1)
   product1_expiry_date product2_expiry_date
id                                          
1                     -  2020-11-19 00:00:00
2   2020-11-28 00:00:00                    -
3   2020-11-13 00:00:00  2020-11-23 00:00:00
4   2020-12-13 00:00:00  2020-12-13 00:00:00
5   2020-11-16 00:00:00  2020-12-01 00:00:00
6                     -  2020-12-01 00:00:00
7   2020-11-28 00:00:00  2020-12-14 00:00:00
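One way to avoid mixing Timestamp objects with strings is to format the dates back to text before filling in the placeholder. This is only a sketch on a reduced sample (ids 1 and 2 from the question); the names dates and as_text and the explicit format string are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['1', '1', '2', '2'],
    'product1_expiry_date': ['-', '-', '2020-11-28', '2020-11-13'],
    'product2_expiry_date': ['2020-11-16', '2020-11-19', '-', '-'],
})

# Parse, then take the latest date per id (NaT where an id has no valid date)
dates = (df.set_index('id')
           .apply(lambda col: pd.to_datetime(col, format='%Y-%m-%d', errors='coerce'))
           .groupby(level=0)
           .max())

# Format each datetime column as text first; NaT becomes a missing value,
# which fillna then replaces, so every cell ends up as a plain string
as_text = dates.apply(lambda col: col.dt.strftime('%Y-%m-%d')).fillna('-')
print(as_text)
```

The result holds only strings, so it matches the layout requested in the question, at the cost of no longer being usable for date arithmetic.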
Finally, if needed, turn id back into a column:

df1 = df1.reset_index()
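If id should stay a regular column throughout, a possible variant is to group on the column directly with as_index=False, which makes set_index and reset_index unnecessary. This is a sketch on a reduced sample (ids 3, 4 and 6 from the question); date_cols, parsed and out are illustrative names, and the explicit format string is an assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['3', '3', '4', '4', '6'],
    'product1_expiry_date': ['-', '2020-11-13', '2020-12-13', '-', '-'],
    'product2_expiry_date': ['2020-11-23', '2020-11-13', '2020-12-13', '-', '2020-12-01'],
})

date_cols = ['product1_expiry_date', 'product2_expiry_date']

# Replace each date column with its parsed version; '-' becomes NaT
parsed = df.assign(**{
    col: pd.to_datetime(df[col], format='%Y-%m-%d', errors='coerce')
    for col in date_cols
})

# as_index=False keeps id as an ordinary column in the result
out = parsed.groupby('id', as_index=False)[date_cols].max()
print(out)
```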
