Python: group, shift, and forward fill

I have this df:

ID         Date   Time       Lat       Lon
 A  07/16/2019   08:00  29.39291 -98.50925
 A  07/16/2019   09:00  29.39923 -98.51256
 A  07/16/2019   10:00  29.40147 -98.51123
 A  07/18/2019   08:30  29.38752 -98.52372
 A  07/18/2019   09:30  29.39291 -98.50925
 B  07/16/2019   08:00  29.39537 -98.50402
 B  07/18/2019   11:00  29.39343 -98.49707
 B  07/18/2019   12:00  29.39291 -98.50925
 B  07/19/2019   10:00  29.39556 -98.53148
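
I want to group the df by ID and Date, shift the rows back by one step within each group, and fill the resulting NaN values with a forward fill.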

Note: an (ID, Date) group that has only one row should be filled with that row itself.

For example:
 B  07/16/2019   08:00  29.39537 -98.50402

Expected result:

ID         Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
 A  07/16/2019   08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
 A  07/16/2019   09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
 A  07/16/2019   10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
 A  07/18/2019   08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
 A  07/18/2019   09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
 B  07/16/2019   08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
 B  07/18/2019   11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
 B  07/18/2019   12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
 B  07/19/2019   10:00  29.39556 -98.53148  10:00  29.39556 -98.53148

The code I'm using (it does not produce the expected result):

pd.concat([df, df.groupby(['ID','Date']).shift(-1).ffill()], axis=1)
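
A quick way to see where the attempt goes wrong is to run it on the sample data. The sketch below rebuilds the frame from the question (the name `raw` and reading it with `read_csv` from an inline string are only for illustration) and inspects the forward fill. The shift respects the groups, but the `ffill()` that follows runs over the whole frame, so a single-row group inherits values from the group before it. The concatenated result would also carry duplicate column names, since no suffix is added.

```python
import io
import pandas as pd

raw = """ID Date Time Lat Lon
A 07/16/2019 08:00 29.39291 -98.50925
A 07/16/2019 09:00 29.39923 -98.51256
A 07/16/2019 10:00 29.40147 -98.51123
A 07/18/2019 08:30 29.38752 -98.52372
A 07/18/2019 09:30 29.39291 -98.50925
B 07/16/2019 08:00 29.39537 -98.50402
B 07/18/2019 11:00 29.39343 -98.49707
B 07/18/2019 12:00 29.39291 -98.50925
B 07/19/2019 10:00 29.39556 -98.53148
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# The shift is done per (ID, Date) group, so the last row of each group is NaN.
shifted = df.groupby(["ID", "Date"]).shift(-1)

# This ffill() is NOT grouped: it fills from the previous row of the whole frame.
naive = shifted.ffill()

# Row 5 is the only row of (B, 07/16/2019). It picks up the values filled into
# row 4 (group A, 07/18/2019) instead of its own 08:00 / 29.39537 / -98.50402.
print(naive.loc[[4, 5]])
```

Rows that close a multi-row group still come out right, because the value carried forward is the row's own; only the single-row groups are affected.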

Here is one approach:

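def grp_col(f):
    # f holds the rows of one (ID, Date) group.
    # shift(-1) takes the next row's value, ffill() repeats it on the last row,
    # and fillna(...) covers single-row groups, where everything is still NaN.
    f['Time.1'] = f['Time'].shift(-1).ffill().fillna(f['Time'].iloc[0])
    f['Lat.1'] = f['Lat'].shift(-1).ffill().fillna(f['Lat'].iloc[0])
    f['Lon.1'] = f['Lon'].shift(-1).ffill().fillna(f['Lon'].iloc[0])
    return f

# Apply the helper to every (ID, Date) group.
df = df.groupby(['ID','Date'], as_index=False).apply(grp_col)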

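If the original data has no missing values, one solution is to first replace the rows of single-element groups with their original values, and then forward fill the remaining missing values: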

m = ~df.duplicated(['ID','Date']) & ~df.duplicated(['ID','Date'], keep=False)
df1 = df.groupby(['ID','Date']).shift(-1).mask(m, df).ffill()
df = pd.concat([df, df1.add_suffix('.1')], axis=1)
print (df)
  ID        Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
0  A  07/16/2019  08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
1  A  07/16/2019  09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
2  A  07/16/2019  10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
3  A  07/18/2019  08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
4  A  07/18/2019  09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
5  B  07/16/2019  08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
6  B  07/18/2019  11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
7  B  07/18/2019  12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
8  B  07/19/2019  10:00  29.39556 -98.53148  10:00  29.39556 -98.53148
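
The mask `m` above flags the rows whose (ID, Date) pair occurs exactly once. `~df.duplicated(..., keep=False)` already expresses that on its own, so the first term looks redundant. As a sketch of an equivalent formulation (the names `keys`, `cols`, `single`, `nxt`, and `out` are mine, not from the answer), the group size can also be computed with `transform('size')`:

```python
import io
import pandas as pd

raw = """ID Date Time Lat Lon
A 07/16/2019 08:00 29.39291 -98.50925
A 07/16/2019 09:00 29.39923 -98.51256
A 07/16/2019 10:00 29.40147 -98.51123
A 07/18/2019 08:30 29.38752 -98.52372
A 07/18/2019 09:30 29.39291 -98.50925
B 07/16/2019 08:00 29.39537 -98.50402
B 07/18/2019 11:00 29.39343 -98.49707
B 07/18/2019 12:00 29.39291 -98.50925
B 07/19/2019 10:00 29.39556 -98.53148
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

keys = ["ID", "Date"]
cols = ["Time", "Lat", "Lon"]

# True for rows that are the only member of their (ID, Date) group.
single = df.groupby(keys)["Time"].transform("size") == 1

# The mask from the answer, for comparison.
same_as_answer = ~df.duplicated(keys) & ~df.duplicated(keys, keep=False)
print((single == same_as_answer).all())

# Next row per group; single-row groups keep their own values; then fill the
# last row of each multi-row group from the row before it.
nxt = df.groupby(keys)[cols].shift(-1).mask(single, df[cols]).ffill()
out = pd.concat([df, nxt.add_suffix(".1")], axis=1)
print(out)
```

Note that the plain `ffill()` at the end relies on each group's rows sitting next to each other, as they do in the sample data.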

If you do not want a custom function, a double groupby is needed, because the forward fill has to be done within each group:

df1 = df.groupby(['ID','Date']).shift(-1).groupby([df['ID'],df['Date']]).ffill().fillna(df)
df = pd.concat([df, df1.add_suffix('.1')], axis=1)
print (df)
  ID        Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
0  A  07/16/2019  08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
1  A  07/16/2019  09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
2  A  07/16/2019  10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
3  A  07/18/2019  08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
4  A  07/18/2019  09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
5  B  07/16/2019  08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
6  B  07/18/2019  11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
7  B  07/18/2019  12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
8  B  07/19/2019  10:00  29.39556 -98.53148  10:00  29.39556 -98.53148
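
The second `groupby` matters most when rows of different groups are interleaved. In the sample data each group is contiguous, so the difference is easy to miss. The sketch below uses a small made-up frame (`toy` is hypothetical, not from the question) in which two groups alternate row by row, and compares a plain forward fill with a grouped one:

```python
import pandas as pd

# Hypothetical frame: groups A and B alternate row by row.
toy = pd.DataFrame({
    "ID":   ["A", "B", "A", "B"],
    "Date": ["07/16/2019"] * 4,
    "Time": ["08:00", "11:00", "09:00", "12:00"],
})
keys = [toy["ID"], toy["Date"]]

# Per-group shift: rows 2 and 3 are the last rows of their groups, so they are NaN.
shifted = toy.groupby(["ID", "Date"]).shift(-1)

plain = shifted.ffill()                    # row 2 is filled from row 1 (group B)
grouped = shifted.groupby(keys).ffill()    # row 2 is filled from row 0 (group A)

print(plain["Time"].tolist())    # ['09:00', '12:00', '12:00', '12:00']
print(grouped["Time"].tolist())  # ['09:00', '12:00', '09:00', '12:00']
```

With the plain fill, row 2 of group A ends up with a time taken from group B; the grouped fill keeps each group's values to itself.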

With a lambda function, the solution looks like this:

c = ['Time','Lat','Lon']
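# group_keys=False keeps the original row index (pandas >= 2.0 would otherwise
# prepend the group keys), so fillna(df) and concat still align row by row
df1 = df.groupby(['ID','Date'], group_keys=False)[c].apply(lambda x: x.shift(-1).ffill()).fillna(df)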
df = pd.concat([df, df1.add_suffix('.1')], axis=1)
print (df)
  ID        Date   Time       Lat       Lon Time.1     Lat.1     Lon.1
0  A  07/16/2019  08:00  29.39291 -98.50925  09:00  29.39923 -98.51256
1  A  07/16/2019  09:00  29.39923 -98.51256  10:00  29.40147 -98.51123
2  A  07/16/2019  10:00  29.40147 -98.51123  10:00  29.40147 -98.51123
3  A  07/18/2019  08:30  29.38752 -98.52372  09:30  29.39291 -98.50925
4  A  07/18/2019  09:30  29.39291 -98.50925  09:30  29.39291 -98.50925
5  B  07/16/2019  08:00  29.39537 -98.50402  08:00  29.39537 -98.50402
6  B  07/18/2019  11:00  29.39343 -98.49707  12:00  29.39291 -98.50925
7  B  07/18/2019  12:00  29.39291 -98.50925  12:00  29.39291 -98.50925
8  B  07/19/2019  10:00  29.39556 -98.53148  10:00  29.39556 -98.53148
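
One observation that may simplify the variants above: forward filling the last row of a group copies the previous row's shifted value, which is that last row itself. So, assuming the original columns contain no missing values, filling the NaNs left by the grouped shift directly from `df` appears to give the same result without `ffill`, `apply`, or a mask. A sketch (the names `keys`, `cols`, `nxt`, and `out` are illustrative):

```python
import io
import pandas as pd

raw = """ID Date Time Lat Lon
A 07/16/2019 08:00 29.39291 -98.50925
A 07/16/2019 09:00 29.39923 -98.51256
A 07/16/2019 10:00 29.40147 -98.51123
A 07/18/2019 08:30 29.38752 -98.52372
A 07/18/2019 09:30 29.39291 -98.50925
B 07/16/2019 08:00 29.39537 -98.50402
B 07/18/2019 11:00 29.39343 -98.49707
B 07/18/2019 12:00 29.39291 -98.50925
B 07/19/2019 10:00 29.39556 -98.53148
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

keys = ["ID", "Date"]
cols = ["Time", "Lat", "Lon"]

# After the grouped shift, only the last row of each group is NaN.
# Filling those rows with their own values covers both the end of a
# multi-row group and the single-row groups.
nxt = df.groupby(keys)[cols].shift(-1).fillna(df[cols])

out = pd.concat([df, nxt.add_suffix(".1")], axis=1)
print(out)
```

This shortcut depends on the no-missing-values assumption: if `Time`, `Lat`, or `Lon` already had NaNs, those would be overwritten by the row's own values rather than forward filled.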

It works great. Is there a way to do this without defining a function?