Pandas: How to prepare panel data for machine learning in Python?


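I have a panel data set / time series. I want to prepare the data set for machine-learning prediction of next year's gcp. My data looks like this: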

ID,year,age,area,debt_ratio,gcp
654001,2013,49,East,0.14,0
654001,2014,50,East,0.17,0
654001,2015,51,East,0.23,1
654001,2016,52,East,0.18,0
112089,2013,39,West,0.13,0
112089,2014,40,West,0.15,0
112089,2015,41,West,0.18,1
112089,2016,42,West,0.21,1

What I want is something like this:

ID,year,age,area,debt_ratio,gcp,gcp-1,gcp-2,gcp-3
654001,2013,49,East,0.14,0,NA,NA,NA
654001,2014,50,East,0.17,0,0,NA,NA
654001,2015,51,East,0.23,1,0,0,NA
654001,2016,52,East,0.18,0,1,0,0
112089,2013,39,West,0.13,0,NA,NA,NA
112089,2014,40,West,0.15,0,0,NA,NA
112089,2015,41,West,0.18,1,0,0,NA
112089,2016,42,West,0.21,1,1,0,0

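I tried the pandas melt function, but without success. I searched online and found this post, which does exactly what I want, but in R:

https://stackoverflow.com/questions/19813077/prepare-time-series-for-machine-learning-long-to-wide-format

Does anyone know how to do this in Python? Any suggestions would be greatly appreciated.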

Use groupby with shift in a loop.

A more dynamic solution is to take the largest number of rows in any group and pass it to range:

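# Largest number of rows (years) recorded for any single ID
N = df['ID'].value_counts().max()

# Lags 1..N-1; a lag of N would be NaN in every row
for i in range(1, N):
    df[f'gcp-{i}'] = df.groupby('ID')['gcp'].shift(i)
print(df)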
       ID  year  age  area  debt_ratio  gcp  gcp-1  gcp-2  gcp-3
0  654001  2013   49  East        0.14    0    NaN    NaN    NaN
1  654001  2014   50  East        0.17    0    0.0    NaN    NaN
2  654001  2015   51  East        0.23    1    0.0    0.0    NaN
3  654001  2016   52  East        0.18    0    1.0    0.0    0.0
4  112089  2013   39  West        0.13    0    NaN    NaN    NaN
5  112089  2014   40  West        0.15    0    0.0    NaN    NaN
6  112089  2015   41  West        0.18    1    0.0    0.0    NaN
7  112089  2016   42  West        0.21    1    1.0    0.0    0.0
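
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same loop. It makes a few assumptions that are not in the answer above: the sample rows are read from an inline string, the frame is sorted by ID and year first (shift works on row order, so unsorted years would produce wrong lags), and the helper name add_lags is hypothetical, used only for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question
csv_text = """ID,year,age,area,debt_ratio,gcp
654001,2013,49,East,0.14,0
654001,2014,50,East,0.17,0
654001,2015,51,East,0.23,1
654001,2016,52,East,0.18,0
112089,2013,39,West,0.13,0
112089,2014,40,West,0.15,0
112089,2015,41,West,0.18,1
112089,2016,42,West,0.21,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Assumption: rows may arrive unsorted, so order them by year within each ID.
# Note that this also reorders the IDs themselves (112089 comes first).
df = df.sort_values(['ID', 'year']).reset_index(drop=True)


def add_lags(frame, column, group_key):
    """Hypothetical helper: add column-1, column-2, ... lag columns per group."""
    out = frame.copy()
    # Largest number of rows in any group; lags beyond n - 1 would be all NaN
    n = out[group_key].value_counts().max()
    for i in range(1, n):
        out[f'{column}-{i}'] = out.groupby(group_key)[column].shift(i)
    return out


lagged = add_lags(df, 'gcp', 'ID')
print(lagged)
```

The lag columns come out as floats because NaN marks the years with no earlier observation. Whether to drop or fill those rows before training depends on the model, so that step is left out here.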

Hi, thank you very much! It works great! The dynamic solution is definitely the best, since my real data covers more than 20 years!