Pandas: how to prepare panel data for machine learning in Python?
I have a panel dataset / time series, and I want to prepare it for a machine-learning model that predicts next year's gcp. My data looks like this:
ID,year,age,area,debt_ratio,gcp
654001,2013,49,East,0.14,0
654001,2014,50,East,0.17,0
654001,2015,51,East,0.23,1
654001,2016,52,East,0.18,0
112089,2013,39,West,0.13,0
112089,2014,40,West,0.15,0
112089,2015,41,West,0.18,1
112089,2016,42,West,0.21,1
What I want is something like this:
ID,year,age,area,debt_ratio,gcp,gcp-1,gcp-2,gcp-3
654001,2013,49,East,0.14,0,NA,NA,NA
654001,2014,50,East,0.17,0,0,NA,NA
654001,2015,51,East,0.23,1,0,0,NA
654001,2016,52,East,0.18,0,1,0,0
112089,2013,39,West,0.13,0,NA,NA,NA
112089,2014,40,West,0.15,0,0,NA,NA
112089,2015,41,West,0.18,1,0,0,NA
112089,2016,42,West,0.21,1,1,0,0
I tried the pandas melt function, but without success. Searching online, I found this post, which does exactly what I want, but in R:
https://stackoverflow.com/questions/19813077/prepare-time-series-for-machine-learning-long-to-wide-format
Does anyone know how to do this in Python? Any suggestions would be greatly appreciated.

Use DataFrameGroupBy.shift in a loop.
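A minimal, self-contained sketch of such a loop with a fixed number of lags (three, matching the gcp-1 to gcp-3 columns in the desired output), assuming the sample data from the question is read from a string with `io.StringIO`. It relies on the rows of each ID already being in year order, as they are in the sample:

```python
import io

import pandas as pd

# Sample panel from the question: one row per (ID, year)
csv = """ID,year,age,area,debt_ratio,gcp
654001,2013,49,East,0.14,0
654001,2014,50,East,0.17,0
654001,2015,51,East,0.23,1
654001,2016,52,East,0.18,0
112089,2013,39,West,0.13,0
112089,2014,40,West,0.15,0
112089,2015,41,West,0.18,1
112089,2016,42,West,0.21,1
"""
df = pd.read_csv(io.StringIO(csv))

# shift(i) within each ID moves gcp down i rows, so every row sees the
# value from i rows (here: i years) earlier; the first i rows per ID get NaN
for i in range(1, 4):
    df[f'gcp-{i}'] = df.groupby('ID')['gcp'].shift(i)

print(df)
```

Because the shift is done per group, values never leak from one ID into the next: the first row of 112089 gets NaN rather than the last gcp of 654001.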
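A more dynamic solution is to take the size of the largest group (the maximum number of rows for any ID) and pass it to range: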
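# largest number of rows (years) for any single ID
N = df['ID'].value_counts().max()
# create gcp-1 ... gcp-(N-1): gcp shifted by i rows within each ID
for i in range(1, N):
    df[f'gcp-{i}'] = df.groupby('ID')['gcp'].shift(i)
print(df)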
ID year age area debt_ratio gcp gcp-1 gcp-2 gcp-3
0 654001 2013 49 East 0.14 0 NaN NaN NaN
1 654001 2014 50 East 0.17 0 0.0 NaN NaN
2 654001 2015 51 East 0.23 1 0.0 0.0 NaN
3 654001 2016 52 East 0.18 0 1.0 0.0 0.0
4 112089 2013 39 West 0.13 0 NaN NaN NaN
5 112089 2014 40 West 0.15 0 0.0 NaN NaN
6 112089 2015 41 West 0.18 1 0.0 0.0 NaN
7 112089 2016 42 West 0.21 1 1.0 0.0 0.0
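The shift-based loop is positional: it assumes each ID's rows are sorted by year and that no year is missing. If a panel can have gaps, a hedged alternative is to match on the calendar year instead, looking up gcp at (ID, year - i) with a left merge. This is a sketch under the assumption that each (ID, year) pair is unique; the helper `add_year_lags` and the toy frame `panel` are hypothetical names for illustration:

```python
import pandas as pd


def add_year_lags(df: pd.DataFrame, col: str, n_lags: int) -> pd.DataFrame:
    """Add col-1 ... col-n_lags, each taken from the same ID exactly i years earlier.

    Assumes columns 'ID' and 'year' exist and (ID, year) pairs are unique.
    """
    out = df.copy()
    for i in range(1, n_lags + 1):
        # Move each row's year forward by i so it lines up with the row i years later
        lagged = df[['ID', 'year', col]].assign(year=df['year'] + i)
        lagged = lagged.rename(columns={col: f'{col}-{i}'})
        # Left merge keeps every original row (in order); missing years become NaN
        out = out.merge(lagged, on=['ID', 'year'], how='left')
    return out


# Toy panel in which ID 1 has no row for 2014
panel = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2],
    'year': [2013, 2015, 2016, 2013, 2014],
    'gcp': [1, 0, 1, 0, 1],
})

# With gaps, the row count per ID understates the history, so use the year span
n_lags = int(panel['year'].max() - panel['year'].min())
result = add_year_lags(panel, 'gcp', n_lags)
print(result)
```

Here the 2015 row of ID 1 gets NaN for gcp-1 (2014 is missing) and 1 for gcp-2 (the 2013 value), whereas a positional shift(1) would wrongly report the 2013 value as last year's gcp.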
Hi, thank you very much! It works great! The dynamic solution is definitely the best, because my real data covers more than 20 years!