Python: add dummy columns to the original DataFrame


I have a DataFrame that looks like this:

            JOINED_CO GENDER    EXEC_FULLNAME  GVKEY  YEAR    CONAME  BECAMECEO  REJOIN   LEFTOFC    LEFTCO  RELEFT    REASON  PAGE
CO_PER_ROL
5622              NaN   MALE   Ira A. Eichner   1004  1992  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1993  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1994  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1995  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1996  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1997  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN   MALE   Ira A. Eichner   1004  1998  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5623              NaN   MALE  David P. Storch   1004  1992  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1993  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1994  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1995  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN   MALE  David P. Storch   1004  1996  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57

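For the YEAR values, I would like to add one column per year (1993, 1994, …, 2009) to the original DataFrame. If the value in YEAR is 1992, the value in the 1992 column should be 1; otherwise it should be 0.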

I used a very naive for loop, but it seems to run forever because my dataset is large. Can anyone help? Thanks.

In [77]: df = pd.concat([df, pd.get_dummies(df['YEAR'])], axis=1); df
Out[77]: 
      JOINED_CO GENDER    EXEC_FULLNAME  GVKEY  YEAR    CONAME  BECAMECEO  \
5622        NaN   MALE   Ira A. Eichner   1004  1992  AAR CORP   19550101   
5622        NaN   MALE   Ira A. Eichner   1004  1993  AAR CORP   19550101   
5622        NaN   MALE   Ira A. Eichner   1004  1994  AAR CORP   19550101   
5622        NaN   MALE   Ira A. Eichner   1004  1995  AAR CORP   19550101   
5622        NaN   MALE   Ira A. Eichner   1004  1996  AAR CORP   19550101   
5622        NaN   MALE   Ira A. Eichner   1004  1997  AAR CORP   19550101   
5622        NaN   MALE   Ira A. Eichner   1004  1998  AAR CORP   19550101   
5623        NaN   MALE  David P. Storch   1004  1992  AAR CORP   19961009   
5623        NaN   MALE  David P. Storch   1004  1993  AAR CORP   19961009   
5623        NaN   MALE  David P. Storch   1004  1994  AAR CORP   19961009   
5623        NaN   MALE  David P. Storch   1004  1995  AAR CORP   19961009   
5623        NaN   MALE  David P. Storch   1004  1996  AAR CORP   19961009   

      REJOIN   LEFTOFC    LEFTCO  RELEFT    REASON  PAGE  1992  1993  1994  \
5622     NaN  19961001  19990531     NaN  RESIGNED    79     1     0     0   
5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     1     0   
5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     1   
5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
5622     NaN  19961001  19990531     NaN  RESIGNED    79     0     0     0   
5623     NaN       NaN       NaN     NaN       NaN    57     1     0     0   
5623     NaN       NaN       NaN     NaN       NaN    57     0     1     0   
5623     NaN       NaN       NaN     NaN       NaN    57     0     0     1   
5623     NaN       NaN       NaN     NaN       NaN    57     0     0     0   
5623     NaN       NaN       NaN     NaN       NaN    57     0     0     0   

      1995  1996  1997  1998  
5622     0     0     0     0  
5622     0     0     0     0  
5622     0     0     0     0  
5622     1     0     0     0  
5622     0     1     0     0  
5622     0     0     1     0  
5622     0     0     0     1  
5623     0     0     0     0  
5623     0     0     0     0  
5623     0     0     0     0  
5623     1     0     0     0  
5623     0     1     0     0  
If you want to remove the YEAR column, you can follow up with del df['YEAR']. Alternatively, drop the YEAR column from df before calling concat:

df = pd.concat([df.drop('YEAR', axis=1), pd.get_dummies(df['YEAR'])], axis=1)
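For anyone who wants to try this without the full dataset, here is a minimal self-contained sketch of the same get_dummies plus concat approach. The small frame is a hypothetical stand-in that keeps only two of the columns shown above. Passing dtype=int is an addition not in the answer: recent pandas versions return boolean dummy columns by default, so it is needed there to get the 0/1 output shown earlier.

```python
import pandas as pd

# Hypothetical miniature of the frame above; only the columns needed here.
df = pd.DataFrame(
    {
        "EXEC_FULLNAME": ["Ira A. Eichner"] * 3 + ["David P. Storch"] * 2,
        "YEAR": [1992, 1993, 1994, 1992, 1993],
    },
    index=pd.Index([5622, 5622, 5622, 5623, 5623], name="CO_PER_ROL"),
)

# One indicator column per distinct year; dtype=int requests 0/1 integers
# (recent pandas versions would otherwise produce True/False).
dummies = pd.get_dummies(df["YEAR"], dtype=int)

# dummies has the same index as df, so a column-wise concat lines the rows up
# even though the index labels repeat.
result = pd.concat([df, dummies], axis=1)
print(result)
```

Note that get_dummies only creates columns for years that actually occur in YEAR, so a year that never appears in the data gets no column.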

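What does the In [77] mean?
@guo: That is the interactive shell prompt. It numbers the inputs.
Why am I doubling the original frame with this code block?
Figured it out, @unutbu.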
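The thread does not say what caused the doubling. One plausible cause, offered here only as an assumption, is running the concat statement twice in the same session: because the result is assigned back to df, the second run appends another copy of the year columns. The sketch below reproduces that on a hypothetical three-row frame and shows one possible guard.

```python
import pandas as pd

# Hypothetical tiny frame, just enough to show the effect.
df = pd.DataFrame({"YEAR": [1992, 1993, 1992]})

once = pd.concat([df, pd.get_dummies(df["YEAR"], dtype=int)], axis=1)

# Running the same statement on its own output appends a second copy
# of every dummy column.
twice = pd.concat([once, pd.get_dummies(once["YEAR"], dtype=int)], axis=1)
print(list(twice.columns))

# Possible guard (an illustration, not from the thread): only add the
# year columns that are not already present.
dummies = pd.get_dummies(once["YEAR"], dtype=int)
missing = [c for c in dummies.columns if c not in once.columns]
guarded = pd.concat([once, dummies[missing]], axis=1)
print(list(guarded.columns))
```

Starting again from a freshly loaded frame avoids the problem without any guard.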