Python: add dummy columns to the original dataframe
I have a dataframe that looks like this:

            JOINED_CO  GENDER    EXEC_FULLNAME  GVKEY  YEAR    CONAME  BECAMECEO  REJOIN   LEFTOFC    LEFTCO  RELEFT    REASON  PAGE
CO_PER_ROL
5622              NaN    MALE   Ira A. Eichner   1004  1992  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN    MALE   Ira A. Eichner   1004  1993  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN    MALE   Ira A. Eichner   1004  1994  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN    MALE   Ira A. Eichner   1004  1995  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN    MALE   Ira A. Eichner   1004  1996  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN    MALE   Ira A. Eichner   1004  1997  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5622              NaN    MALE   Ira A. Eichner   1004  1998  AAR CORP   19550101     NaN  19961001  19990531     NaN  RESIGNED    79
5623              NaN    MALE  David P. Storch   1004  1992  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN    MALE  David P. Storch   1004  1993  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN    MALE  David P. Storch   1004  1994  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN    MALE  David P. Storch   1004  1995  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57
5623              NaN    MALE  David P. Storch   1004  1996  AAR CORP   19961009     NaN       NaN       NaN     NaN       NaN    57

For the YEAR values, I'd like to add year columns (1993, 1994, …, 2009) to the original dataframe: if the value in YEAR is 1992, the value in the 1992 column should be 1, otherwise 0.

I used a very naive for loop, but it seems to run forever because I have a large dataset.

Can anyone help me? Thanks.
In [77]: df = pd.concat([df, pd.get_dummies(df['YEAR'])], axis=1); df
Out[77]:
JOINED_CO GENDER EXEC_FULLNAME GVKEY YEAR CONAME BECAMECEO \
5622 NaN MALE Ira A. Eichner 1004 1992 AAR CORP 19550101
5622 NaN MALE Ira A. Eichner 1004 1993 AAR CORP 19550101
5622 NaN MALE Ira A. Eichner 1004 1994 AAR CORP 19550101
5622 NaN MALE Ira A. Eichner 1004 1995 AAR CORP 19550101
5622 NaN MALE Ira A. Eichner 1004 1996 AAR CORP 19550101
5622 NaN MALE Ira A. Eichner 1004 1997 AAR CORP 19550101
5622 NaN MALE Ira A. Eichner 1004 1998 AAR CORP 19550101
5623 NaN MALE David P. Storch 1004 1992 AAR CORP 19961009
5623 NaN MALE David P. Storch 1004 1993 AAR CORP 19961009
5623 NaN MALE David P. Storch 1004 1994 AAR CORP 19961009
5623 NaN MALE David P. Storch 1004 1995 AAR CORP 19961009
5623 NaN MALE David P. Storch 1004 1996 AAR CORP 19961009
REJOIN LEFTOFC LEFTCO RELEFT REASON PAGE 1992 1993 1994 \
5622 NaN 19961001 19990531 NaN RESIGNED 79 1 0 0
5622 NaN 19961001 19990531 NaN RESIGNED 79 0 1 0
5622 NaN 19961001 19990531 NaN RESIGNED 79 0 0 1
5622 NaN 19961001 19990531 NaN RESIGNED 79 0 0 0
5622 NaN 19961001 19990531 NaN RESIGNED 79 0 0 0
5622 NaN 19961001 19990531 NaN RESIGNED 79 0 0 0
5622 NaN 19961001 19990531 NaN RESIGNED 79 0 0 0
5623 NaN NaN NaN NaN NaN 57 1 0 0
5623 NaN NaN NaN NaN NaN 57 0 1 0
5623 NaN NaN NaN NaN NaN 57 0 0 1
5623 NaN NaN NaN NaN NaN 57 0 0 0
5623 NaN NaN NaN NaN NaN 57 0 0 0
1995 1996 1997 1998
5622 0 0 0 0
5622 0 0 0 0
5622 0 0 0 0
5622 1 0 0 0
5622 0 1 0 0
5622 0 0 1 0
5622 0 0 0 1
5623 0 0 0 0
5623 0 0 0 0
5623 0 0 0 0
5623 1 0 0 0
5623 0 1 0 0
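A self-contained sketch of the session above, runnable outside the interactive shell. The five rows and two columns are a hypothetical subset of the question's data. It also passes dtype=int: since pandas 2.0, get_dummies returns bool columns by default, so without it the new year columns would show True/False rather than the 1/0 seen in the output above.

```python
import pandas as pd

# Hypothetical subset of the question's rows; the index plays the role of CO_PER_ROL.
df = pd.DataFrame(
    {
        "EXEC_FULLNAME": ["Ira A. Eichner"] * 3 + ["David P. Storch"] * 2,
        "YEAR": [1992, 1993, 1994, 1992, 1993],
    },
    index=pd.Index([5622, 5622, 5622, 5623, 5623], name="CO_PER_ROL"),
)

# One indicator column per distinct YEAR value, labelled by the year itself.
# dtype=int asks for 0/1 values (pandas >= 2.0 would otherwise return bool).
dummies = pd.get_dummies(df["YEAR"], dtype=int)

# Put the indicator columns next to the original ones; both frames share the
# same index, so the rows line up one-to-one.
df = pd.concat([df, dummies], axis=1)
print(df)
```

This stays vectorized, so it should scale far better than a Python-level loop over rows.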
If you wish to remove the YEAR column, you could follow up with del df['YEAR']. Alternatively, drop the YEAR column from df before calling concat:
df = pd.concat([df.drop('YEAR', axis=1), pd.get_dummies(df['YEAR'])], axis=1)
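The drop-then-concat line can also be written as a single call. When get_dummies is given the whole frame together with columns=['YEAR'], it drops YEAR, appends one indicator column per year, and passes the remaining columns through unchanged. A minimal sketch, assuming a hypothetical two-column subset of the question's frame:

```python
import pandas as pd

# Hypothetical sample using two of the question's column names.
df = pd.DataFrame(
    {
        "GVKEY": [1004, 1004, 1004, 1004],
        "YEAR": [1992, 1993, 1992, 1993],
    },
    index=pd.Index([5622, 5622, 5623, 5623], name="CO_PER_ROL"),
)

# Encode only YEAR: it is replaced by YEAR_1992, YEAR_1993, ... while GVKEY
# is kept as-is. dtype=int gives 0/1 instead of bool on pandas >= 2.0.
encoded = pd.get_dummies(df, columns=["YEAR"], dtype=int)
print(encoded)
```

One difference from the concat version: the new labels are prefixed strings (YEAR_1992, YEAR_1993, ...) rather than the integer years, so code that expects df[1992] would need adjusting.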
What does the In [77] mean?
@guo: That's the interactive shell prompt. It numbers the inputs.
Why does this code block double my original frame? Any guess, @unutbu?
df = pd.concat([df.drop('YEAR', axis=1), pd.get_dummies(df['YEAR'])], axis=1)
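The question asks for a column for every year through 2009, but get_dummies only creates columns for YEAR values that actually occur, which in the sample data stop at 1998. One way to get a fixed set of columns, assuming the intended range is 1992 through 2009, is to cast YEAR to a categorical dtype that lists every year as a category; get_dummies then emits a column for each category, including years with no rows. A minimal sketch on hypothetical sample rows:

```python
import pandas as pd

# Hypothetical sample rows; note that no row has YEAR 1994.
df = pd.DataFrame(
    {"YEAR": [1992, 1993, 1995]},
    index=pd.Index([5622, 5622, 5623], name="CO_PER_ROL"),
)

# Assumed target range 1992..2009: declare every year as a category so
# unused years still get an (all-zero) indicator column.
all_years = pd.CategoricalDtype(categories=list(range(1992, 2010)))
dummies = pd.get_dummies(df["YEAR"].astype(all_years), dtype=int)

# Attach the 18 year columns to the original frame, aligned on the index.
df = pd.concat([df, dummies], axis=1)
print(df)
```

Here the 1994 column exists but is all zeros, because no sample row has YEAR 1994.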