Python: given the same value in one column, concatenate the rest of the rows?
Tags: python, pandas
Given the DataFrame:
name hobby since
paul A 1995
john A 2005
paul B 2015
mary G 2013
chris E 2005
chris D 2001
paul C 1986
I would like to get:
name hobby1 since1 hobby2 since2 hobby3 since3
paul A 1995 B 2015 C 1986
john A 2005 NaN NaN NaN NaN
mary G 2013 NaN NaN NaN NaN
chris E 2005 D 2001 NaN NaN
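That is, I want one row per name. The maximum number of hobbies a person can have (say, 3) is known in advance. What is the most elegant/shortest way to do this?
Maybe something like this? Note that with this solution the columns still need to be renamed afterwards: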
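# Pack each (hobby, since) pair into a single "hobby_since" string
df["combined"] = ["{}_{}".format(x, y) for x, y in zip(df.hobby, df.since)]
# Join all pairs per name, then split them back out into separate columns
result = (
    df.groupby("name")["combined"]
    .agg(lambda x: "_".join(x))
    .str.split("_", expand=True)
)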
The result is:
0 1 2 3 4 5
name
chris E 2005 D 2001 None None
john A 2005 None None None None
mary G 2013 None None None None
paul A 1995 B 2015 C 1986
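The renaming step mentioned above is not shown in the thread. A minimal sketch of one way to do it, assuming the split columns always alternate hobby, since, hobby, since, and so on (the names `wide` and `names` are illustrative, not from the original post):

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["paul", "john", "paul", "mary", "chris", "chris", "paul"],
    "hobby": ["A", "A", "B", "G", "E", "D", "C"],
    "since": [1995, 2005, 2015, 2013, 2005, 2001, 1986],
})

# Same idea as the answer: one "hobby_since" string per row,
# joined per name and split back into positional columns 0..5.
combined = df["hobby"] + "_" + df["since"].astype(str)
wide = (
    combined.groupby(df["name"])
    .agg(lambda x: "_".join(x))
    .str.split("_", expand=True)
)

# Assumption: even positions hold hobbies, odd positions hold years.
names = []
for i in range(wide.shape[1]):
    field = "hobby" if i % 2 == 0 else "since"
    names.append(f"{field}{i // 2 + 1}")
wide.columns = names
```

Two caveats with this approach: the `since` values come back as strings rather than numbers, and it breaks if a hobby itself contains an underscore.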
Use cumcount and unstack. Finally, use MultiIndex.map to join the two column levels into a single level:
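# Number each person's rows 1, 2, 3, ... and pivot that counter into the columns.
# axis is passed by keyword, since newer pandas versions reject it positionally.
df1 = df.set_index(['name', df.groupby('name').cumcount().add(1)]) \
        .unstack().sort_index(axis=1, level=1)
# Flatten ('hobby', 1) -> 'hobby1', ('since', 1) -> 'since1', ...
df1.columns = df1.columns.map('{0[0]}{0[1]}'.format)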
Out[812]:
hobby1 since1 hobby2 since2 hobby3 since3
name
chris E 2005.0 D 2001.0 NaN NaN
john A 2005.0 NaN NaN NaN NaN
mary G 2013.0 NaN NaN NaN NaN
paul A 1995.0 B 2015.0 C 1986.0
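For readers who want to see what each part of that one-liner does, here is the same technique split into separate steps. This is only a sketch on the question's sample data; the intermediate names `counter` and `wide` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["paul", "john", "paul", "mary", "chris", "chris", "paul"],
    "hobby": ["A", "A", "B", "G", "E", "D", "C"],
    "since": [1995, 2005, 2015, 2013, 2005, 2001, 1986],
})

# Step 1: number each person's rows 1, 2, 3, ... in order of appearance.
counter = df.groupby("name").cumcount().add(1)

# Step 2: index by (name, counter) and pivot the counter into the columns.
# The columns become a MultiIndex of (field, counter) pairs.
wide = df.set_index(["name", counter]).unstack()

# Step 3: sort columns by counter first, then by field name,
# which interleaves them as hobby1, since1, hobby2, since2, ...
wide = wide.sort_index(axis=1, level=1)

# Step 4: flatten the two column levels into single labels.
wide.columns = [f"{field}{n}" for field, n in wide.columns]
```

The `since` columns end up as floats because the missing entries are filled with NaN, which matches the `2005.0` values in the output above.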
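Nice, thanks! One remaining issue: the column order matters. It should be "hobby1-since1-hobby2-since2-hobby3-since3" rather than "hobby1-hobby2-hobby3-since1-since2-since3".
@lumpy You could try assigning the code to m and then running m[sorted(m.columns, key=lambda x: x[-1])]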
This works perfectly for my real use case. Thanks a lot.
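The reordering suggested in the comment can be sketched as follows. The frame `m` here is a small hand-built stand-in with already-flattened column names in the unwanted order, and `trailing_number` is a hypothetical helper: the comment's `x[-1]` key compares only the last character, so this variant parses the whole trailing number in case there are ten or more hobbies.

```python
import re
import pandas as pd

# Stand-in for the flattened result, with columns grouped by field.
m = pd.DataFrame(
    [["E", "D", None, 2005, 2001, None]],
    index=pd.Index(["chris"], name="name"),
    columns=["hobby1", "hobby2", "hobby3", "since1", "since2", "since3"],
)

def trailing_number(col):
    """Return the integer suffix of a column label such as 'hobby12'."""
    return int(re.search(r"\d+$", col).group())

# sorted() is stable, so within each number the existing
# hobby-before-since order is preserved.
ordered = m[sorted(m.columns, key=trailing_number)]
```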