Python: stacked to unstacked format
Also known as long-to-wide format. I have the following data:
ID1 ID2 POS1 POS2 TYPE TYPEVAL
--- --- ---- ---- ---- -------
A 001 1 5 COLOR RED
A 001 1 5 WEIGHT 50KG
A 001 1 5 HEIGHT 160CM
A 002 6 19 FUTURE YES
A 002 6 19 PRESENT NO
B 001 26 34 COLOUR BLUE
B 001 26 34 WEIGHT 85KG
B 001 26 34 HEIGHT 120CM
C 001 10 13 MOBILE NOKIA
C 001 10 13 TABLET ASUS
I would like to turn the TYPE column into one new column per unique value, i.e.:
ID1 ID2 POS1 POS2 COLOR WEIGHT HEIGHT FUTURE PRESENT MOBILE TABLET
A 001 1 5 RED 50KG 160CM NA NA NA NA
A 002 6 19 NA NA NA YES NO NA NA
B 001 26 34 BLUE 85KG 120CM NA NA NA NA
C 001 10 13 NA NA NA NA NA NOKIA ASUS
I tried to do it with:
pd.pivot_table(df, index=["ID1", "ID2"], columns=["POS1", "POS2", "TYPE"], values=["TYPEVAL"])
However, I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/pandas/tools/pivot.py", line 127, in pivot_table
agged = grouped.agg(aggfunc)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3690, in aggregate
return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3179, in aggregate
result, how = self._aggregate(arg, _level=_level, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/base.py", line 432, in _aggregate
return getattr(self, arg)(*args, **kwargs), None
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1009, in mean
return self._cython_agg_general('mean')
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3113, in _cython_agg_general
how, numeric_only=numeric_only)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 3159, in _cython_agg_blocks
raise DataError('No numeric types to aggregate')
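This tells me to aggregate the column with some numeric function (e.g. mean or sum). But I don't want to do that; I just want to transpose the TYPE column without any aggregation.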
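Any suggestions would be much appreciated.

I think you need aggfunc='first', or, if a cell can have multiple values, join (or sum) them, because the default aggregation function is mean, which only works with numeric data: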
# aggfunc='first' takes the value in each cell instead of averaging it
df1 = (pd.pivot_table(df,
                      index=["ID1", "ID2", "POS1", "POS2"],
                      columns="TYPE",
                      values="TYPEVAL",
                      aggfunc='first')
         .reset_index()
         .rename_axis(None, axis=1))
print(df1)
ID1 ID2 POS1 POS2 COLOR COLOUR FUTURE HEIGHT MOBILE PRESENT TABLET WEIGHT
0 A 1 1 5 RED None None 160CM None None None 50KG
1 A 2 6 19 None None YES None None NO None None
2 B 1 26 34 None BLUE None 120CM None None None 85KG
3 C 1 10 13 None None None None NOKIA None ASUS None
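A self-contained sketch of the `aggfunc='first'` pivot above. Loading the question's table with `io.StringIO` and `read_csv(sep=r"\s+")` is an assumption made only for the demo; `ID2` is read as a string so that `001` keeps its leading zeros (the output above shows `1` and `2`, which suggests it was parsed as integers there). Missing cells may print as `NaN` rather than `None`, depending on the pandas version.

```python
import io

import pandas as pd

# The question's long-format data, read from text only for this demo.
RAW = """\
ID1 ID2 POS1 POS2 TYPE TYPEVAL
A 001 1 5 COLOR RED
A 001 1 5 WEIGHT 50KG
A 001 1 5 HEIGHT 160CM
A 002 6 19 FUTURE YES
A 002 6 19 PRESENT NO
B 001 26 34 COLOUR BLUE
B 001 26 34 WEIGHT 85KG
B 001 26 34 HEIGHT 120CM
C 001 10 13 MOBILE NOKIA
C 001 10 13 TABLET ASUS
"""

# Keep ID2 as text so "001" is not converted to the integer 1.
df = pd.read_csv(io.StringIO(RAW), sep=r"\s+", dtype={"ID2": str})

# 'first' picks the value in each cell, so no numeric aggregation
# is attempted on the string column TYPEVAL.
wide = (
    pd.pivot_table(
        df,
        index=["ID1", "ID2", "POS1", "POS2"],
        columns="TYPE",
        values="TYPEVAL",
        aggfunc="first",
    )
    .reset_index()
    .rename_axis(None, axis=1)
)
print(wide)
```

Note that `COLOR` and `COLOUR` end up as separate columns because they are different strings in the input; normalising the spelling first would merge them.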
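Or, if a cell can have several values, join them into one string:
# ','.join concatenates all values that fall into the same cell
df1 = (pd.pivot_table(df,
                      index=["ID1", "ID2", "POS1", "POS2"],
                      columns="TYPE",
                      values="TYPEVAL",
                      aggfunc=','.join)
         .reset_index()
         .rename_axis(None, axis=1))
print(df1)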
ID1 ID2 POS1 POS2 COLOR COLOUR FUTURE HEIGHT MOBILE PRESENT TABLET WEIGHT
0 A 1 1 5 RED None None 160CM None None None 50KG
1 A 2 6 19 None None YES None None NO None None
2 B 1 26 34 None BLUE None 120CM None None None 85KG
3 C 1 10 13 None None None None NOKIA None ASUS None
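With the question's data, `'first'` and `','.join` produce the same table because every (ID, TYPE) combination occurs only once. The difference shows up only when a cell has duplicates. A minimal sketch, assuming a hypothetical second `WEIGHT` row for the same ID (not in the original data); `to_wide` is a hypothetical helper name:

```python
import pandas as pd

# Hypothetical long-format data: ID "A"/"001" has two WEIGHT rows.
df = pd.DataFrame(
    {
        "ID1": ["A", "A", "A"],
        "ID2": ["001", "001", "001"],
        "TYPE": ["COLOR", "WEIGHT", "WEIGHT"],
        "TYPEVAL": ["RED", "50KG", "52KG"],
    }
)


def to_wide(frame, aggfunc):
    """Pivot TYPE into columns, combining duplicate cells with aggfunc."""
    return (
        pd.pivot_table(
            frame,
            index=["ID1", "ID2"],
            columns="TYPE",
            values="TYPEVAL",
            aggfunc=aggfunc,
        )
        .reset_index()
        .rename_axis(None, axis=1)
    )


first = to_wide(df, "first")    # keeps only the first of the duplicates
joined = to_wide(df, ",".join)  # concatenates all duplicates with commas

print(first)
print(joined)
```

`'first'` silently drops the second value, while `','.join` keeps both, so the choice depends on whether duplicates are expected in the data.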
You can set all columns except 'TYPEVAL' as the index and then unstack:
(df.set_index(df.columns.difference(['TYPEVAL']).tolist())
   .TYPEVAL
   .unstack('TYPE')
   .reset_index()
   .rename_axis(None, axis=1))
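A runnable sketch of the `set_index`/`unstack` approach on a subset of the question's rows. Unlike `pivot_table`, `unstack` does no aggregation at all, so it raises `ValueError` when the same index/TYPE combination appears twice; the duplicated row in the last step is hypothetical. `DataFrame.pivot` is shown as an equivalent non-aggregating reshape; it is not part of the original answer, and passing a list as `index` assumes pandas 1.1 or later.

```python
import pandas as pd

# A few rows of the question's data (ID2 kept as text).
df = pd.DataFrame(
    {
        "ID1": ["A", "A", "A", "C", "C"],
        "ID2": ["001", "001", "002", "001", "001"],
        "POS1": [1, 1, 6, 10, 10],
        "POS2": [5, 5, 19, 13, 13],
        "TYPE": ["COLOR", "WEIGHT", "FUTURE", "MOBILE", "TABLET"],
        "TYPEVAL": ["RED", "50KG", "YES", "NOKIA", "ASUS"],
    }
)

# Every column except TYPEVAL becomes an index level; unstack then moves
# the TYPE level into the columns without aggregating anything.
keys = df.columns.difference(["TYPEVAL"]).tolist()
wide = (
    df.set_index(keys)["TYPEVAL"]
    .unstack("TYPE")
    .reset_index()
    .rename_axis(None, axis=1)
)
print(wide)

# The same reshape with DataFrame.pivot (list-valued index: pandas >= 1.1).
via_pivot = (
    df.pivot(
        index=["ID1", "ID2", "POS1", "POS2"],
        columns="TYPE",
        values="TYPEVAL",
    )
    .reset_index()
    .rename_axis(None, axis=1)
)

# Neither method aggregates, so a duplicated cell is an error.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)  # hypothetical duplicate
try:
    dup.set_index(keys)["TYPEVAL"].unstack("TYPE")
    unstack_failed = False
except ValueError:
    unstack_failed = True
print("unstack on duplicates raised ValueError:", unstack_failed)
```

If the data is guaranteed to have one row per (ID, TYPE), this error never triggers; otherwise `pivot_table` with an explicit `aggfunc` is the safer choice.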