Python: How to convert a pandas dataframe into a cross-tabulated dataframe while keeping all values
Suppose we have a dataframe like this:
import numpy as np
import pandas as pd

df = pd.DataFrame({'key' : ['one', 'two', 'three', 'four'] * 3,
                   'col' : ['A', 'B', 'C'] * 4,
                   'val1' : np.random.randn(12),
                   'val2' : np.random.randn(12),
                   'val3' : np.random.randn(12)})
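The combination of key and col is unique.
I want to spread the values of col across columns, i.e. cross-tabulate on col while keeping val1, val2 and val3.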
The first naive approach, pd.crosstab(df.key, df.col), does not work here: it only returns a frequency count for each key/col pair.
This code, pd.crosstab(df.key, df.col, values=df[['val1','val2','val3']], aggfunc=np.max), fails with ValueError: Wrong number of items passed 3, placement implies 1
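For what it's worth, pd.crosstab takes a single array for values, so passing one column at a time does run. A minimal sketch, assuming a small hand-made frame (the numbers are illustrative, not the random data from the question):

```python
import pandas as pd

# Deterministic stand-in for the frame in the question (illustrative values).
df = pd.DataFrame({
    'key': ['one', 'two', 'three', 'four'] * 3,
    'col': ['A', 'B', 'C'] * 4,
    'val1': range(12),
    'val2': range(12, 24),
})

# A single Series is accepted for `values`; the question's error came from
# passing a three-column DataFrame instead.
single = pd.crosstab(df.key, df.col, values=df.val1, aggfunc='max')
print(single)
```

This only gives one value column per call, so it does not solve the problem by itself, but it shows where the error comes from.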
How can I make this work?
Use pivot_table with the aggregate function np.max:
df = (df.pivot_table(index='key', columns='col', aggfunc=np.max)
        .swaplevel(0, 1, axis=1)
        .sort_index(axis=1))
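To see what the swaplevel and sort_index steps contribute, here is a self-contained sketch on a deterministic frame. The values are made up for illustration, and the string 'max' is used in place of np.max, which should aggregate the same way:

```python
import pandas as pd

# Deterministic stand-in for the frame in the question (illustrative values).
df = pd.DataFrame({
    'key': ['one', 'two', 'three', 'four'] * 3,
    'col': ['A', 'B', 'C'] * 4,
    'val1': range(12),
    'val2': range(12, 24),
    'val3': range(24, 36),
})

# Without `values`, every remaining column (val1..val3) is aggregated.
# The column levels come out as (value name, col).
wide = df.pivot_table(index='key', columns='col', aggfunc='max')

# Swap the levels so col is on top, then sort so each col groups its values.
wide = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(wide)
```

After the swap, a whole group can be selected with wide['A'], which returns val1, val2 and val3 for that col value.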
Alternative solution, aggregating with:
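One way to do this kind of aggregation without pivot_table is groupby followed by unstack. This is a sketch under that assumption, not necessarily the answerer's exact code, and the data is made up for illustration:

```python
import pandas as pd

# Deterministic stand-in for the frame in the question (illustrative values).
df = pd.DataFrame({
    'key': ['one', 'two', 'three', 'four'] * 3,
    'col': ['A', 'B', 'C'] * 4,
    'val1': range(12),
    'val2': range(12, 24),
    'val3': range(24, 36),
})

# Assumed approach: aggregate per (key, col), then move col into the columns.
wide = (df.groupby(['key', 'col']).max()
          .unstack()                    # col becomes the inner column level
          .swaplevel(0, 1, axis=1)      # put col on top
          .sort_index(axis=1))
print(wide)
```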
Use melt, set_index and unstack. This only works if you expect exactly one value per cell; otherwise, use the second option, which aggregates the values:
df.melt(['key','col'])\
.set_index(['key','col','variable'])['value']\
.unstack([1,2])\
.sort_index(axis=1)
Output:
col              A                             B                             C
variable      val1      val2      val3      val1      val2      val3      val1      val2      val3
key
four     -1.964246  0.958854 -0.605128  0.055120 -1.144306 -0.800712 -0.917324 -0.581882 -0.152399
one       0.513347 -1.689448 -2.434481  0.990924 -1.014848  0.713703  1.344299  0.052877  1.174183
three    -0.156336 -0.156157 -2.253689  0.877726 -0.686758 -0.407892  0.816636  1.008870 -0.390872
two       1.942495  1.811712 -0.762283 -2.169613 -1.073372  0.201996 -1.073370 -0.902032 -0.168796
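The one-value-per-cell caveat for the unstack route is easy to check. In this sketch, with made-up data and a deliberately duplicated key/col pair, unstack raises because the index contains duplicate entries:

```python
import pandas as pd

# Made-up data: the pair ('one', 'A') appears twice on purpose.
df = pd.DataFrame({
    'key': ['one', 'one', 'two'],
    'col': ['A', 'A', 'B'],
    'val1': [1.0, 5.0, 2.0],
})

stacked = df.melt(['key', 'col']).set_index(['key', 'col', 'variable'])['value']

try:
    stacked.unstack([1, 2])
    failed = False
except ValueError as exc:
    # unstack cannot decide which of the duplicate values belongs in the cell.
    failed = True
    print('unstack failed:', exc)
```

When duplicates are possible, an approach that takes an aggregate function is the safer choice.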
Another option, using melt and pd.crosstab:
df1 = df.melt(['key','col'])
pd.crosstab(df1.key, [df1.col, df1.variable], df1.value, aggfunc=np.max)
Output:
col              A                             B                             C
variable      val1      val2      val3      val1      val2      val3      val1      val2      val3
key
four     -1.964246  0.958854 -0.605128  0.055120 -1.144306 -0.800712 -0.917324 -0.581882 -0.152399
one       0.513347 -1.689448 -2.434481  0.990924 -1.014848  0.713703  1.344299  0.052877  1.174183
three    -0.156336 -0.156157 -2.253689  0.877726 -0.686758 -0.407892  0.816636  1.008870 -0.390872
two       1.942495  1.811712 -0.762283 -2.169613 -1.073372  0.201996 -1.073370 -0.902032 -0.168796
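Because crosstab takes an aggregate function, the melt plus crosstab route copes with repeated key/col pairs. A small sketch with made-up data, where ('one', 'A') appears twice and 'max' keeps the larger value:

```python
import pandas as pd

# Made-up data: the pair ('one', 'A') appears twice on purpose.
df = pd.DataFrame({
    'key': ['one', 'one', 'two'],
    'col': ['A', 'A', 'B'],
    'val1': [1.0, 5.0, 2.0],
})

df1 = df.melt(['key', 'col'])

# The third positional argument of crosstab is `values`.
wide = pd.crosstab(df1.key, [df1.col, df1.variable], df1.value, aggfunc='max')
print(wide)
```

Combinations that never occur in the data, such as ('two', 'A') here, come out as NaN.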
I think you may find this useful. It looks like you want to use the MultiIndex approach. I like the simplicity of the pivot table.