Python: keep DataFrame columns and their order in a pivot table
I have a DataFrame:
import pandas as pd

df = pd.DataFrame({'No': [123,123,123,523,523,523,765],
                   'Type': ['A','B','C','A','C','D','A'],
                   'Task': ['First','Second','First','Second','Third','First','Fifth'],
                   'Color': ['blue','red','blue','black','red','red','red'],
                   'Price': [10,5,1,12,12,12,18],
                   'Unit': ['E','E','E','E','E','E','E'],
                   'Pers.ID': [45,6,6,43,1,9,2]
                   })
It looks like this:
df
+-----+------+--------+-------+-------+------+---------+
| No  | Type | Task   | Color | Price | Unit | Pers.ID |
+-----+------+--------+-------+-------+------+---------+
| 123 | A    | First  | blue  | 10    | E    | 45      |
| 123 | B    | Second | red   | 5     | E    | 6       |
| 123 | C    | First  | blue  | 1     | E    | 6       |
| 523 | A    | Second | black | 12    | E    | 43      |
| 523 | C    | Third  | red   | 12    | E    | 1       |
| 523 | D    | First  | red   | 12    | E    | 9       |
| 765 | A    | Fifth  | red   | 18    | E    | 2       |
+-----+------+--------+-------+-------+------+---------+
Then I created a pivot table:
piv = pd.pivot_table(df, index=['No','Type','Task'])
Result:
                 Pers.ID  Price
No  Type Task
123 A    First        45     10
    B    Second        6      5
    C    First         6      1
523 A    Second       43     12
    C    Third         1     12
    D    First         9     12
765 A    Fifth         2     18
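As you can see, there are two problems:
- Several columns have disappeared (Color and Unit)
- The Price and Pers.ID columns are in a different order than in the original DataFrame
I tried passing the columns explicitly: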
cols = list(df.columns)
piv = pd.pivot_table(df, index=['No','Type','Task'], values = cols)
But the result is the same.
I have read other posts, but none of them quite matches my problem.
Thanks, everyone!
Edit: desired output
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
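I think the problem is that the default aggregation function in pivot_table is mean, so the string columns (Color and Unit) are excluded. You therefore need a custom aggregation function. The column order is also changed, so reindex is necessary: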
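import numpy as np

# sum numeric columns, join string columns with ', '
f = lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ', '.join(x)
# all columns except the index keys, in their original order
cols = df.columns[~df.columns.isin(['No','Type','Task'])].tolist()
piv = (pd.pivot_table(df,
                      index=['No','Type','Task'],
                      values=cols,
                      aggfunc=f).reindex(columns=cols))
print(piv)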
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
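As a hedged variant of the same idea, the dtype check can be done with `pandas.api.types.is_numeric_dtype`, which also handles pandas extension dtypes. The small frame below is made up for illustration (it repeats one key combination so that the aggregation actually does something), and the helper name `sum_or_join` is hypothetical:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Illustrative data (not the question's frame): the first two rows share a key
df = pd.DataFrame({
    'No': [123, 123, 523],
    'Type': ['A', 'A', 'C'],
    'Task': ['First', 'First', 'Third'],
    'Color': ['blue', 'red', 'red'],
    'Price': [10, 5, 12],
    'Unit': ['E', 'E', 'E'],
    'Pers.ID': [45, 6, 1],
})

keys = ['No', 'Type', 'Task']
# value columns in their original order
cols = [c for c in df.columns if c not in keys]

def sum_or_join(s):
    # hypothetical helper: sum numeric columns, join everything else as text
    return s.sum() if is_numeric_dtype(s.dtype) else ', '.join(s)

piv = (pd.pivot_table(df, index=keys, values=cols, aggfunc=sum_or_join)
         .reindex(columns=cols))  # restore the original column order
print(piv)
```

The `reindex(columns=cols)` step is what restores the order; without it the value columns come back sorted by name, as in the question.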
Another solution uses groupby with the same aggregation function; here the column order is not a problem:
df = (df.groupby(['No','Type','Task'])
        .agg(lambda x: x.sum() if np.issubdtype(x.dtype, np.number) else ', '.join(x)))
print(df)
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
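The groupby approach can also be written with an explicit per-column mapping instead of one lambda that inspects the dtype. This is only a sketch: `agg_map`, `keys`, and `cols` are names introduced here for illustration, and the rule "sum numbers, join strings" is carried over from the answer above:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'No': [123, 123, 123, 523, 523, 523, 765],
                   'Type': ['A', 'B', 'C', 'A', 'C', 'D', 'A'],
                   'Task': ['First', 'Second', 'First', 'Second', 'Third', 'First', 'Fifth'],
                   'Color': ['blue', 'red', 'blue', 'black', 'red', 'red', 'red'],
                   'Price': [10, 5, 1, 12, 12, 12, 18],
                   'Unit': ['E', 'E', 'E', 'E', 'E', 'E', 'E'],
                   'Pers.ID': [45, 6, 6, 43, 1, 9, 2]})

keys = ['No', 'Type', 'Task']
cols = [c for c in df.columns if c not in keys]

# one aggregation per value column, decided once from the column dtype
agg_map = {c: 'sum' if is_numeric_dtype(df[c]) else (lambda s: ', '.join(s))
           for c in cols}

out = df.groupby(keys).agg(agg_map)
print(out)
```

Spelling out the mapping makes it easy to swap in a different rule for a single column, for example 'first' instead of a join.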
But if you only need to turn the first 3 columns into a MultiIndex, without aggregating:
df = df.set_index(['No','Type','Task'])
print (df)
                 Color  Price  Unit  Pers.ID
No  Type Task
123 A    First    blue     10     E       45
    B    Second    red      5     E        6
    C    First    blue      1     E        6
523 A    Second  black     12     E       43
    C    Third     red     12     E        1
    D    First     red     12     E        9
765 A    Fifth     red     18     E        2
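set_index only gives the same result as the aggregating solutions when every (No, Type, Task) combination occurs once. A minimal sketch of how that could be checked first, assuming the frame from the question; `n_dupes` and `indexed` are names introduced here:

```python
import pandas as pd

df = pd.DataFrame({'No': [123, 123, 123, 523, 523, 523, 765],
                   'Type': ['A', 'B', 'C', 'A', 'C', 'D', 'A'],
                   'Task': ['First', 'Second', 'First', 'Second', 'Third', 'First', 'Fifth'],
                   'Color': ['blue', 'red', 'blue', 'black', 'red', 'red', 'red'],
                   'Price': [10, 5, 1, 12, 12, 12, 18],
                   'Unit': ['E', 'E', 'E', 'E', 'E', 'E', 'E'],
                   'Pers.ID': [45, 6, 6, 43, 1, 9, 2]})

keys = ['No', 'Type', 'Task']

# count rows whose key combination already appeared earlier in the frame
n_dupes = int(df.duplicated(subset=keys).sum())
print('duplicate key rows:', n_dupes)

# verify_integrity=True raises ValueError if the new index has duplicates,
# so a frame that really needs aggregation is not silently accepted
indexed = df.set_index(keys, verify_integrity=True)
print(indexed)
```

If duplicates are reported, one of the aggregating solutions (pivot_table or groupby) is the safer choice. The indexed frame can then be written out with `DataFrame.to_excel`, provided an Excel writer engine such as openpyxl is installed.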
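Sorry, that was misleading. Apart from the columns used as the index (No, Type, Task), I want to keep the order of the columns. I will add the desired result to my question.
@MaMo - I edited the answer, but if you do not need to aggregate values, the third solution is the best.
That seems to work well. I was a bit confused because I thought I needed a pivot table to save the result to Excel. I had tried groupby before but could not export to Excel, and then I read about the pivot solution. I did not know there was such a simple answer! Thank you very much :)
@MaMo - It depends on what you need; pivot_table is not easy here ;)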