Pandas/Python: how can I switch the index and columns of a DataFrame while preserving the df structure?
I have a pandas DataFrame that looks like this:
       X1    X1    X1    X2    X2    X2
ABC  12.4  34.3  25.4  29.3  53.2  38.9
DEF  22.3  28.6  32.8  24.6  29.4  25.3
The column on the left is the index, and the values along the top are the column labels. I am trying to swap the column names and the index so that it looks like this:
     ABC   ABC   ABC   DEF   DEF   DEF
X1  12.4  34.3  25.4  22.3  28.6  32.8
X2  29.3  53.2  38.9  24.6  29.4  25.3
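If I add a numbered index, I can switch the axes using stack and unstack, but the replicates are then listed vertically rather than horizontally. I can't work out how to keep the individual replicates side by side, which is necessary for what I am trying to do with the table. The replicates need to stay separate; I don't want a mean/sum/etc.
Any help or suggestions would be much appreciated.
Thanks
Edit:
This code produces a DataFrame that is structurally similar to my actual data, but with fewer columns: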
import pandas as pd
names = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8"]
df = pd.DataFrame([(7.345,"NaN","NaN",239.947,295.893,349.834),(13.872,"NaN","NaN",20.485,14.852,29.598),(764.298,"NaN","NaN",492.854,432.943,539.950),(0.00385,"NaN","NaN",0.184,0.384,0.285),(285.836,"NaN","NaN",495.284,395.486,368.952),(7.385,"NaN","NaN",5.293,4.295,4.692),(21.693,"NaN","NaN",25.843,15.843,15.386),(8.583,"NaN","NaN",4.397,6.295,6.39)], index=names, columns=["S1", "S1", "S1", "482.1", "482.1", "482.1"])
Given this DataFrame:
           S1  S1  S1    482.1    482.1    482.1
G1    7.34500 NaN NaN  239.947  295.893  349.834
G2   13.87200 NaN NaN   20.485   14.852   29.598
G3  764.29800 NaN NaN  492.854  432.943  539.950
G4    0.00385 NaN NaN    0.184    0.384    0.285
G5  285.83600 NaN NaN  495.284  395.486  368.952
G6    7.38500 NaN NaN    5.293    4.295    4.692
G7   21.69300 NaN NaN   25.843   15.843   15.386
G8    8.58300 NaN NaN    4.397    6.295    6.390
running:
df2 = df.copy()
m = dict(zip(df2.index.unique(), df2.columns.unique()))
df2.index = df2.index.map(m.get)
df2.columns = df2.columns.map({v : k for k, v in m.items()}.get)
gives:
              G1  G1  G1       G2       G2       G2
S1       7.34500 NaN NaN  239.947  295.893  349.834
482.1   13.87200 NaN NaN   20.485   14.852   29.598
NaN    764.29800 NaN NaN  492.854  432.943  539.950
NaN      0.00385 NaN NaN    0.184    0.384    0.285
NaN    285.83600 NaN NaN  495.284  395.486  368.952
NaN      7.38500 NaN NaN    5.293    4.295    4.692
NaN     21.69300 NaN NaN   25.843   15.843   15.386
NaN      8.58300 NaN NaN    4.397    6.295    6.390
The column and index labels have moved, but the data associated with them has not, and several columns are missing. Running:
df2 = df.copy()
m = dict(zip(df2.index.unique(), df2.columns.unique()))
df2 = df2.rename(index=m, columns={v : k for k, v in m.items()})
gives:
              G1  G1  G1       G2       G2       G2
S1       7.34500 NaN NaN  239.947  295.893  349.834
482.1   13.87200 NaN NaN   20.485   14.852   29.598
G3     764.29800 NaN NaN  492.854  432.943  539.950
G4       0.00385 NaN NaN    0.184    0.384    0.285
G5     285.83600 NaN NaN  495.284  395.486  368.952
G6       7.38500 NaN NaN    5.293    4.295    4.692
G7      21.69300 NaN NaN   25.843   15.843   15.386
G8       8.58300 NaN NaN    4.397    6.295    6.390
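This is also wrong, for similar reasons.
What if there are only two rows, but the columns are X1 X1 X1 X2 X2 X3? Personally, I think both representations will cause a lot of problems. Duplicate column/index labels are generally not a good idea.
I realize that if I were going to manipulate the data further in Python, I would need a different solution. However, the output will be used in a GUI program that requires this format.
Can you add the expected output for this data? It seems I badly misunderstood the intent of your question (my apologies).
Nice, you got it @Cᴏʟᴅsᴘᴇᴇᴅ. This is not a common problem... because duplicate column names cause a lot of problems.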
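A small sketch of why both relabelling attempts in the question leave the data where it was, using the two-row example from the top of the post. It assumes only pandas; the names `relabelled`, `truncated`, `same_shape` and `same_values` are illustrative and do not appear in the original post.

```python
import pandas as pd

# Two-row example from the top of the question
df = pd.DataFrame(
    [[12.4, 34.3, 25.4, 29.3, 53.2, 38.9],
     [22.3, 28.6, 32.8, 24.6, 29.4, 25.3]],
    index=["ABC", "DEF"],
    columns=["X1", "X1", "X1", "X2", "X2", "X2"],
)

# Pair each unique row label with a unique column label
m = dict(zip(df.index.unique(), df.columns.unique()))

# zip() stops at the shorter input, so with 8 row labels and 2 column
# labels (as in the G1..G8 example) only 2 pairs would be created
truncated = dict(zip(["G1", "G2", "G3"], ["S1", "482.1"]))

# rename() only swaps the labels; it never moves a value
relabelled = df.rename(index=m, columns={v: k for k, v in m.items()})

same_shape = relabelled.shape == df.shape
same_values = bool((relabelled.to_numpy() == df.to_numpy()).all())
print(relabelled)
print(same_shape, same_values)
```

Here the labels end up where the desired output has them, but row X1 still holds the original ABC row (12.4 34.3 25.4 29.3 53.2 38.9) rather than the X1 replicates of both ABC and DEF, so a reshape is needed rather than a rename.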
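# Transpose, gather each label's replicates into lists, then spread them back out side by side.
New_df = (df.T.groupby(level=0).agg(lambda x: x.values.tolist())
            .stack().apply(pd.Series).unstack().sort_index(level=1, axis=1))
# Drop the replicate-number level so only the original row labels remain as columns.
New_df.columns = New_df.columns.droplevel(level=0)
New_df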
Out[229]:
     ABC   ABC   ABC   DEF   DEF   DEF
X1  12.4  34.3  25.4  22.3  28.6  32.8
X2  29.3  53.2  38.9  24.6  29.4  25.3
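For comparison, here is an alternative sketch that is not the answer's method. It numbers the replicates within each column label using `cumcount`, makes the columns a MultiIndex, transposes, and then unstacks the replicate level so the replicates stay side by side. The variable names (`labels`, `rep`, `wide`, `out`) and the level names "sample" and "rep" are made up for illustration.

```python
import pandas as pd

# Two-row example from the question
df = pd.DataFrame(
    [[12.4, 34.3, 25.4, 29.3, 53.2, 38.9],
     [22.3, 28.6, 32.8, 24.6, 29.4, 25.3]],
    index=["ABC", "DEF"],
    columns=["X1", "X1", "X1", "X2", "X2", "X2"],
)

# Number the replicates within each column label: 0, 1, 2, 0, 1, 2
labels = pd.Series(df.columns)
rep = labels.groupby(labels).cumcount()

# Make the columns unique by pairing each label with its replicate number
wide = df.copy()
wide.columns = pd.MultiIndex.from_arrays([labels, rep], names=["sample", "rep"])

# Transpose so (sample, rep) is the row index, then move "rep" back to the
# columns; each original row label now spans one column per replicate
out = wide.T.unstack("rep")

# Keep only the original row labels as (duplicated) column names
out.columns = out.columns.get_level_values(0)
out.index.name = None
print(out)
```

Note that `unstack` returns the row labels in sorted order, so with labels such as S1 and 482.1 the rows may come out in a different order than they appeared in the original columns. If the labels have unequal numbers of replicates, the missing positions are filled with NaN.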