How do I aggregate data in a pivot table in Python?

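I am working on a spatial alignment task, exploring how different scoring/rescoring functions affect alignment quality (measured by RMSD). I have long-format data in which I ran every score/rescore combination for several systems, with three repeats each.

Here is some sample test data: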

   identifier score rescore  rmsd  repeat
0        1abc   plp     asp   1.2       1
1        1abc   plp     asp   1.3       2
2        1abc   plp     asp   1.5       3
3        1abc   plp     plp   3.2       1
4        1abc   plp     plp   3.3       2
5        1abc   plp     plp   3.5       3
6        1abc   asp     asp   5.2       1
7        1abc   asp     asp   5.3       2
8        1abc   asp     asp   5.5       3
9        1abc   asp     plp   1.2       1
10       1abc   asp     plp   1.3       2
11       1abc   asp     plp   1.5       3
12       2def   plp     asp   1.0       1
13       2def   plp     asp   1.1       2
14       2def   plp     asp   1.2       3
15       2def   plp     plp   3.0       1
16       2def   plp     plp   3.1       2
17       2def   plp     plp   3.2       3
18       2def   asp     asp   5.0       1
19       2def   asp     asp   5.1       2
20       2def   asp     asp   5.2       3
21       2def   asp     plp   1.0       1
22       2def   asp     plp   1.3       2
23       2def   asp     plp   1.7       3
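
For anyone who wants to reproduce this, the sample table can be rebuilt as a DataFrame. This is only a sketch: the name df matches the snippets in this thread, while runs and rows are helper names introduced here for illustration.

```python
import pandas as pd

# (identifier, score, rescore) -> rmsd for repeats 1, 2, 3, copied from the table above.
runs = {
    ("1abc", "plp", "asp"): [1.2, 1.3, 1.5],
    ("1abc", "plp", "plp"): [3.2, 3.3, 3.5],
    ("1abc", "asp", "asp"): [5.2, 5.3, 5.5],
    ("1abc", "asp", "plp"): [1.2, 1.3, 1.5],
    ("2def", "plp", "asp"): [1.0, 1.1, 1.2],
    ("2def", "plp", "plp"): [3.0, 3.1, 3.2],
    ("2def", "asp", "asp"): [5.0, 5.1, 5.2],
    ("2def", "asp", "plp"): [1.0, 1.3, 1.7],
}

# Expand each run into one row per repeat (repeat numbers start at 1).
rows = [
    (identifier, score, rescore, value, repeat)
    for (identifier, score, rescore), values in runs.items()
    for repeat, value in enumerate(values, start=1)
]

df = pd.DataFrame(rows, columns=["identifier", "score", "rescore", "rmsd", "repeat"])
```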

For this particular task, RMSD …

You can .melt() the pivot table and pivot it again:

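import pandas as pd

# df is the long-format DataFrame shown above.
# Number of distinct systems, used to turn a count into a percentage.
systems = len(set(df.identifier))

# First pivot: percentage of systems with rmsd <= 1.5 per (score, rescore, repeat).
# Melting and pivoting again then takes the mean and std across repeats.
pd.pivot_table(df, 
               index='score', 
               columns=['rescore', 'repeat'], 
               values='rmsd', 
               aggfunc=lambda x:((x <= 1.5).sum()/systems)*100
).melt(ignore_index=False)\
    .reset_index()\
    .pivot_table(index='score',
                 columns='rescore', 
                 values='value', 
                 aggfunc=['mean', 'std'])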
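
The chain above aggregates in two stages: the lambda turns each (score, rescore, repeat) cell into the percentage of systems with rmsd <= 1.5, and the second pivot summarises that percentage across repeats. As a rough illustration, the same two stages can be written with plain groupby calls. This is a sketch rather than the method from the answer; the names success and summary, the helper column ok, and the reduced frame (only the asp/plp rows) are assumptions made for this example.

```python
import pandas as pd

# Reduced frame with the same columns as the sample data (score=asp, rescore=plp only).
df = pd.DataFrame({
    "identifier": ["1abc"] * 3 + ["2def"] * 3,
    "score": "asp",
    "rescore": "plp",
    "rmsd": [1.2, 1.3, 1.5, 1.0, 1.3, 1.7],
    "repeat": [1, 2, 3] * 2,
})

systems = df["identifier"].nunique()

# Stage 1: per (score, rescore, repeat), percentage of systems with rmsd <= 1.5.
success = (
    df.assign(ok=df["rmsd"] <= 1.5)
      .groupby(["score", "rescore", "repeat"])["ok"]
      .sum()
      .div(systems)
      .mul(100)
)

# Stage 2: mean and standard deviation of that percentage across repeats.
summary = success.groupby(level=["score", "rescore"]).agg(["mean", "std"])
```

On these rows the per-repeat percentages are 100, 100 and 50, so the mean is about 83.3 and the sample standard deviation about 28.9.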

Alternatively, you can move repeat into the index argument of pd.pivot_table() and use the .groupby() method:

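# Number of distinct systems, used to turn a count into a percentage.
systems = len(set(df.identifier))

# Pivot with repeat in the index: one row per (score, repeat), one column per rescore.
# Grouping by score then takes the mean and std across repeats; swaplevel and
# sort_index put the statistic name on the outer column level.
pd.pivot_table(df, 
               index=['score', 'repeat'], 
               columns='rescore', 
               values='rmsd', 
               aggfunc=lambda x:((x <= 1.5).sum()/systems)*100
).reset_index()\
    .groupby('score')[df['rescore'].unique()].agg(['mean', 'std'])\
    .swaplevel(0,1,1)\
    .sort_index(axis=1)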
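
As a quick sanity check, both approaches can be run on the sample data and compared. The sketch below assumes the sample rows from the question; runs, pct_success, via_melt, via_groupby and same are names made up for this example, and the rescore columns are listed explicitly instead of using df['rescore'].unique().

```python
import numpy as np
import pandas as pd

# Sample data from the question: (identifier, score, rescore) -> rmsd for repeats 1..3.
runs = {
    ("1abc", "plp", "asp"): [1.2, 1.3, 1.5],
    ("1abc", "plp", "plp"): [3.2, 3.3, 3.5],
    ("1abc", "asp", "asp"): [5.2, 5.3, 5.5],
    ("1abc", "asp", "plp"): [1.2, 1.3, 1.5],
    ("2def", "plp", "asp"): [1.0, 1.1, 1.2],
    ("2def", "plp", "plp"): [3.0, 3.1, 3.2],
    ("2def", "asp", "asp"): [5.0, 5.1, 5.2],
    ("2def", "asp", "plp"): [1.0, 1.3, 1.7],
}
df = pd.DataFrame(
    [(i, s, r, v, rep)
     for (i, s, r), vals in runs.items()
     for rep, v in enumerate(vals, start=1)],
    columns=["identifier", "score", "rescore", "rmsd", "repeat"],
)
systems = df["identifier"].nunique()


def pct_success(x):
    # Percentage of systems with rmsd <= 1.5 within one cell.
    return (x <= 1.5).sum() / systems * 100


# Approach 1: pivot, melt, pivot again.
via_melt = (
    pd.pivot_table(df, index="score", columns=["rescore", "repeat"],
                   values="rmsd", aggfunc=pct_success)
    .melt(ignore_index=False)
    .reset_index()
    .pivot_table(index="score", columns="rescore", values="value",
                 aggfunc=["mean", "std"])
)

# Approach 2: pivot with repeat in the index, then group by score.
via_groupby = (
    pd.pivot_table(df, index=["score", "repeat"], columns="rescore",
                   values="rmsd", aggfunc=pct_success)
    .reset_index()
    .groupby("score")[["asp", "plp"]]
    .agg(["mean", "std"])
    .swaplevel(0, 1, axis=1)
    .sort_index(axis=1)
)

# True when both tables hold the same numbers in the same layout.
same = np.allclose(via_melt.to_numpy(), via_groupby.to_numpy())
```

Both results are indexed by score, with (statistic, rescore) column pairs, so either one can be used depending on which chain reads more naturally.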
