Python 2.7 (Python 2): DataFrame with multi-level columns

python-2.7 pandas dataframe

I want to add up the values of DataFrames that share the same format. For example:

>>> my_dataframe1

         class1 score
subject  1    2    3
student
0        1    2    5
1        2    3    9
2        8    7    2
3        3    4    7
4        6    7    7

>>> my_dataframe2

         class2 score
subject  1    2    3
student
0        4    2    2
1        4    4    14
2        8    7    7
3        1    2    NaN
4        NaN  2    3

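As you can see, both DataFrames have multi-level columns: the top level is the "class score" label and the sub-level is "subject". What I want is the summed DataFrame, which would look like this: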

            score
subject  1    2    3
student
0        5    4    7
1        6    7    23
2        16   14   9
3        4    6    7
4        6    9    10

In fact, I can get the result with:

for i in my_dataframe1['class1 score'].index:
    my_dataframe1['class1 score'].loc[i,:] = my_dataframe1['class1 score'].loc[i,:].add(my_dataframe2['class2 score'].loc[i,:], fill_value = 0)

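However, as the dimensions grow, building the result this way takes a long time, and I really don't think it is a good way to solve the problem.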

IIUC:

df_out = my_dataframe1['class1 score'].add(my_dataframe2['class2 score'], fill_value=0).add_prefix('scores_')

df_out.columns = df_out.columns.str.split('_',expand=True)

df_out

Output:

        scores          
             1   2     3
student                 
0          5.0   4   7.0
1          6.0   7  23.0
2         16.0  14   9.0
3          4.0   6   7.0
4          6.0   9  10.0
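
The same sum can also be sketched without the prefix-and-split round trip. The block below is a minimal, self-contained sketch: class1 and class2 stand in for my_dataframe1['class1 score'] and my_dataframe2['class2 score'], and the top-level label 'score' and all variable names are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd

subjects = pd.Index([1, 2, 3], name='subject')
students = pd.RangeIndex(5, name='student')

# Stand-ins for my_dataframe1['class1 score'] and my_dataframe2['class2 score'].
class1 = pd.DataFrame(
    [[1, 2, 5], [2, 3, 9], [8, 7, 2], [3, 4, 7], [6, 7, 7]],
    index=students, columns=subjects)
class2 = pd.DataFrame(
    [[4, 2, 2], [4, 4, 14], [8, 7, 7], [1, 2, np.nan], [np.nan, 2, 3]],
    index=students, columns=subjects)

# Label-aligned addition; a NaN on one side only is treated as 0.
total = class1.add(class2, fill_value=0)

# Put the summed columns back under a single top-level label
# (the dict key becomes the outer column level).
summed = pd.concat({'score': total}, axis=1)
print(summed)
```

Passing a dict to pd.concat with axis=1 keeps the subject labels as integers, whereas the string split above turns them into strings.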

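If you add the .values of the second DataFrame (its underlying NumPy array), pandas ignores the index and column labels and adds by position: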

# you don't need `astype(int)`.
my_dataframe1.add(my_dataframe2.values, fill_value=0).astype(int)

        class1 score        
subject            1   2   3
student                     
0                  5   4   7
1                  6   7  23
2                 16  14   9
3                  4   6   7
4                  6   9  10

Setup

import numpy as np
import pandas as pd

my_dataframe1 = pd.DataFrame([
    [1, 2, 5],
    [2, 3, 9],
    [8, 7, 2],
    [3, 4, 7],
    [6, 7, 7]
], pd.RangeIndex(5, name='student'), pd.MultiIndex.from_product([['class1 score'], [1, 2, 3]], names=[None, 'subject']))

my_dataframe2 = pd.DataFrame([
    [4, 2, 2],
    [4, 4, 14],
    [8, 7, 7],
    [1, 2, np.nan],
    [np.nan, 2, 3]
], pd.RangeIndex(5, name='student'), pd.MultiIndex.from_product([['class2 score'], [1, 2, 3]], names=[None, 'subject']))
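
To see why .values matters here, it may help to compare label-aligned and positional addition side by side. This is a small sketch on a reduced two-row version of the frames above; the names a, b, by_label and by_position are illustrative assumptions.

```python
import numpy as np
import pandas as pd

cols_a = pd.MultiIndex.from_product([['class1 score'], [1, 2, 3]], names=[None, 'subject'])
cols_b = pd.MultiIndex.from_product([['class2 score'], [1, 2, 3]], names=[None, 'subject'])
a = pd.DataFrame([[1, 2, 5], [3, 4, 7]], columns=cols_a)
b = pd.DataFrame([[4, 2, 2], [1, 2, np.nan]], columns=cols_b)

# Label-aligned: the top-level labels differ, so no column matches and
# the result is the union of all six columns rather than a sum.
by_label = a.add(b, fill_value=0)

# Positional: a plain NumPy array carries no labels, so it is added
# element by element and the result keeps a's labels.
by_position = a.add(b.values, fill_value=0)

print(by_label.shape)
print(by_position)
```

Positional addition only makes sense when both frames have the same shape and their rows and columns are in the same order.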

My approach would be to keep the data in a single DataFrame. You can concatenate the two you already have:

big_df = pd.concat([my_dataframe1, my_dataframe2], axis=1)

Then sum over the combined DataFrame, specifying the level:

big_df.sum(axis=1, level='subject')
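
Note that the level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in 2.0, so the call above only works on older versions. A rough equivalent that should work on both old and new versions groups the transposed frame by the 'subject' level. This is a sketch rather than the answerer's code; the make_frame helper is a hypothetical convenience for rebuilding the question's data.

```python
import numpy as np
import pandas as pd

students = pd.RangeIndex(5, name='student')

def make_frame(rows, top_label):
    # Hypothetical helper: builds a frame shaped like the question's data.
    cols = pd.MultiIndex.from_product([[top_label], [1, 2, 3]], names=[None, 'subject'])
    return pd.DataFrame(rows, index=students, columns=cols)

my_dataframe1 = make_frame(
    [[1, 2, 5], [2, 3, 9], [8, 7, 2], [3, 4, 7], [6, 7, 7]], 'class1 score')
my_dataframe2 = make_frame(
    [[4, 2, 2], [4, 4, 14], [8, 7, 7], [1, 2, np.nan], [np.nan, 2, 3]], 'class2 score')

big_df = pd.concat([my_dataframe1, my_dataframe2], axis=1)

# Transpose so the column levels become row levels, group by 'subject',
# sum (NaN values are skipped), then transpose back.
summed = big_df.T.groupby(level='subject').sum().T
print(summed)
```

The result has a single column level (subject) and float values, because the NaN entries force a float dtype.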