Python pandas: concatenating two MultiIndex DataFrames


I have a DataFrame that looks like this:

df.head()
                Student Name            Q1  Q2  Q3
Month   Roll No             
2016-08-01  0   Save Mithil Vinay       0.0 0.0 0.0
            1   Abraham Ancy Chandy     6.0 5.0 5.0
            2   Barabde Pranjal Sanjiv  7.0 5.0 5.0
            3   Bari Siddhesh Kishor    8.0 5.0 3.0
            4   Barretto Cleon Domnic   1.0 5.0 4.0

Now I want to build a hierarchical column index, so I use the following approach:

big_df = pd.concat([df['Student Name'], df[['Q1', 'Q2', 'Q3']]], axis=1, keys=['Name', 'IS'])

and I get the following:

>>> big_df
                Name                    IS
                Student Name            Q1  Q2  Q3
Month   Roll No             
2016-08-01  0   Save Mithil Vinay       0.0 0.0 0.0
            1   Abraham Ancy Chandy     6.0 5.0 5.0
            2   Barabde Pranjal Sanjiv  7.0 5.0 5.0
            3   Bari Siddhesh Kishor    8.0 5.0 3.0
            4   Barretto Cleon Domnic   1.0 5.0 4.0

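For the second iteration, I only want to concatenate the Q1, Q2, Q3 values from the new DataFrame onto big_df (the previously concatenated DataFrame). The second iteration's DataFrame looks like this: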

df.head()
                Student Name            Q1  Q2  Q3
Month   Roll No             
2016-08-01  0   Save Mithil Vinay       0.0 0.0 0.0
            1   Abraham Ancy Chandy     8.0 5.0 5.0
            2   Barabde Pranjal Sanjiv  7.0 5.0 4.0
            3   Bari Siddhesh Kishor    8.0 4.0 3.0
            4   Barretto Cleon Domnic   2.0 3.0 4.0

I want big_df to end up looking like this:

>>> big_df
                Name                    IS          CC
                Student Name            Q1  Q2  Q3  Q1  Q2  Q3
Month   Roll No                             
2016-08-01  0   Save Mithil Vinay       0.0 0.0 0.0 0.0 0.0 0.0
            1   Abraham Ancy Chandy     6.0 5.0 5.0 8.0 5.0 5.0
            2   Barabde Pranjal Sanjiv  7.0 5.0 5.0 7.0 5.0 4.0
            3   Bari Siddhesh Kishor    8.0 5.0 3.0 8.0 4.0 3.0
            4   Barretto Cleon Domnic   1.0 5.0 4.0 2.0 3.0 4.0

I tried both of the following, but each raised an error:

big_df.concat([df[['Q1', 'Q2', 'Q3']]], axis=1, keys=['CC'])

pd.concat([big_df, df[['Q1', 'Q2', 'Q3']]], axis=1, keys=['Name', 'CC'])

Where am I going wrong? Please help. I am new to pandas.

Drop the topmost column level of big_df:

big_df.columns = big_df.columns.droplevel(level=0)

Then concatenate, passing three separate frames as input so that their number matches the number of keys:

Q_cols = ['Q1', 'Q2', 'Q3']
key_names = ['Name', 'IS', 'CC']
pd.concat([big_df[['Student Name']], big_df[Q_cols], df[Q_cols]], axis=1, keys=key_names)
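
To make the two steps above easy to try, here is a minimal self-contained sketch. The frame names df_is, df_cc and flat are hypothetical, only three rows of the question's data are used, and Month is kept as a plain string for brevity. A one-column DataFrame is used for 'Student Name' when building big_df so that the column labels are unambiguous.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("2016-08-01", 0), ("2016-08-01", 1), ("2016-08-01", 2)],
    names=["Month", "Roll No"],
)
names = ["Save Mithil Vinay", "Abraham Ancy Chandy", "Barabde Pranjal Sanjiv"]

# Hypothetical first-iteration ('IS') and second-iteration ('CC') frames.
df_is = pd.DataFrame(
    {"Student Name": names, "Q1": [0.0, 6.0, 7.0], "Q2": [0.0, 5.0, 5.0], "Q3": [0.0, 5.0, 5.0]},
    index=idx,
)
df_cc = pd.DataFrame(
    {"Student Name": names, "Q1": [0.0, 8.0, 7.0], "Q2": [0.0, 5.0, 5.0], "Q3": [0.0, 5.0, 4.0]},
    index=idx,
)

Q_cols = ["Q1", "Q2", "Q3"]

# big_df after the first iteration: top-level keys 'Name' and 'IS'.
big_df = pd.concat([df_is[["Student Name"]], df_is[Q_cols]], axis=1, keys=["Name", "IS"])

# Step 1: drop the top level so the columns are flat again.
flat = big_df.copy()
flat.columns = flat.columns.droplevel(level=0)

# Step 2: concatenate three frames, one per key.
result = pd.concat(
    [flat[["Student Name"]], flat[Q_cols], df_cc[Q_cols]],
    axis=1,
    keys=["Name", "IS", "CC"],
)
print(result)
```

Note that this rebuilds every top-level key on each pass, so with more than two iterations the list of frames and keys has to grow accordingly.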

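First, you would be better off setting the index to ['Month', 'Roll No', 'Student Name']. This greatly simplifies the concat syntax and also ensures that rows are matched on the student's name.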

df.set_index('Student Name', append=True, inplace=True)
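
As a small illustration of what append=True does, the sketch below uses a hypothetical two-row frame and the non-inplace form of set_index, which returns a new frame instead of modifying df.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Student Name": ["Save Mithil Vinay", "Abraham Ancy Chandy"],
        "Q1": [0.0, 6.0],
        "Q2": [0.0, 5.0],
        "Q3": [0.0, 5.0],
    },
    index=pd.MultiIndex.from_tuples(
        [("2016-08-01", 0), ("2016-08-01", 1)], names=["Month", "Roll No"]
    ),
)

# append=True keeps the existing index levels and adds 'Student Name'
# as a third level; without it the existing index would be replaced.
df = df.set_index("Student Name", append=True)

print(df.index.names)    # index now has three levels
print(list(df.columns))  # only the score columns remain
```

With the name moved into the index, only Q1, Q2 and Q3 remain as columns, so no column selection is needed before concatenating.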

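Second, I suggest a different approach: during your iteration, store each df (holding the Q1/Q2/Q3 values) together with the name of its top column level (e.g. 'IS', 'CC'). A dict is ideal for this, and pandas does accept a dict as input to pd.concat: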

# Create a dictionary holding the first df from your question
df_dict = {'IS': df}

# Inside your loop, store each new df under its top-level column name, e.g.:
df_dict['CC'] = df

After the loop, this is what df_dict contains:

In [10]: df_dict

Out[10]:
{'CC':                                             Q1   Q2   Q3
 Month      Roll No Student Name                         
 2016-08-01 0       Save Mithil Vinay       0.0  0.0  0.0
            1       Abraham Ancy Chandy     6.0  5.0  5.0
            2       Barabde Pranjal Sanjiv  7.0  5.0  5.0
            3       Bari Siddhesh Kisho     8.0  5.0  3.0
            4       Barretto Cleon Domnic   1.0  5.0  4.0,
 'IS':                                             Q1   Q2   Q3
 Month      Roll No Student Name                         
 2016-08-01 0       Save Mithil Vinay       0.0  0.0  0.0
            1       Abraham Ancy Chandy     8.0  5.0  5.0
            2       Barabde Pranjal Sanjiv  7.0  5.0  4.0
            3       Bari Siddhesh Kisho     8.0  4.0  3.0
            4       Barretto Cleon Domnic   2.0  3.0  4.0}

So if you now concat, pandas does the work for you automatically:

In [11]: big_df = pd.concat(df_dict, axis=1)
         big_df

Out[11]:
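
A minimal self-contained sketch of the dict approach, assuming two hypothetical frames df_is and df_cc (three rows of the question's data, Month kept as a string) that are already indexed by Month, Roll No and Student Name:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [
        ("2016-08-01", 0, "Save Mithil Vinay"),
        ("2016-08-01", 1, "Abraham Ancy Chandy"),
        ("2016-08-01", 2, "Barabde Pranjal Sanjiv"),
    ],
    names=["Month", "Roll No", "Student Name"],
)

# Hypothetical per-iteration frames holding only the score columns.
df_is = pd.DataFrame(
    {"Q1": [0.0, 6.0, 7.0], "Q2": [0.0, 5.0, 5.0], "Q3": [0.0, 5.0, 5.0]}, index=idx
)
df_cc = pd.DataFrame(
    {"Q1": [0.0, 8.0, 7.0], "Q2": [0.0, 5.0, 5.0], "Q3": [0.0, 5.0, 4.0]}, index=idx
)

# Collect each frame under the name of its top column level.
df_dict = {"IS": df_is}
df_dict["CC"] = df_cc

# The dict keys become the top level of the column MultiIndex.
big_df = pd.concat(df_dict, axis=1)
print(big_df)
```

The order of the top-level keys in the result follows the dict; older pandas versions may sort the keys instead, so select by name (for example big_df['CC']) rather than by position.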

If you really want to do it iteratively, prepend the new top level ('CC') to the new df's columns before concatenating it with big_df:

# Give the new df's columns a top level named 'CC'
df.columns = pd.MultiIndex.from_tuples([('CC', x) for x in df.columns])

# Then you can concat; this gives the same result as shown above.
pd.concat([big_df, df], axis=1)
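
The iterative variant can be sketched end to end as follows. The helper add_top_level and the frames df_is and df_cc are hypothetical names introduced for illustration, and only two rows of the question's data are used.

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("2016-08-01", 0, "Save Mithil Vinay"), ("2016-08-01", 1, "Abraham Ancy Chandy")],
    names=["Month", "Roll No", "Student Name"],
)
df_is = pd.DataFrame({"Q1": [0.0, 6.0], "Q2": [0.0, 5.0], "Q3": [0.0, 5.0]}, index=idx)
df_cc = pd.DataFrame({"Q1": [0.0, 8.0], "Q2": [0.0, 5.0], "Q3": [0.0, 5.0]}, index=idx)


def add_top_level(frame, key):
    """Return a copy of `frame` whose columns gain `key` as a new top level."""
    out = frame.copy()
    out.columns = pd.MultiIndex.from_tuples([(key, col) for col in frame.columns])
    return out


# Start with the first frame, then append one labelled frame per iteration.
big_df = add_top_level(df_is, "IS")
for key, new_df in [("CC", df_cc)]:
    big_df = pd.concat([big_df, add_top_level(new_df, key)], axis=1)

print(big_df)
```

Each pd.concat call in the loop copies the accumulated data, so for many iterations collecting the frames in a dict and concatenating once is usually the cheaper choice.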

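It would be great if, when posting a question, you formatted the data so that it can simply be copied and loaded with pd.read_clipboard(). You should test that it works; doing so will also show which arguments to read_clipboard(), or which few lines of post-processing, are needed to reproduce the DataFrame exactly. That makes it much easier for anyone to help.

@JulienMarrec Sorry about that... I will do better next time. Thanks for your support, much appreciated.

I needed to drop level 0. That is what was causing the problem.

Thanks for your help, but Student Name is getting concatenated as well. How do I remove Student Name?

I think you missed the part where I said you should set the index to ['Month', 'Roll No', 'Student Name']. In your case, you need to run df.set_index('Student Name', append=True, inplace=True).

Thanks... Sorry for not posting the question properly.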