Python: pivoting multi-index data

python pandas

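I have a dataframe with MultiIndex (hierarchical) columns. The top level holds Center_Details plus one entry per quarter (such as 2017-18:Q1 and 2017-18:Q2); the second level holds State, District and Center under Center_Details, and Offices, Deposit and Credit under each quarter.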

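I want the different quarters as rows rather than as hierarchical columns, i.e. long format instead of wide format. Something like this (the output need not be a MultiIndex):

How can I do this in pandas?

Edit:

Sample input file, as requested:

Sample data (pandas):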
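The sample data itself did not survive in this copy of the question. As a minimal sketch, a frame of the same shape can be rebuilt from the values visible in the answer outputs further down. The column layout and the name df are assumptions inferred from the answers' code, not the asker's original file.

```python
import pandas as pd

# Assumed reconstruction: the four centers that appear in the answer outputs
centers = [
    ("JAMMU & KASHMIR", "KUPWARA", "Drug Mulla (CT)"),
    ("JAMMU & KASHMIR", "LEH LADAKH", "Chuglamsar (CT)"),
    ("PUNJAB", "GURDASPUR", "TIBRI"),
    ("PUNJAB", "PATHANKOT", "Mamun (CT)"),
]

# Two-level columns: (Center_Details, field) and (quarter, item)
columns = pd.MultiIndex.from_tuples(
    [("Center_Details", field) for field in ("State", "District", "Center")]
    + [(quarter, item)
       for quarter in ("2017-18:Q1", "2017-18:Q2")
       for item in ("Offices", "Deposit", "Credit")]
)

# Values as shown in the outputs: Q1 -> 4/500/600, Q2 -> 3/500/600
rows = [list(center) + [4, 500, 600, 3, 500, 600] for center in centers]
df = pd.DataFrame(rows, columns=columns)
print(df)
```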

One approach is to flatten the MultiIndex columns, then use melt and pivot_table, as follows:

# Flatten the MultiIndex columns
df.columns = [' '.join(col).strip() for col in df.columns.values]

# Save some typing
idx = ['Center_Details State', 'Center_Details District', 'Center_Details Center']

# Create a long dataframe
long = pd.melt(df, id_vars = idx)

# Split the "variable" column at the space created when flattening the MultiIndex
long['QTR'], long['item'] = zip(*long['variable'].map(lambda x: x.split(' ')))

# Reshape to wide format, keeping "QTR" as a column
out = pd.pivot_table(long, index = idx + ["QTR"], columns = 'item', 
                     values = 'value', aggfunc = 'first').reset_index()
print(out)
item Center_Details State Center_Details District Center_Details Center  \
0         JAMMU & KASHMIR                 KUPWARA       Drug Mulla (CT)   
1         JAMMU & KASHMIR                 KUPWARA       Drug Mulla (CT)   
2         JAMMU & KASHMIR              LEH LADAKH       Chuglamsar (CT)   
3         JAMMU & KASHMIR              LEH LADAKH       Chuglamsar (CT)   
4                  PUNJAB               GURDASPUR                 TIBRI   
5                  PUNJAB               GURDASPUR                 TIBRI   
6                  PUNJAB               PATHANKOT            Mamun (CT)   
7                  PUNJAB               PATHANKOT            Mamun (CT)   

item         QTR Credit Deposit Offices  
0     2017-18:Q1    600     500       4  
1     2017-18:Q2    600     500       3  
2     2017-18:Q1    600     500       4  
3     2017-18:Q2    600     500       3  
4     2017-18:Q1    600     500       4  
5     2017-18:Q2    600     500       3  
6     2017-18:Q1    600     500       4  
7     2017-18:Q2    600     500       3
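
For experimenting with this approach, here is a self-contained sketch on a hypothetical two-center frame. The cut-down data and the use of str.split are assumptions, not part of the answer. It splits the flattened name on the first space only, which should keep working if an item name itself contains a space.

```python
import pandas as pd

# Hypothetical cut-down input with the same two-level column layout
columns = pd.MultiIndex.from_tuples([
    ("Center_Details", "State"), ("Center_Details", "Center"),
    ("2017-18:Q1", "Offices"), ("2017-18:Q1", "Credit"),
    ("2017-18:Q2", "Offices"), ("2017-18:Q2", "Credit"),
])
df = pd.DataFrame(
    [["PUNJAB", "TIBRI", 4, 600, 3, 600],
     ["PUNJAB", "Mamun (CT)", 4, 600, 3, 600]],
    columns=columns,
)

# Flatten the two column levels into "top second" strings
df.columns = [" ".join(col).strip() for col in df.columns.values]
idx = ["Center_Details State", "Center_Details Center"]

# Wide -> long: one row per (center, flattened column name)
long = pd.melt(df, id_vars=idx)

# Split "2017-18:Q1 Offices" into quarter and item at the first space
parts = long["variable"].str.split(" ", n=1, expand=True)
long["QTR"] = parts[0]
long["item"] = parts[1]

# Long -> wide again, but only across items; QTR stays in the rows
out = long.pivot_table(index=idx + ["QTR"], columns="item",
                       values="value", aggfunc="first").reset_index()
print(out)
```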

Another option might be something like:

long = df.set_index(['Center_Details']).stack().T.unstack()
long = pd.concat([pd.DataFrame(long.reset_index()['Center_Details'].tolist()), 
                  long.reset_index()], axis=1)
long.columns = ['State', 'District', 'Center', 'Center_Details', 
                'Items', 'QTR', 'Value']
out = pd.pivot_table(long, index=['State', 'District', 'Center', 'QTR'], 
                     columns='Items', values='Value', 
                     aggfunc='first').reset_index()
print(out)
Items            State    District           Center         QTR Credit  \
0      JAMMU & KASHMIR     KUPWARA  Drug Mulla (CT)  2017-18:Q1    600   
1      JAMMU & KASHMIR     KUPWARA  Drug Mulla (CT)  2017-18:Q2    600   
2      JAMMU & KASHMIR  LEH LADAKH  Chuglamsar (CT)  2017-18:Q1    600   
3      JAMMU & KASHMIR  LEH LADAKH  Chuglamsar (CT)  2017-18:Q2    600   
4               PUNJAB   GURDASPUR            TIBRI  2017-18:Q1    600   
5               PUNJAB   GURDASPUR            TIBRI  2017-18:Q2    600   
6               PUNJAB   PATHANKOT       Mamun (CT)  2017-18:Q1    600   
7               PUNJAB   PATHANKOT       Mamun (CT)  2017-18:Q2    600   

Items Deposit Offices  
0         500       4  
1         500       3  
2         500       4  
3         500       3  
4         500       4  
5         500       3  
6         500       4  
7         500       3
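
As a side note that is not taken from any of the answers here, the same reshape can probably be written more directly by moving the Center_Details columns into the row index and stacking the quarter level. This is only a sketch on a hypothetical two-center frame; the names wide and out and the QTR label are assumptions.

```python
import pandas as pd

# Hypothetical cut-down input with the same two-level column layout
columns = pd.MultiIndex.from_tuples([
    ("Center_Details", "State"), ("Center_Details", "Center"),
    ("2017-18:Q1", "Offices"), ("2017-18:Q1", "Credit"),
    ("2017-18:Q2", "Offices"), ("2017-18:Q2", "Credit"),
])
df = pd.DataFrame(
    [["PUNJAB", "TIBRI", 4, 600, 3, 600],
     ["PUNJAB", "Mamun (CT)", 4, 600, 3, 600]],
    columns=columns,
)

# Full column tuples of everything under Center_Details
id_cols = df[["Center_Details"]].columns.tolist()

# Move the identifying columns into the row index and give them short names
wide = df.set_index(id_cols)
wide.index.names = [name for _, name in id_cols]

# Stack the top column level (the quarters) into the rows
out = wide.stack(level=0)
out.index.names = ["State", "Center", "QTR"]
out = out.reset_index()
print(out)
```

Depending on the pandas version, stack may emit a FutureWarning about its changing implementation; the result for this input should be the same either way.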

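A third option is to use wide_to_long, but wide_to_long expects the wide-format column names to start with the stub names. The approach is similar to the first one but involves fewer steps.

It looks like: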

# Flatten the column names, but reverse the order of the tuples
#   before flattening, and add a character to split on
df.columns = ['~'.join(col[::-1]).strip() for col in df.columns.values]

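# Reshape the data, Stata-style.
# In recent pandas versions the default suffix pattern (r'\d+') matches only
#   numeric suffixes, so a broader pattern is needed for labels like 2017-18:Q1
pd.wide_to_long(df, ['Offices', 'Deposit', 'Credit'], 
   i=['State~Center_Details', 'District~Center_Details', 'Center~Center_Details'],
   j='Quarter', sep='~', suffix=r'.+').reset_index()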

You will still need to clean up the Center_Details column names afterwards (for example, State~Center_Details).
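
A sketch of the whole wide_to_long route including that cleanup, on a hypothetical two-center frame. It assumes a pandas version where wide_to_long accepts a suffix argument; the pattern r'.+' and the split-based renaming are illustrative choices, not part of the answer.

```python
import pandas as pd

# Hypothetical cut-down input with the same two-level column layout
columns = pd.MultiIndex.from_tuples([
    ("Center_Details", "State"), ("Center_Details", "Center"),
    ("2017-18:Q1", "Offices"), ("2017-18:Q1", "Credit"),
    ("2017-18:Q2", "Offices"), ("2017-18:Q2", "Credit"),
])
df = pd.DataFrame(
    [["PUNJAB", "TIBRI", 4, 600, 3, 600],
     ["PUNJAB", "Mamun (CT)", 4, 600, 3, 600]],
    columns=columns,
)

# Reverse each column tuple so the stub (Offices, Credit) comes first
df.columns = ["~".join(col[::-1]) for col in df.columns.values]

id_cols = ["State~Center_Details", "Center~Center_Details"]
long = pd.wide_to_long(
    df,
    stubnames=["Offices", "Credit"],
    i=id_cols,
    j="Quarter",
    sep="~",
    suffix=r".+",  # quarter labels are not purely numeric
).reset_index()

# Cleanup: drop the "~Center_Details" tail from the id column names
long.columns = [name.split("~")[0] for name in long.columns]
print(long)
```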

Modifying @A5C1D2H2I1M1N2O1R2T1's answer slightly, I found that I can still keep the MultiIndex structure:

idx = df[['Center_Details']].columns.values.tolist()
long = pd.melt(df, id_vars = idx)

# Renaming variable created by melt to a multiindex friendly name
long.rename(columns={'variable_0': ('Values', 'Qtr')}, inplace=True)

# Reshape to wide format, keeping Values, QTR as a hierarchical column
out = pd.pivot_table(long, index = idx + [('Values', 'Qtr')], columns = 'variable_1', 
                 values = 'value', aggfunc = 'first')

# Creating tuples for new column names
out.columns = [('Values', col) for col in out.columns]
out = out.reset_index()
# Converting columns to multiindex
out.columns = pd.MultiIndex.from_tuples(out.columns.values)
print(out)

+---+-----------------+------------+-----------------+------------+--------+---------+---------+
|   | Center_Details  |            |                 | Values     |        |         |         |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
|   | State           | District   | Center          | QTR        | Credit | Deposit | Offices |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 0 | JAMMU & KASHMIR | KUPWARA    | Drug Mulla (CT) | 2017-18:Q1 | 600    | 500     | 4       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 1 | JAMMU & KASHMIR | KUPWARA    | Drug Mulla (CT) | 2017-18:Q2 | 600    | 500     | 3       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 2 | JAMMU & KASHMIR | LEH LADAKH | Chuglamsar (CT) | 2017-18:Q1 | 600    | 500     | 4       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 3 | JAMMU & KASHMIR | LEH LADAKH | Chuglamsar (CT) | 2017-18:Q2 | 600    | 500     | 3       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 4 | PUNJAB          | GURDASPUR  | TIBRI           | 2017-18:Q1 | 600    | 500     | 4       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 5 | PUNJAB          | GURDASPUR  | TIBRI           | 2017-18:Q2 | 600    | 500     | 3       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 6 | PUNJAB          | PATHANKOT  | Mamun (CT)      | 2017-18:Q1 | 600    | 500     | 4       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+
| 7 | PUNJAB          | PATHANKOT  | Mamun (CT)      | 2017-18:Q2 | 600    | 500     | 3       |
+---+-----------------+------------+-----------------+------------+--------+---------+---------+

PS: Sorry for the ugly table formatting; I still do not know how to create tables here.
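
The variable_0 and variable_1 names used in this answer come from melt itself: when the columns are a MultiIndex with unnamed levels, melt creates one variable column per level. A small sketch on a hypothetical one-row frame shows this:

```python
import pandas as pd

# Hypothetical one-row input with two-level columns and unnamed levels
columns = pd.MultiIndex.from_tuples([
    ("Center_Details", "State"), ("Center_Details", "Center"),
    ("2017-18:Q1", "Offices"), ("2017-18:Q2", "Offices"),
])
df = pd.DataFrame([["PUNJAB", "TIBRI", 4, 3]], columns=columns)

# With MultiIndex columns, id_vars is given as a list of full column tuples
idx = df[["Center_Details"]].columns.tolist()
long = pd.melt(df, id_vars=idx)

# One "variable_N" column per column level, plus "value"
print(long.columns.tolist())
print(long)
```

Here variable_0 carries the quarter and variable_1 the item name, which is why the answer renames the former and pivots on the latter.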

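@mudassirkhan19 Please take the time to read the guide. In general, it is also bad practice to provide a link to a file in a question, especially when asking about pandas dataframes.

In R, you would probably just use melt from the "data.table" package. R does not support hierarchical column names, though.

Thank you for the link to the guide. In this case I provided a link to the file because I did not know how to generate sample multi-index data in R.

Thanks! You taught me a lot!

Nice approach. +1.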