Python: melting a DF with multiple columns

python pandas

I have the following:

ID,     1,      2,      3              #Columns 
0,Date, Review, Average, Review # Observations
1,01/01/18 2,   4,      3      # Date and Review Score
2,02/01/18 1,   2,      4      #Date and Review Score

I'm trying to melt this DF into the layout shown further down. The code below gets me close:

df = pd.melt(df, id_vars=['ID'], var_name='Store', value_name='Score').fillna(0).set_index('ID')

This produces:

           Store    Score
ID      
Date        
01/01/18    1       Review
01/01/18    1       2
02/01/18    1       1

What I want to do is pull "Review" out of the Score column and put it in its own column, like this:

           Store    Review Type Score
ID      
Date        
01/01/18    1,      Review,    1
02/01/18    1,      Review,    2

I've tried wide-to-long reshaping, but I think I need some level of MultiIndex here, or I may be overthinking this.

Considerations:

My DF has 824 columns and 324 rows.

My variables are laid out by row, and the ID for the dates is the column header.

If I understand what you're looking for, start from this dataframe, which I believe is what you have:

    ID           1         2       3
0   Date       Review   Average   Review
1   01/01/18     2         4       3
2   02/01/18     1         2       4
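
For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes the column labels are the strings '1', '2' and '3' and that the scores are integers, neither of which the question confirms.

```python
import pandas as pd

# Assumed reconstruction of the sample data from the question.
# Row 0 holds the review type for each store column; the rest are observations.
df = pd.DataFrame(
    {
        "ID": ["Date", "01/01/18", "02/01/18"],
        "1": ["Review", 2, 1],
        "2": ["Average", 4, 2],
        "3": ["Review", 3, 4],
    }
)

print(df)
```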

Assuming you ran the pd.melt() operation, you are left with:

new_df = pd.melt(df, id_vars=['ID'], var_name='Store', value_name='Score').fillna(0).set_index('ID')

          Store    Score
ID      
Date        1      Review
01/01/18    1      2
02/01/18    1      1
Date        2      Average
01/01/18    2      4
02/01/18    2      2
Date        3      Review
01/01/18    3      3
02/01/18    3      4

You can then do the following:

# sort the index so all the 'Date' rows end up at the bottom
new_df.sort_index(inplace=True)

# create a new df of just the 'Date' rows, because those hold your review types;
# rename returns a new frame, so no slice of new_df is modified in place
review_types = new_df.loc[['Date']].rename(columns={'Score': 'Review Type'})

# remove new_df.loc['Date']
# new_df = new_df.drop(new_df.tail(len(review_types)).index).reset_index()

# UPDATED removal of new_df.loc['Date']
# I recommend removing the 'Date' rows this way rather than with .tail()
new_df = new_df[~new_df.index.str.contains('Date')].reset_index()

# rename the ID column to Date
new_df.rename(columns={'ID': 'Date'}, inplace=True)

# merge your two dataframes together
new_df.merge(review_types, on='Store')

This gives you:

    Date      Store  Score  Review Type
0   01/01/18    1     2     Review
1   02/01/18    1     1     Review
2   01/01/18    2     4     Average
3   02/01/18    2     2     Average
4   01/01/18    3     3     Review
5   02/01/18    3     4     Review
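
If sorting the index feels fragile, here is a possible alternative that should produce the same table without sorting or merging. It is a sketch, not what the answer above does: it splits off the first row as a lookup of review types, melts only the observation rows, and maps each store to its type. The function name melt_with_review_type and the sample frame are made up for illustration, and the column labels are assumed to be strings.

```python
import pandas as pd


def melt_with_review_type(df: pd.DataFrame) -> pd.DataFrame:
    """Melt a frame whose first row holds the review type of each store column."""
    # Row 0 maps each store column to its review type (drop the 'ID' entry).
    review_types = df.iloc[0].drop("ID")
    # Every remaining row is an observation; 'ID' really holds the date.
    body = df.iloc[1:].rename(columns={"ID": "Date"})
    long_df = body.melt(id_vars="Date", var_name="Store", value_name="Score")
    # Look up the review type of each store from the header row.
    long_df["Review Type"] = long_df["Store"].map(review_types)
    return long_df


# Assumed sample data, shaped like the frame in the question.
sample = pd.DataFrame(
    {
        "ID": ["Date", "01/01/18", "02/01/18"],
        "1": ["Review", 2, 1],
        "2": ["Average", 4, 2],
        "3": ["Review", 3, 4],
    }
)

result = melt_with_review_type(sample)
print(result)
```

Because the review types are taken by position (the first row) rather than by matching the text 'Date', this variant does not depend on how the index happens to sort.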

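Very, very cool, this does exactly what I wanted. Splitting the DF and merging it back is a really neat step. Can I ask: when sorting the index, why do the ['Date'] rows go to the bottom of the DF? I ask because the code that removes those rows takes them from the bottom.

@Datanovice After using melt, the index contains both digit strings (the actual dates, e.g. '01/01/18') and the value 'Date'. The index is sorted in ascending order, and digits sort before alphabetic characters. That is why I used new_df.tail() to remove them. If the 'Date' values were at the top, you could use head() to remove them instead.

@Datanovice You can also remove the 'Date' rows by doing new_df[~new_df.index.str.contains('Date')], which is probably a better approach than my original example. I'd recommend doing that instead of what I did.

Hey @chris, coming back to this, I have to say this is a brilliant answer for a very messy dataset. Thanks again.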
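
The sort-order point from the comments can be checked with a small sketch. The index values below are made up to mirror the melted frame. It also shows that str.contains is case-sensitive by default, so the pattern has to match the capitalisation of 'Date'.

```python
import pandas as pd

# Index values as they look after melting: real dates plus the literal 'Date'.
idx = pd.Index(["Date", "01/01/18", "02/01/18", "Date"])

# Strings are compared character by character; the digit '0' (code point 48)
# sorts before the letter 'D' (code point 68), so 'Date' ends up last.
sorted_values = sorted(idx)
print(sorted_values)

# str.contains is case-sensitive by default: 'date' matches nothing here.
matches_upper = int(idx.str.contains("Date").sum())
matches_lower = int(idx.str.contains("date").sum())
print(matches_upper, matches_lower)
```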