Python: stack values from different columns into one column of a DataFrame

python pandas dataframe

I have the following DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 3, 3, 4],
    'Prior': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'Current': ['a1', 'c', 'c1', 'e', 'f', 'f1', 'g1'],
    'Date': ['1/1/2019', '5/1/2019', '10/2/2019', '15/3/2019', '6/5/2019',
             '7/9/2019', '16/11/2019']
})

This is the output I want:

desired_df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'Prior_Current': ['a', 'a1', 'b', 'c', 'c1', 'd', 'e', 'f', 'f1', 'g',
                      'g1'],
    'Start_Date': ['', '1/1/2019', '', '5/1/2019', '10/2/2019', '', '15/3/2019',
                   '6/5/2019', '7/9/2019', '', '16/11/2019'],
    'End_Date': ['1/1/2019', '', '5/1/2019', '10/2/2019', '', '15/3/2019',
                 '6/5/2019', '7/9/2019', '', '16/11/2019', '']
})

I tried the following:

keys = ['Prior', 'Current']
df2 = (
    pd.melt(df, id_vars='ID', value_vars=keys, value_name='Prior_Current')
        .merge(df[['ID', 'Date']], how='left', on='ID')
)
df2['Start_Date'] = np.where(df2['variable'] == 'Prior', df2['Date'], '')
df2['End_Date'] = np.where(df2['variable'] == 'Current', df2['Date'], '')
df2.sort_values(['ID'], ascending=True, inplace=True)

But this does not seem to work. Please help.
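One likely reason the attempt misbehaves is that ID is not unique in df, so merging the melted frame back on ID pairs every melted row with every Date of that ID. The sketch below only counts rows to make this visible; the names melted, merged and melted_with_date are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 3, 3, 4],
    'Prior': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'Current': ['a1', 'c', 'c1', 'e', 'f', 'f1', 'g1'],
    'Date': ['1/1/2019', '5/1/2019', '10/2/2019', '15/3/2019', '6/5/2019',
             '7/9/2019', '16/11/2019']
})

# Melting two value columns doubles the row count.
melted = pd.melt(df, id_vars='ID', value_vars=['Prior', 'Current'],
                 value_name='Prior_Current')

# Merging on a non-unique key is many-to-many: each melted row is
# repeated once for every Date that shares its ID.
merged = melted.merge(df[['ID', 'Date']], how='left', on='ID')

# Keeping Date as an id_var instead carries the right date along
# with each value and avoids the merge entirely.
melted_with_date = pd.melt(df, id_vars=['ID', 'Date'],
                           value_vars=['Prior', 'Current'],
                           value_name='Prior_Current')

print(len(df), len(melted), len(merged), len(melted_with_date))
```

With this data the merged frame is noticeably larger than the melted one, which is why the Start_Date and End_Date columns end up with repeated and mismatched dates.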

You can use stack() and pivot_table().

Explanation: after stack() and reset_index(), df looks like this:

    ID        Date  level_2   0
0    1    1/1/2019    Prior   a
1    1    1/1/2019  Current  a1
2    2    5/1/2019    Prior   b
3    2    5/1/2019  Current   c
4    2   10/2/2019    Prior   c
5    2   10/2/2019  Current  c1
6    3   15/3/2019    Prior   d
7    3   15/3/2019  Current   e
8    3    6/5/2019    Prior   e
9    3    6/5/2019  Current   f
10   3    7/9/2019    Prior   f
11   3    7/9/2019  Current  f1
12   4  16/11/2019    Prior   g
13   4  16/11/2019  Current  g1

Now we can call pivot_table() with ID and column 0 as the index, level_2 as the columns, and Date as the values.

Finally, we rename the columns to get the desired result.
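The answer's original code did not survive on this page, so what follows is a reconstruction of the steps described above, not the answerer's exact code. It assumes aggfunc='first' (the default aggregation does not apply to date strings) and renames column 0 to Prior_Current before pivoting for readability; the names stacked and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 3, 3, 4],
    'Prior': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'Current': ['a1', 'c', 'c1', 'e', 'f', 'f1', 'g1'],
    'Date': ['1/1/2019', '5/1/2019', '10/2/2019', '15/3/2019', '6/5/2019',
             '7/9/2019', '16/11/2019']
})

# Step 1: stack Prior/Current into a single column, keeping ID and Date.
stacked = df.set_index(['ID', 'Date']).stack().reset_index()
# Assumption: give the unnamed value column (0) a readable name.
stacked.columns = ['ID', 'Date', 'level_2', 'Prior_Current']

# Step 2: pivot so each (ID, value) pair becomes one row, with the date
# on which it was Current and the date on which it was Prior side by side.
result = (
    stacked.pivot_table(index=['ID', 'Prior_Current'], columns='level_2',
                        values='Date', aggfunc='first')
    # Step 3: a value starts when it is Current and ends when it is Prior.
    .rename(columns={'Current': 'Start_Date', 'Prior': 'End_Date'})
    .fillna('')
    .reset_index()
    .rename_axis(None, axis=1)
)
result = result[['ID', 'Prior_Current', 'Start_Date', 'End_Date']]
print(result)
```

Because pivot_table() sorts its index, the rows come out ordered by ID and then by value, which happens to match the order in desired_df for this data.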

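My approach is to build the target df step by step. The first step extends your code with melt() and merge(). The merges are done on the Current and Prior columns to get the start and end dates.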

import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 3, 3, 4],
    'Prior': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'Current': ['a1', 'c', 'c1', 'e', 'f', 'f1', 'g1'],
    'Date': ['1/1/2019', '5/1/2019', '10/2/2019', '15/3/2019', '6/5/2019',
             '7/9/2019', '16/11/2019']
})

# Stack Prior and Current into one column and keep each (ID, value) pair once.
df2 = (
    pd.melt(df, id_vars='ID', value_vars=['Prior', 'Current'], value_name='Prior_Current')
    .drop(columns='variable')
    .drop_duplicates()
    .sort_values(['ID', 'Prior_Current'])
)
# Start date: the date on which the value appears in Current.
df2 = df2.merge(df[['Current', 'Date']], how='left', left_on='Prior_Current', right_on='Current').drop(columns='Current')
# End date: the date on which the value appears in Prior.
df2 = df2.merge(df[['Prior', 'Date']], how='left', left_on='Prior_Current', right_on='Prior').drop(columns='Prior')
df2 = df2.fillna('').reset_index(drop=True)
df2.columns = ['ID', 'Prior_Current', 'Start_Date', 'End_Date']

Another approach is to define a custom function that looks up the date, and then apply it with a lambda function:

def get_date(x, col):
    # Return the first Date whose `col` value equals x, or '' if there is none.
    try:
        return df['Date'][df[col] == x].values[0]
    except IndexError:
        return ''

df2 = (
    pd.melt(df, id_vars='ID', value_vars=['Prior', 'Current'], value_name='Prior_Current')
    .drop(columns='variable')
    .drop_duplicates()
    .sort_values(['ID', 'Prior_Current'])
    .reset_index(drop=True)
)
# A value starts on the date it is Current and ends on the date it is Prior.
df2['Start_Date'] = df2['Prior_Current'].apply(lambda x: get_date(x, 'Current'))
df2['End_Date'] = df2['Prior_Current'].apply(lambda x: get_date(x, 'Prior'))
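The per-row lookup above can also be written with Series.map(), which may be easier to read on larger frames. This is a sketch of an alternative, not part of the original answer; start_lookup and end_lookup are illustrative names. Like get_date, it matches on the value alone and ignores ID, so it assumes values are not shared between different IDs.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 2, 3, 3, 3, 4],
    'Prior': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
    'Current': ['a1', 'c', 'c1', 'e', 'f', 'f1', 'g1'],
    'Date': ['1/1/2019', '5/1/2019', '10/2/2019', '15/3/2019', '6/5/2019',
             '7/9/2019', '16/11/2019']
})

df2 = (
    pd.melt(df, id_vars='ID', value_vars=['Prior', 'Current'],
            value_name='Prior_Current')
    .drop(columns='variable')
    .drop_duplicates()
    .sort_values(['ID', 'Prior_Current'])
    .reset_index(drop=True)
)

# Lookup tables from value to date. drop_duplicates keeps the first match
# per value, mirroring .values[0] in get_date, and guarantees the unique
# index that Series.map() needs.
start_lookup = df.drop_duplicates('Current').set_index('Current')['Date']
end_lookup = df.drop_duplicates('Prior').set_index('Prior')['Date']

# Values with no match map to NaN, which is replaced by an empty string.
df2['Start_Date'] = df2['Prior_Current'].map(start_lookup).fillna('')
df2['End_Date'] = df2['Prior_Current'].map(end_lookup).fillna('')
print(df2)
```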

