Pandas: melting multiple groups of columns

I have been trying to use pd.melt on a DataFrame like this:

    MRN  Name        Dt1   Nam1 Loc1        Dt2 Nam2 Loc2        Dt3 Nam3 Loc3
0  1234  John 2010-01-01    CMV  Eye 2010-02-10  RSV  Res 2010-03-10  HSV  Eye
1  1245   Joe 2011-06-10  Cdiff   GI        NaT  NaN  NaN        NaT  NaN  NaN
2  1235  Mary 2012-05-06  Ecoli  Bld        NaT  NaN  NaN        NaT  NaN  NaN
3  1254  Matt        NaT    NaN  NaN        NaT  NaN  NaN        NaT  NaN  NaN

to get output like this:

    MRN  Name         Dt    Nam  Loc
0  1234  John 2010-01-01    CMV  Eye
1  1234  John 2010-02-10    RSV  Res
2  1234  John 2010-03-10    HSV  Eye
3  1245   Joe 2011-06-10  Cdiff   GI
4  1235  Mary 2012-05-06  Ecoli  Bld
5  1254  Matt        NaT    NaN  NaN

I have not been able to get this to work.

You can do this without pd.melt by preparing each group of columns separately and then joining the pieces with pd.concat:

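dfs = []
for i in range(1, 4):
    # Select the identifier columns plus the i-th group of columns.
    tmp_df = df[["MRN", "Name", f"Dt{i}", f"Nam{i}", f"Loc{i}"]]
    # Strip the numeric suffix so every group shares the same column names.
    tmp_df = tmp_df.rename(columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"})
    # dropna removes any row containing NaN/NaT (this also drops Matt's all-empty row).
    dfs.append(tmp_df.dropna())

df = pd.concat(dfs)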

Or, if you prefer it as one very long line:

df = pd.concat([df[["MRN", "Name", f"Dt{i}", f"Nam{i}", f"Loc{i}"]].rename(columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"}).dropna() for i in range(1, 4)])
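
Note that dropna() in the two versions above also drops Matt's row, which the expected output keeps as a single all-empty row. The following is a minimal sketch, not the answerer's code, of one way to keep exactly one row for such patients. The sample frame is rebuilt from the question, and the names ids, values, parts, long_df and result are illustrative assumptions.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "MRN": [1234, 1245, 1235, 1254],
    "Name": ["John", "Joe", "Mary", "Matt"],
    "Dt1": pd.to_datetime(["2010-01-01", "2011-06-10", "2012-05-06", None]),
    "Nam1": ["CMV", "Cdiff", "Ecoli", None],
    "Loc1": ["Eye", "GI", "Bld", None],
    "Dt2": pd.to_datetime(["2010-02-10", None, None, None]),
    "Nam2": ["RSV", None, None, None],
    "Loc2": ["Res", None, None, None],
    "Dt3": pd.to_datetime(["2010-03-10", None, None, None]),
    "Nam3": ["HSV", None, None, None],
    "Loc3": ["Eye", None, None, None],
})

ids = ["MRN", "Name"]
values = ["Dt", "Nam", "Loc"]

parts = []
for i in range(1, 4):
    # Map e.g. "Dt1" -> "Dt" so every group ends up with the same column names.
    cols = {f"{v}{i}": v for v in values}
    parts.append(df[ids + list(cols)].rename(columns=cols))

# A stable sort on the original row labels keeps each patient's groups
# together and in group order (1, 2, 3).
long_df = pd.concat(parts).sort_index(kind="mergesort")

# Drop all-empty groups, but always keep the first group of each original row
# so that a patient with no data at all still appears once.
is_empty = long_df[values].isna().all(axis=1)
first_per_row = ~long_df.index.duplicated(keep="first")
result = long_df[~is_empty | first_per_row].reset_index(drop=True)

print(result)
```

Unlike the hard-coded range in the loop above, the filter here depends only on which cells are missing, so it should carry over to frames with other patients as long as the column naming pattern is the same.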

Alternatively, pd.wide_to_long can do the reshape. You may need to hard-code the filtering to match your expected output:

(
    pd.wide_to_long(df, stubnames=["Dt", "Nam", "Loc"], i=["MRN", "Name"], j="num")
    .reset_index()
    .sort_values(["Dt", "num"])
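    .drop(columns="num")  # the positional axis argument was removed in pandas 2.0
    .loc[:9]  # hard-coded label slice: keeps rows up to and including index label 9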
)


     MRN  Name          Dt    Nam  Loc
0   1234  John  2010-01-01    CMV  Eye
1   1234  John  2010-02-10    RSV  Res
2   1234  John  2010-03-10    HSV  Eye
3   1245   Joe  2011-06-10  Cdiff   GI
6   1235  Mary  2012-05-06  Ecoli  Bld
9   1254  Matt         NaN    NaN  NaN
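
The .loc[:9] slice above only works for this exact data. As a hedged sketch of a filter that does not rely on row labels, one could keep every non-empty group plus group 1 for each patient. The sample frame is rebuilt from the question; carrying the original row position through as an "index" column, and the names stubs, long_df, all_missing and result, are assumptions made for illustration.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "MRN": [1234, 1245, 1235, 1254],
    "Name": ["John", "Joe", "Mary", "Matt"],
    "Dt1": pd.to_datetime(["2010-01-01", "2011-06-10", "2012-05-06", None]),
    "Nam1": ["CMV", "Cdiff", "Ecoli", None],
    "Loc1": ["Eye", "GI", "Bld", None],
    "Dt2": pd.to_datetime(["2010-02-10", None, None, None]),
    "Nam2": ["RSV", None, None, None],
    "Loc2": ["Res", None, None, None],
    "Dt3": pd.to_datetime(["2010-03-10", None, None, None]),
    "Nam3": ["HSV", None, None, None],
    "Loc3": ["Eye", None, None, None],
})

stubs = ["Dt", "Nam", "Loc"]

long_df = (
    pd.wide_to_long(
        df.reset_index(),            # adds an "index" column with the row position
        stubnames=stubs,
        i=["index", "MRN", "Name"],  # identifier columns, unique per input row
        j="num",                     # receives the numeric suffix 1, 2, 3
    )
    .reset_index()
    .sort_values(["index", "num"])   # original row order, then group order
)

# Keep a group if it has any data, or if it is group 1 (so that a patient
# with no data at all still shows up once).
all_missing = long_df[stubs].isna().all(axis=1)
result = (
    long_df[~all_missing | (long_df["num"] == 1)][["MRN", "Name"] + stubs]
    .reset_index(drop=True)
)

print(result)
```

Sorting on the carried-over row position rather than on Dt keeps the patients in their input order even when their dates are not chronological.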