How to pivot this table in Python

python pandas

Hi, I have a table that looks like this:

import pandas as pd

df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,2,2],
                  'PatientID':[101,101,101,101,102,102,102,102],
                  'RequestDate':['2020-02-10','2020-02-10','2020-02-11','2020-02-11',
                                 '2020-02-12','2020-02-12','2020-02-13','2020-02-13'],
                  'CollectionDate':['2020-02-11','2020-02-11','2020-02-12','2020-02-12',
                                    '2020-02-13','2020-02-13','2020-02-14','2020-02-14'],
                  'TestCode':['ALT','AST','CRE','DB','ALT','AST','CRE','DB'],
                  'TestResult':[21, 27, 94, 2, 25, 22, 98, 3],
                  'Units':['U/L','U/L','umol/L','umol/L','U/L','U/L','umol/L','umol/L']})

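How can I pivot it in Python to get the following expected output:

This is a bit tricky because it looks like panel data, i.e. time-series plus cross-sectional data.

This is what I would do: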

# pivot
new_df = (df.set_index(['CaseNo', 'PatientID', 'RequestDate', 'CollectionDate', 'TestCode'])
            .unstack('TestCode'))

# fill in the missing `Units`:
new_df['Units'] = (new_df['Units'].groupby(['CaseNo','PatientID']).ffill()
                         .groupby(['CaseNo','PatientID']).bfill()
                  )

# rename columns
new_df.columns = [f'{x[1]}_{x[0]}' if x[0]=='Units' else x[1] for x in new_df.columns]

# sort columns and reset index
new_df = new_df.sort_index(axis=1).reset_index()

Output:

      CaseNo    PatientID  RequestDate    CollectionDate      ALT  ALT_Units      AST  AST_Units      CRE  CRE_Units      DB  DB_Units
--  --------  -----------  -------------  ----------------  -----  -----------  -----  -----------  -----  -----------  ----  ----------
 0         1          101  2020-02-10     2020-02-11           21  U/L             27  U/L            nan  umol/L        nan  umol/L
 1         1          101  2020-02-11     2020-02-12          nan  U/L            nan  U/L             94  umol/L          2  umol/L
 2         2          102  2020-02-12     2020-02-13           25  U/L             22  U/L            nan  umol/L        nan  umol/L
 3         2          102  2020-02-13     2020-02-14          nan  U/L            nan  U/L             98  umol/L          3  umol/L
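
To see why the `Units` step chains `ffill()` and `bfill()` within each group, here is a minimal sketch on a small hand-built frame. The frame `units` and its values are made up for illustration (they mimic what `new_df['Units']` looks like after `unstack`); only the `groupby(...).ffill()` and `.bfill()` calls are taken from the answer above.

```python
import pandas as pd

# Hypothetical stand-in for new_df['Units'] after unstack:
# each test only has a unit on the dates where it was actually run.
units = pd.DataFrame(
    {'ALT': ['U/L', None, 'U/L', None],
     'CRE': [None, 'umol/L', None, 'umol/L']},
    index=pd.MultiIndex.from_tuples(
        [(1, 101, '2020-02-10'), (1, 101, '2020-02-11'),
         (2, 102, '2020-02-12'), (2, 102, '2020-02-13')],
        names=['CaseNo', 'PatientID', 'RequestDate']))

# ffill copies a unit down to later rows of the same case/patient,
# bfill then copies it up to earlier rows that are still empty.
filled = (units.groupby(['CaseNo', 'PatientID']).ffill()
               .groupby(['CaseNo', 'PatientID']).bfill())

print(filled)
```

Grouping by `CaseNo` and `PatientID` keeps the fill from leaking across patients; a single `ffill()` alone would leave `CRE` empty on each patient's first row.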

This is my attempt using groupby() and apply():

Step 1: convert each group's data into a DataFrame
Step 2: reindex

Edit: the pivot() helper:
def pivot(gp_df):
    return pd.Series(dict(
      ( pair for index, row in gp_df.iterrows() for pair in
        [ ( row['TestCode'] ,  row['TestResult'] ) , 
          ( row['TestCode'] + '_Units' ,  row['Units'] ) ] )
    )).to_frame().transpose()
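
The post does not show how the helper is called, so the following is only a sketch of how the two steps might fit together. The names `keys` and `wide`, the explicit column selection, and the use of `droplevel` to discard the inner index level are assumptions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,2,2],
                   'PatientID':[101,101,101,101,102,102,102,102],
                   'RequestDate':['2020-02-10','2020-02-10','2020-02-11','2020-02-11',
                                  '2020-02-12','2020-02-12','2020-02-13','2020-02-13'],
                   'CollectionDate':['2020-02-11','2020-02-11','2020-02-12','2020-02-12',
                                     '2020-02-13','2020-02-13','2020-02-14','2020-02-14'],
                   'TestCode':['ALT','AST','CRE','DB','ALT','AST','CRE','DB'],
                   'TestResult':[21, 27, 94, 2, 25, 22, 98, 3],
                   'Units':['U/L','U/L','umol/L','umol/L','U/L','U/L','umol/L','umol/L']})

def pivot(gp_df):
    # Turn all rows of one group into a single wide row
    return pd.Series(dict(
      ( pair for index, row in gp_df.iterrows() for pair in
        [ ( row['TestCode'] ,  row['TestResult'] ) ,
          ( row['TestCode'] + '_Units' ,  row['Units'] ) ] )
    )).to_frame().transpose()

# Assumed grouping keys: one output row per case/patient/date combination
keys = ['CaseNo', 'PatientID', 'RequestDate', 'CollectionDate']

# Step 1: apply the helper to each group (each call returns a one-row DataFrame)
# Step 2: drop the helper's inner 0 index level and move the keys back to columns
wide = (df.groupby(keys)[['TestCode', 'TestResult', 'Units']]
          .apply(pivot)
          .droplevel(-1)
          .reset_index())

print(wide)
```

Unlike the `unstack` approach, this sketch leaves the `*_Units` columns empty on rows where the test was not run, so a fill step would still be needed to match the output shown earlier.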

Is the data read in from a SQL database? If so, you could do the pivot in SQL and then read the result into pandas.

Yes, it is read in from a SQL DB. How would I pivot it in MS SQL?

Thanks, I like your solution. Could you explain new_df.columns = [f'{x[1]}_{x[0]}' if x[0]=='Units' else x[1] for x in new_df.columns], especially the f'{x[1]}_{x[0]}' part?

That is called an f-string; you can look it up.
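
As a small illustration of that renaming step: after `unstack`, each column label is a tuple `(field, test_code)`, and the list comprehension flattens it into a single string. The `columns` list below is a hand-written stand-in for `new_df.columns`, not output copied from the original post.

```python
# Stand-in for new_df.columns after unstack: (field, test_code) tuples
columns = [('TestResult', 'ALT'), ('TestResult', 'AST'),
           ('Units', 'ALT'), ('Units', 'AST')]

# x[0] is the field name, x[1] is the test code.
# The f-string f'{x[1]}_{x[0]}' substitutes both values into the text,
# so ('Units', 'ALT') becomes 'ALT_Units'; result columns keep just the code.
flat = [f'{x[1]}_{x[0]}' if x[0] == 'Units' else x[1] for x in columns]

print(flat)  # ['ALT', 'AST', 'ALT_Units', 'AST_Units']
```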