How to pivot this table in Python


Hi, I have a table that looks like this:

import pandas as pd

df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,2,2],
                  'PatientID':[101,101,101,101,102,102,102,102],
                  'RequestDate':['2020-02-10','2020-02-10','2020-02-11','2020-02-11',
                                 '2020-02-12','2020-02-12','2020-02-13','2020-02-13'],
                  'CollectionDate':['2020-02-11','2020-02-11','2020-02-12','2020-02-12',
                                    '2020-02-13','2020-02-13','2020-02-14','2020-02-14'],
                  'TestCode':['ALT','AST','CRE','DB','ALT','AST','CRE','DB'],
                  'TestResult':[21, 27, 94, 2, 25, 22, 98, 3],
                  'Units':['U/L','U/L','umol/L','umol/L','U/L','U/L','umol/L','umol/L']})

How can I pivot it in Python to get the following expected output:


This is a bit tricky because it looks like panel data, i.e. time series plus cross-sectional data.

Here is what I would do:

# pivot
new_df = (df.set_index(['CaseNo', 'PatientID','RequestDate','CollectionDate','TestCode'])
   .unstack('TestCode')

)

# fill in the missing `Units`:
new_df['Units'] = (new_df['Units'].groupby(['CaseNo','PatientID']).ffill()
                         .groupby(['CaseNo','PatientID']).bfill()
                  )

# rename columns
new_df.columns = [f'{x[1]}_{x[0]}' if x[0]=='Units' else x[1] for x in new_df.columns]

# sort columns and reset index
new_df = new_df.sort_index(axis=1).reset_index()
Output:

      CaseNo    PatientID  RequestDate    CollectionDate      ALT  ALT_Units      AST  AST_Units      CRE  CRE_Units      DB  DB_Units
--  --------  -----------  -------------  ----------------  -----  -----------  -----  -----------  -----  -----------  ----  ----------
 0         1          101  2020-02-10     2020-02-11           21  U/L             27  U/L            nan  umol/L        nan  umol/L
 1         1          101  2020-02-11     2020-02-12          nan  U/L            nan  U/L             94  umol/L          2  umol/L
 2         2          102  2020-02-12     2020-02-13           25  U/L             22  U/L            nan  umol/L        nan  umol/L
 3         2          102  2020-02-13     2020-02-14          nan  U/L            nan  U/L             98  umol/L          3  umol/L
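
As a possible alternative (a sketch, not the answerer's method): if each test code always carries the same unit, which holds for the sample data but is an assumption in general, the units can be looked up per test code instead of being forward- and back-filled within each patient. The names keys, wide and unit_by_code are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,2,2],
                   'PatientID':[101,101,101,101,102,102,102,102],
                   'RequestDate':['2020-02-10','2020-02-10','2020-02-11','2020-02-11',
                                  '2020-02-12','2020-02-12','2020-02-13','2020-02-13'],
                   'CollectionDate':['2020-02-11','2020-02-11','2020-02-12','2020-02-12',
                                     '2020-02-13','2020-02-13','2020-02-14','2020-02-14'],
                   'TestCode':['ALT','AST','CRE','DB','ALT','AST','CRE','DB'],
                   'TestResult':[21, 27, 94, 2, 25, 22, 98, 3],
                   'Units':['U/L','U/L','umol/L','umol/L','U/L','U/L','umol/L','umol/L']})

keys = ['CaseNo', 'PatientID', 'RequestDate', 'CollectionDate']

# Wide table of results: one row per request, one column per test code.
wide = df.set_index(keys + ['TestCode'])['TestResult'].unstack('TestCode')

# Assumption: every test code maps to exactly one unit across the table.
unit_by_code = df.drop_duplicates('TestCode').set_index('TestCode')['Units']

# Add one constant <code>_Units column per test code.
for code, unit in unit_by_code.items():
    wide[f'{code}_Units'] = unit

# Sort the value columns, move the keys back into columns, drop the axis name.
wide = wide.sort_index(axis=1).reset_index()
wide.columns.name = None
print(wide)
```

If a test code can appear with different units, this lookup would silently keep only the first one, so the group-wise fill above would be the safer choice in that case.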

Here is my attempt using groupby() and apply():

  • Step 1: convert each group's data into a DataFrame
  • Step 2: reindex


Edit: pivot()

def pivot(gp_df):
    return pd.Series(dict(
      ( pair for index, row in gp_df.iterrows() for pair in
        [ ( row['TestCode'] ,  row['TestResult'] ) , 
          ( row['TestCode'] + '_Units' ,  row['Units'] ) ] )
    )).to_frame().transpose()
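
The code for the two steps did not survive in this post, so the following is only a sketch of how pivot() might be applied. The grouping keys and the way the index is reset are assumptions, not the answerer's original code.

```python
import pandas as pd

df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,2,2],
                   'PatientID':[101,101,101,101,102,102,102,102],
                   'RequestDate':['2020-02-10','2020-02-10','2020-02-11','2020-02-11',
                                  '2020-02-12','2020-02-12','2020-02-13','2020-02-13'],
                   'CollectionDate':['2020-02-11','2020-02-11','2020-02-12','2020-02-12',
                                     '2020-02-13','2020-02-13','2020-02-14','2020-02-14'],
                   'TestCode':['ALT','AST','CRE','DB','ALT','AST','CRE','DB'],
                   'TestResult':[21, 27, 94, 2, 25, 22, 98, 3],
                   'Units':['U/L','U/L','umol/L','umol/L','U/L','U/L','umol/L','umol/L']})

def pivot(gp_df):
    # Turn one group's rows into a single-row frame:
    # <TestCode> holds the result, <TestCode>_Units holds the unit.
    return pd.Series(dict(
        pair for index, row in gp_df.iterrows() for pair in
        [(row['TestCode'], row['TestResult']),
         (row['TestCode'] + '_Units', row['Units'])]
    )).to_frame().transpose()

# Assumed grouping keys: one output row per case, patient and date pair.
keys = ['CaseNo', 'PatientID', 'RequestDate', 'CollectionDate']

# Step 1: build one single-row DataFrame per group.
wide = df.groupby(keys)[['TestCode', 'TestResult', 'Units']].apply(pivot)

# Step 2: drop the leftover inner index level and restore the keys as columns.
wide = wide.reset_index(level=-1, drop=True).reset_index()
print(wide)
```

Unlike the unstack-based answer, this leaves the _Units columns empty for tests that were not run on a given date, and the result columns have object dtype.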

Is the data read in from a SQL database? If so, you could do the pivot in SQL and then read the result into pandas.

Yes, it is read from a SQL DB. How would I pivot it in MS SQL?

Thanks, I like your solution. Can you explain
new_df.columns = [f'{x[1]}_{x[0]}' if x[0]=='Units' else x[1] for x in new_df.columns]
especially the
f'{x[1]}_{x[0]}'
part?

That is called an f-string; you can look it up.
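
To make the f-string comment concrete, here is a small standalone illustration. After unstack, each column label is a tuple of (original column name, test code). The list cols below is a hand-written stand-in for new_df.columns, not output taken from the thread.

```python
# Hand-written stand-in for the MultiIndex column labels produced by unstack.
cols = [('TestResult', 'ALT'), ('Units', 'ALT'),
        ('TestResult', 'CRE'), ('Units', 'CRE')]

# x[0] is the original column name, x[1] is the test code.
# Units columns become '<code>_Units'; result columns keep just the code.
renamed = [f'{x[1]}_{x[0]}' if x[0] == 'Units' else x[1] for x in cols]

print(renamed)  # ['ALT', 'ALT_Units', 'CRE', 'CRE_Units']
```

Inside an f-string, each {...} is replaced by the value of the expression it contains, so f'{x[1]}_{x[0]}' joins the test code and the column name with an underscore.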