How to pivot this table in Python
Hi, I have a table that looks like this:
import pandas as pd

df = pd.DataFrame({'CaseNo':[1,1,1,1,2,2,2,2],
                   'PatientID':[101,101,101,101,102,102,102,102],
                   'RequestDate':['2020-02-10','2020-02-10','2020-02-11','2020-02-11',
                                  '2020-02-12','2020-02-12','2020-02-13','2020-02-13'],
                   'CollectionDate':['2020-02-11','2020-02-11','2020-02-12','2020-02-12',
                                     '2020-02-13','2020-02-13','2020-02-14','2020-02-14'],
                   'TestCode':['ALT','AST','CRE','DB','ALT','AST','CRE','DB'],
                   'TestResult':[21, 27, 94, 2, 25, 22, 98, 3],
                   'Units':['U/L','U/L','umol/L','umol/L','U/L','U/L','umol/L','umol/L']})
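In Python, how can I pivot it to get the following expected output:
This is a bit tricky because it looks like panel data, i.e. time series plus cross-sectional data. Here is what I would do: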
# pivot
new_df = (df.set_index(['CaseNo', 'PatientID','RequestDate','CollectionDate','TestCode'])
.unstack('TestCode')
)
# fill in the missing `Units`:
new_df['Units'] = (new_df['Units'].groupby(['CaseNo','PatientID']).ffill()
.groupby(['CaseNo','PatientID']).bfill()
)
# rename columns
new_df.columns = [f'{x[1]}_{x[0]}' if x[0]=='Units' else x[1] for x in new_df.columns]
# sort columns and reset index
new_df = new_df.sort_index(axis=1).reset_index()
Output:
    CaseNo  PatientID  RequestDate  CollectionDate  ALT  ALT_Units  AST  AST_Units  CRE  CRE_Units  DB   DB_Units
--  ------  ---------  -----------  --------------  ---  ---------  ---  ---------  ---  ---------  ---  --------
 0       1        101  2020-02-10   2020-02-11       21  U/L         27  U/L        nan  umol/L     nan  umol/L
 1       1        101  2020-02-11   2020-02-12      nan  U/L        nan  U/L         94  umol/L       2  umol/L
 2       2        102  2020-02-12   2020-02-13       25  U/L         22  U/L        nan  umol/L     nan  umol/L
 3       2        102  2020-02-13   2020-02-14      nan  U/L        nan  U/L         98  umol/L       3  umol/L
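The "fill in the missing Units" step is the least obvious part of this answer, so here is a minimal sketch of it in isolation. It assumes a simplified frame with a single index level named CaseNo (the real frame has a four-level index) and hypothetical names units and filled. Note that the approach only makes sense if a given test always uses the same unit within a case.

```python
import pandas as pd

# Simplified stand-in for new_df['Units'] after unstacking:
# each row only knows the units of the tests collected on that date.
units = pd.DataFrame(
    {'ALT': ['U/L', None, 'U/L', None],
     'CRE': [None, 'umol/L', None, 'umol/L']},
    index=pd.Index([1, 1, 2, 2], name='CaseNo'),
)

# Forward-fill, then backward-fill, without crossing case boundaries.
filled = units.groupby('CaseNo').ffill().groupby('CaseNo').bfill()
print(filled)
```

The forward fill covers rows that come after a known unit, and the backward fill covers rows that come before it, so every row of a case ends up with a unit for each test seen anywhere in that case.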
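Here is my attempt using groupby() and apply():
- Step 1: convert each group into a DataFrame
- Step 2: reindex
Edit: pivot()
def pivot(gp_df):
    # Build one wide row per group: each test contributes a result column
    # (named after the test code) and a matching '<TestCode>_Units' column.
    return pd.Series(dict(
        pair
        for index, row in gp_df.iterrows()
        for pair in [(row['TestCode'], row['TestResult']),
                     (row['TestCode'] + '_Units', row['Units'])]
    )).to_frame().transpose()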
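The answer does not show how pivot() is actually called, so the following is only a guess at the two steps listed above. The grouping keys, the column selection before apply(), and the names keys and result are assumptions, not taken from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'CaseNo': [1, 1, 1, 1, 2, 2, 2, 2],
    'PatientID': [101, 101, 101, 101, 102, 102, 102, 102],
    'RequestDate': ['2020-02-10', '2020-02-10', '2020-02-11', '2020-02-11',
                    '2020-02-12', '2020-02-12', '2020-02-13', '2020-02-13'],
    'CollectionDate': ['2020-02-11', '2020-02-11', '2020-02-12', '2020-02-12',
                       '2020-02-13', '2020-02-13', '2020-02-14', '2020-02-14'],
    'TestCode': ['ALT', 'AST', 'CRE', 'DB', 'ALT', 'AST', 'CRE', 'DB'],
    'TestResult': [21, 27, 94, 2, 25, 22, 98, 3],
    'Units': ['U/L', 'U/L', 'umol/L', 'umol/L', 'U/L', 'U/L', 'umol/L', 'umol/L'],
})

def pivot(gp_df):
    # One wide row per group, as in the answer above.
    return pd.Series(dict(
        pair
        for index, row in gp_df.iterrows()
        for pair in [(row['TestCode'], row['TestResult']),
                     (row['TestCode'] + '_Units', row['Units'])]
    )).to_frame().transpose()

# Assumed grouping keys: everything that identifies one collection event.
keys = ['CaseNo', 'PatientID', 'RequestDate', 'CollectionDate']

# Step 1: turn each group into a one-row DataFrame.
wide = df.groupby(keys)[['TestCode', 'TestResult', 'Units']].apply(pivot)

# Step 2: drop the inner 0 index left by transpose() and restore the keys as columns.
result = wide.droplevel(-1).reset_index()
print(result)
```

Unlike the unstack-based answer, this sketch leaves the Units columns empty for tests that were not collected on a given date.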
Is the data read in from a SQL database? If so, you could do the pivot in SQL and then read the result into pandas.
Yes, it is read in from a SQL DB. How can I pivot it in MS SQL?
Thanks, I like your solution. Can you explain this line:
new_df.columns = [f'{x[1]}_{x[0]}' if x[0]=='Units' else x[1] for x in new_df.columns]
especially the f'{x[1]}_{x[0]}' part?
That is called an f-string; you can look it up.
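To expand a little on that last comment, here is a small sketch of the rename step using plain tuples in place of the real column index. After unstack('TestCode') each column label is a pair (original column, test code). The list columns below is a hand-written, shortened stand-in for new_df.columns, and flat is a hypothetical name.

```python
# Shortened stand-in for new_df.columns after unstacking:
# x[0] is the original column name, x[1] is the TestCode.
columns = [('TestResult', 'ALT'), ('TestResult', 'CRE'),
           ('Units', 'ALT'), ('Units', 'CRE')]

# For a Units column, the f-string glues the pieces together as
# '<TestCode>_Units'; for a TestResult column, only the TestCode is kept.
flat = [f'{x[1]}_{x[0]}' if x[0] == 'Units' else x[1] for x in columns]

print(flat)  # ['ALT', 'CRE', 'ALT_Units', 'CRE_Units']
```

Inside an f-string, anything in curly braces is evaluated and substituted into the text, so f'{x[1]}_{x[0]}' for ('Units', 'ALT') becomes 'ALT_Units'.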