Python: count the values in each column and append the result to the column name as a string
Hi everyone, I want to count the number of rows in each column of my dataset and append the total count to the column name. This is what my dataset looks like:
import pandas as pd
import numpy as np
df = pd.DataFrame([('Jorh Hospital','2017-03-15', 389.0,34, 32, 34),
('Jorh Hospital','2018-04-20', np.nan,22, 5, 43),
('Jorh Hospital','2018-05-20', np.nan,22, 5, 43),
('Bugh Hospital','2019-02-16', 80.5,np.nan, 56, np.nan),
('Bugh Hospital','2019-03-23', np.nan,89, 67, np.nan),
('Bugh Hospital','2019-04-23', np.nan,89, 67, np.nan)],
columns=('Hosp_name','date', 'max_rec', 'reg_pp', 'disch_no', 'temp_rec'))
df
This is what I have tried, but it only works one column at a time. How can I do it for all columns at once?
df['max_rec'].count()
df['reg_pp'].count()
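I don't want to do this column by column. How can I run the count on every column and append the result to the column name, so that the final result looks like this?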
import pandas as pd
import numpy as np
dff = pd.DataFrame([('Jorh Hospital','2017-03-15', 389.0,34, 32, 34),
('Jorh Hospital','2018-04-20', np.nan,22, 5, 43),
('Jorh Hospital','2018-05-20', np.nan,22, 5, 43),
('Bugh Hospital','2019-02-16', 80.5,np.nan, 56, np.nan),
('Bugh Hospital','2019-03-23', np.nan,89, 67, np.nan),
('Bugh Hospital','2019-04-23', np.nan,89, 67, np.nan)],
columns=('Hosp_name','date', 'max_rec N=2', 'reg_pp N=5', 'disch_no N=6', 'temp_rec N=3'))
dff
Expected:
dff = pd.DataFrame([('max_rec','50% (1)', '50%(1)'),
('reg_pp','100%(0)', '50%(1)'),
('disch_no','100%(0)', '100%(0)'),
('temp_rec','100%(0)', '0')],
columns=('variables','Jorh Hospital (N=2)', 'Bugh Hospital (N=2)'))
dff
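Select all columns except the first two with DataFrame.iloc, count the non-null values in each of them with DataFrame.count, then build a dictionary that maps each column name to the name joined with its count and pass it to rename.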
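A minimal sketch of that renaming step on its own, using the sample data from the question. The names counts, new_names and renamed are illustrative only, and rename is called with columns= so that the column labels (rather than the index) are changed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [('Jorh Hospital', '2017-03-15', 389.0, 34, 32, 34),
     ('Jorh Hospital', '2018-04-20', np.nan, 22, 5, 43),
     ('Jorh Hospital', '2018-05-20', np.nan, 22, 5, 43),
     ('Bugh Hospital', '2019-02-16', 80.5, np.nan, 56, np.nan),
     ('Bugh Hospital', '2019-03-23', np.nan, 89, 67, np.nan),
     ('Bugh Hospital', '2019-04-23', np.nan, 89, 67, np.nan)],
    columns=('Hosp_name', 'date', 'max_rec', 'reg_pp', 'disch_no', 'temp_rec'))

# Non-null count for every column after the first two
counts = df.iloc[:, 2:].count()

# Map each original column name to "<name> N=<count>"
new_names = {col: f'{col} N={n}' for col, n in counts.items()}

# Columns that are not in the dictionary (Hosp_name, date) keep their names
renamed = df.rename(columns=new_names)
print(list(renamed.columns))
```

With the sample data this should give the column names from the question's desired result: max_rec N=2, reg_pp N=5, disch_no N=6 and temp_rec N=3.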
EDIT:
df = pd.DataFrame([('Jorh Hospital','2017-03-15', 389.0,34, 32, 34),
('Jorh Hospital','2018-04-20', np.nan,22, 5, 43),
('Jorh Hospital','2018-05-20', np.nan,22, 5, 43),
('Bugh Hospital','2019-02-16', 80.5,np.nan, 56, np.nan),
('Bugh Hospital','2019-03-23', np.nan,89, 67, np.nan),
('Bugh Hospital','2019-04-23', np.nan,89, 67, np.nan)],
columns=('Hosp_name','date', 'max_rec', 'reg_pp', 'disch_no', 'temp_rec'))
print (df)
       Hosp_name        date  max_rec  reg_pp  disch_no  temp_rec
0  Jorh Hospital  2017-03-15    389.0    34.0        32      34.0
1  Jorh Hospital  2018-04-20      NaN    22.0         5      43.0
2  Jorh Hospital  2018-05-20      NaN    22.0         5      43.0
3  Bugh Hospital  2019-02-16     80.5     NaN        56       NaN
4  Bugh Hospital  2019-03-23      NaN    89.0        67       NaN
5  Bugh Hospital  2019-04-23      NaN    89.0        67       NaN
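Comments:
Could you please modify your code to group by hospital name, so that each hospital name becomes a column with the count appended to it?
I have this code, which groups the rows by hospital name: df.iloc[:,2:].notna().groupby(df['Hosp_name']).mean().T. You can run it to see the expected dataframe, but I would like to do the count first, before taking the mean.
Your edited answer is exactly what I want, but could you remove the count from the index and append it to the hospital name instead, as a count per hospital?
I have edited the dataframe so that it only includes two hospitals, and I added a table at the end showing the expected result after the count and the mean. The count is the number of values per hospital, and the mean is the number of NaNs expressed as a percentage.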
s = df.iloc[:, 2:].count()
d = dict(zip(s.index, s.index + ' N=' + s.astype(str)))
df = df.iloc[:,2:].notna().groupby(df['Hosp_name']).mean().T.rename(d)
print (df)
Hosp_name     Bugh Hospital  Jorh Hospital
max_rec N=2        0.333333       0.333333
reg_pp N=5         0.666667       1.000000
disch_no N=6       1.000000       1.000000
temp_rec N=3       0.000000       1.000000
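The mean of notna() works as a share because notna() returns True/False values, and the mean of booleans is the fraction that are True. A small sketch on a made-up frame (toy, grp and val are hypothetical names, not part of the question's data):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'grp': ['a', 'a', 'b', 'b'],
                    'val': [1.0, np.nan, np.nan, np.nan]})

# True where a value is present, False where it is NaN
present = toy[['val']].notna()

# Mean of the booleans per group = share of non-null values in that group
share = present.groupby(toy['grp']).mean()
print(share)
```

Group a has one of two values present (0.5) and group b has none (0.0), so the result is the share of values that are present, not the share that are NaN.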
EDIT1:
df = pd.DataFrame([('Jorh Hospital','2017-03-15', 389.0,34, 32, 34),
('Jorh Hospital','2018-04-20', np.nan,22, 5, 43),
('Jorh Hospital','2018-05-20', np.nan,22, 5, 43),
('Bugh Hospital','2019-02-16', 80.5,np.nan, 56, np.nan),
('Bugh Hospital','2019-03-23', np.nan,89, 67, np.nan),
('Bugh Hospital','2019-04-23', np.nan,89, 67, np.nan)],
columns=('Hosp_name','date', 'max_rec', 'reg_pp', 'disch_no', 'temp_rec'))
print (df)
       Hosp_name        date  max_rec  reg_pp  disch_no  temp_rec
0  Jorh Hospital  2017-03-15    389.0    34.0        32      34.0
1  Jorh Hospital  2018-04-20      NaN    22.0         5      43.0
2  Jorh Hospital  2018-05-20      NaN    22.0         5      43.0
3  Bugh Hospital  2019-02-16     80.5     NaN        56       NaN
4  Bugh Hospital  2019-03-23      NaN    89.0        67       NaN
5  Bugh Hospital  2019-04-23      NaN    89.0        67       NaN
df = (df.iloc[:,2:]
.notna()
.astype(int)
.groupby(df['Hosp_name'])
.agg(['sum', 'mean'])
.stack(0))
print (df)
                            mean  sum
Hosp_name
Bugh Hospital disch_no  1.000000    3
              max_rec   0.333333    1
              reg_pp    0.666667    2
              temp_rec  0.000000    0
Jorh Hospital disch_no  1.000000    3
              max_rec   0.333333    1
              reg_pp    1.000000    3
              temp_rec  1.000000    3
a = df['mean'].mul(100).round(0).astype(int).astype(str) + '% '
b = '(' + df['sum'].astype(str) + ')'
# Series.sum(level=0) was removed in pandas 2.0; group by the index level instead
s = df['sum'].groupby(level=0).sum()
d = dict(zip(s.index, s.index + ' N=' + s.astype(str)))
print (d)
{'Bugh Hospital': 'Bugh Hospital N=6', 'Jorh Hospital': 'Jorh Hospital N=10'}
df = a.add(b).unstack(0).rename(columns=d)
print (df)
Hosp_name  Bugh Hospital N=6  Jorh Hospital N=10
disch_no            100% (3)            100% (3)
max_rec              33% (1)             33% (1)
reg_pp               67% (2)            100% (3)
temp_rec              0% (0)            100% (3)
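In the output above, N is the sum of the non-null counts over all variables for each hospital. If N should instead be the number of rows per hospital, as the question's expected table suggests, one possible variation is to take N from groupby(...).size(). The following is only a sketch under that assumption and is not part of the answer above. The names present, grouped, n_present, pct, cells, sizes, labels and table are illustrative, and the sketch formats the sum and mean frames directly instead of using stack/unstack:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [('Jorh Hospital', '2017-03-15', 389.0, 34, 32, 34),
     ('Jorh Hospital', '2018-04-20', np.nan, 22, 5, 43),
     ('Jorh Hospital', '2018-05-20', np.nan, 22, 5, 43),
     ('Bugh Hospital', '2019-02-16', 80.5, np.nan, 56, np.nan),
     ('Bugh Hospital', '2019-03-23', np.nan, 89, 67, np.nan),
     ('Bugh Hospital', '2019-04-23', np.nan, 89, 67, np.nan)],
    columns=('Hosp_name', 'date', 'max_rec', 'reg_pp', 'disch_no', 'temp_rec'))

# True where a value is present, grouped by hospital
present = df.iloc[:, 2:].notna()
grouped = present.groupby(df['Hosp_name'])

n_present = grouped.sum()                             # non-null count per cell
pct = grouped.mean().mul(100).round(0).astype(int)    # share present, in percent

# Build "<pct>% (<count>)" strings cell by cell
cells = pct.astype(str) + '% (' + n_present.astype(str) + ')'

# Assumption: N is the number of rows recorded for each hospital
sizes = df.groupby('Hosp_name').size()
labels = {hosp: f'{hosp} (N={n})' for hosp, n in sizes.items()}

# Variables as rows, hospitals (with N appended) as columns
table = cells.T.rename(columns=labels)
table.index.name = 'variables'
print(table)
```

With the sample data both hospitals have three rows, so the columns come out as Bugh Hospital (N=3) and Jorh Hospital (N=3), and the cell values are the same strings as in the table above.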