Python DataFrame manipulation and aggregation
I have the following DataFrame:
      City    Status q1 q2    Record
0   Austin  Standard  N  Y    Active
1   Dallas  Standard  N  y    Active
2  Orlando  Standard  N  N    Active
3  Orlando        Ex  Y  Y  Inactive
4  Orlando  Standard  N  N    Active
I am trying to transform it so that it looks like this:
            Count       %
All Cities      5  100.0%
Active          4     80%
Ex              1     20%
Standard        4     80%
Q1 = Y          1     20%
Q2 = Y          2     40%
Inactive        1     20%
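I used a large block of code that computes each percentage by splitting every df column into its component states (for example, a column for q1 yes, a column for q1 no, and so on) and then recursively fills in the DataFrame, but I feel I must be missing something.
I also need to break this down by city, but I want to figure out this part before asking for more help.
You can do it like this: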
In [159]: df.q1 = 'Q1 = ' + df.q1.str.upper()
In [160]: df.q2 = 'Q2 = ' + df.q2.str.upper()
In [161]: df
Out[161]:
      City    Status      q1      q2    Record
0   Austin  Standard  Q1 = N  Q2 = Y    Active
1   Dallas  Standard  Q1 = N  Q2 = Y    Active
2  Orlando  Standard  Q1 = N  Q2 = N    Active
3  Orlando        Ex  Q1 = Y  Q2 = Y  Inactive
4  Orlando  Standard  Q1 = N  Q2 = N    Active
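In [173]: r = (df.drop(columns='City')                                   # City is not part of the summary
   .....:        .apply(lambda x: x.value_counts())                      # per-column counts; NaN where a label is absent
   .....:        .apply(lambda x: x[x.first_valid_index()], axis=1)      # keep the single non-NaN count in each row
   .....:        .to_frame('Count')
   .....:        .astype('int16')
   .....:     )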
In [174]: r['pct'] = (r.Count / len(df) * 100).astype(str) + '%'
In [175]: r.loc['All Cities'] = [len(df), '100.0%']
In [176]: r
Out[176]:
            Count     pct
Active          4   80.0%
Ex              1   20.0%
Inactive        1   20.0%
Q1 = N          4   80.0%
Q1 = Y          1   20.0%
Q2 = N          2   40.0%
Q2 = Y          3   60.0%
Standard        4   80.0%
All Cities      5  100.0%
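The steps so far can also be sketched as a standalone script. This is only one possible variant, not the exact code of the session above: it pools the columns with `melt` instead of the two chained `apply` calls, and it works on a copy (`labeled`, a name introduced here for illustration) rather than overwriting `df`. The sample data is copied from the question, including the lowercase `y` in row 1.

```python
import pandas as pd

# Sample data copied from the question (note the lowercase 'y' in row 1).
df = pd.DataFrame({
    'City':   ['Austin', 'Dallas', 'Orlando', 'Orlando', 'Orlando'],
    'Status': ['Standard', 'Standard', 'Standard', 'Ex', 'Standard'],
    'q1':     ['N', 'N', 'N', 'Y', 'N'],
    'q2':     ['Y', 'y', 'N', 'Y', 'N'],
    'Record': ['Active', 'Active', 'Active', 'Inactive', 'Active'],
})

# Prefix the question columns so their values stay distinguishable
# once all columns are pooled together.
labeled = df.drop(columns='City').copy()
for col in ['q1', 'q2']:
    labeled[col] = col.upper() + ' = ' + labeled[col].str.upper()

# Pool every remaining column into one long 'value' column and count each label.
counts = labeled.melt()['value'].value_counts().sort_index()

summary = counts.to_frame('Count')
# Multiply before dividing to keep the float arithmetic exact for this data.
summary['pct'] = (summary['Count'] * 100 / len(df)).astype(str) + '%'
summary.loc['All Cities'] = [len(df), '100.0%']
print(summary)
```

Pooling with `melt` relies on the labels being unique across columns, which is why `q1` and `q2` are prefixed first; without the prefix, the `Y` and `N` values of both questions would be counted together.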
Finally:
In [178]: r[~r.index.str.contains('= N')]
Out[178]:
            Count     pct
Active          4   80.0%
Ex              1   20.0%
Inactive        1   20.0%
Q1 = Y          1   20.0%
Q2 = Y          3   60.0%
Standard        4   80.0%
All Cities      5  100.0%
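The question also mentions needing the same summary broken down by city. The answer above does not cover that, so the following is only a rough sketch of one way it might be done, assuming a label-by-city table is the desired shape. The names `labeled`, `long_df`, `by_city`, and `pct_by_city` are introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question (note the lowercase 'y' in row 1).
df = pd.DataFrame({
    'City':   ['Austin', 'Dallas', 'Orlando', 'Orlando', 'Orlando'],
    'Status': ['Standard', 'Standard', 'Standard', 'Ex', 'Standard'],
    'q1':     ['N', 'N', 'N', 'Y', 'N'],
    'q2':     ['Y', 'y', 'N', 'Y', 'N'],
    'Record': ['Active', 'Active', 'Active', 'Inactive', 'Active'],
})

# Prefix the question columns, as in the answer, but on a copy.
labeled = df.copy()
for col in ['q1', 'q2']:
    labeled[col] = col.upper() + ' = ' + labeled[col].str.upper()

# Long format: one row per (City, label) pair.
long_df = labeled.melt(id_vars='City', value_name='label')

# Rows are labels, columns are cities, cells are counts.
by_city = pd.crosstab(long_df['label'], long_df['City'])

# Drop the '= N' rows, mirroring the final filter in the answer.
by_city = by_city[~by_city.index.str.contains('= N')]

# Express each count as a percentage of that city's number of rows.
city_sizes = df['City'].value_counts()
pct_by_city = by_city.div(city_sizes, axis='columns') * 100

print(by_city)
print(pct_by_city.round(1))
```

Here each percentage is relative to the number of rows in that city, not to the whole DataFrame; dividing by `len(df)` instead would give each city's share of the overall total.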