Python DataFrame manipulation and aggregation


I have the following DataFrame:

    City       Status     q1  q2 Record
0   Austin     Standard   N   Y  Active
1   Dallas     Standard   N   y  Active
2   Orlando    Standard   N   N  Active 
3   Orlando    Ex         Y   Y  Inactive
4   Orlando    Standard   N   N  Active
I am trying to reshape it so that it looks like this:

                Count  %
All Cities      5      100.0%
Active          4      80%
  Ex            1      20%
  Standard      4      80%
  Q1 = Y        1      20%
  Q2 = Y        2      40%
Inactive        1      20%
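I used a large chunk of code that computes each percentage by breaking every df column into its component states (e.g., a column for q1 yes, a column for q1 no, and so on) and then recursively populating a DataFrame, but I feel I must be missing something.

I also need to break this down by city, but I want to get this part figured out before asking for more help.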

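You can do it like this (note that `.str.upper()` normalizes the lowercase `y` in row 1, so Q2 = Y is counted 3 times):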

In [159]: df.q1 = 'Q1 = ' + df.q1.str.upper()

In [160]: df.q2 = 'Q2 = ' + df.q2.str.upper()

In [161]: df
Out[161]:
      City    Status      q1      q2    Record
0   Austin  Standard  Q1 = N  Q2 = Y    Active
1   Dallas  Standard  Q1 = N  Q2 = Y    Active
2  Orlando  Standard  Q1 = N  Q2 = N    Active
3  Orlando        Ex  Q1 = Y  Q2 = Y  Inactive
4  Orlando  Standard  Q1 = N  Q2 = N    Active

In [173]: r = (df.drop(columns='City')
   .....:        .apply(lambda x: x.value_counts())
   .....:        .apply(lambda x: x[x.first_valid_index()], axis=1)
   .....:        .to_frame('Count')
   .....:        .astype('int16')
   .....:     )

In [174]: r['pct'] = (r.Count / len(df) * 100).astype(str) + '%'

In [175]: r.loc['All Cities'] = [len(df), '100.0%']

In [176]: r
Out[176]:
            Count     pct
Active          4   80.0%
Ex              1   20.0%
Inactive        1   20.0%
Q1 = N          4   80.0%
Q1 = Y          1   20.0%
Q2 = N          2   40.0%
Q2 = Y          3   60.0%
Standard        4   80.0%
All Cities      5  100.0%
Finally, drop the "= N" rows:

In [178]: r[~r.index.str.contains('= N')]
Out[178]:
            Count     pct
Active          4   80.0%
Ex              1   20.0%
Inactive        1   20.0%
Q1 = Y          1   20.0%
Q2 = Y          3   60.0%
Standard        4   80.0%
All Cities      5  100.0%
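
For reference, the same steps can be sketched as one self-contained script. This is a minimal sketch, not the answer's exact code: the helper `summarize` and its `total_label` parameter are hypothetical names introduced here, and `melt` is swapped in for the per-column `value_counts` chain above. The sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "City":   ["Austin", "Dallas", "Orlando", "Orlando", "Orlando"],
    "Status": ["Standard", "Standard", "Standard", "Ex", "Standard"],
    "q1":     ["N", "N", "N", "Y", "N"],
    "q2":     ["Y", "y", "N", "Y", "N"],
    "Record": ["Active", "Active", "Active", "Inactive", "Active"],
})


def summarize(frame, total_label="All Cities"):
    """Count each label in the non-City columns as a share of the rows.

    `summarize` and `total_label` are illustrative names, not part of pandas.
    """
    labelled = frame.drop(columns="City").copy()
    # Prefix the question columns so 'Y'/'N' stay distinguishable per question;
    # upper() also normalizes the stray lowercase 'y'.
    for col in ("q1", "q2"):
        labelled[col] = col.upper() + " = " + labelled[col].str.upper()

    # melt() flattens every column into a single column of labels.
    counts = labelled.melt(value_name="label")["label"].value_counts().sort_index()

    out = counts.to_frame("Count")
    out["pct"] = (out["Count"] / len(frame) * 100).round(1).astype(str) + "%"
    out.loc[total_label] = [len(frame), "100.0%"]

    # Keep only the 'Y' answers, as in the final step of the answer.
    return out[~out.index.str.contains("= N")]


summary = summarize(df)
print(summary)

# Per-city breakdown, assuming a simple boolean filter is acceptable.
orlando = summarize(df[df["City"] == "Orlando"], total_label="Orlando")
print(orlando)
```

Calling the helper on a filtered frame is one way to approach the per-city breakdown mentioned in the question; percentages are then relative to that city's row count rather than to the whole table.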