Total values in a Python pivot table


My original DataFrame looks like the one below:

import pandas as pd

df = pd.DataFrame({'Variation': ['A']*5 + ['B']*3 + ['A']*4,
                   'id': [11]*4 + [12] + [15]*2 + [17] + [20]*4,
                   'steps': ['start','step1','step2','end','end','step1','step2','step1','start','step1','step2','end']})

I want to create a pivot table from this DataFrame, and I used the code below:

df1 = df.pivot_table(index=['Variation'], columns=['steps'],
                     values='id', aggfunc='count', fill_value=0)

However, I also want to see the total distinct count of id per Variation. Can someone tell me how to do this? My expected output is:

| Variation | Total id | Total start | Total step1 | Total step2 | Total end |
|-----------|----------|-------------|-------------|-------------|-----------|
| A         | 3        | 2           | 2           | 2           | 3         |
| B         | 2        | 0           | 2           | 1           | 0         |
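
To make the gap between the pivot table and the expected output concrete, here is a minimal sketch of the difference between counting rows and counting distinct ids per group. The variable names `rows_per_variation` and `distinct_ids` are only illustrative and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Variation': ['A'] * 5 + ['B'] * 3 + ['A'] * 4,
    'id': [11] * 4 + [12] + [15] * 2 + [17] + [20] * 4,
    'steps': ['start', 'step1', 'step2', 'end', 'end', 'step1',
              'step2', 'step1', 'start', 'step1', 'step2', 'end'],
})

# count() counts every non-null row, so repeated ids are counted repeatedly.
rows_per_variation = df.groupby('Variation')['id'].count()

# nunique() counts each distinct id once per group.
distinct_ids = df.groupby('Variation')['id'].nunique()

print(rows_per_variation)
print(distinct_ids)
```

The "Total id" column in the expected output corresponds to the second quantity, which `aggfunc='count'` in the pivot table does not produce.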

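Use DataFrame.join to add the distinct id count computed with groupby and nunique. If the id column needs to come right after the Variation column: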

c = ['id'] + df['steps'].unique().tolist()
df1 = (df1.join(df.groupby('Variation')['id'].nunique())
          .reindex(columns=c)
          .add_prefix('Total ')
          .reset_index()
          .rename_axis(None, axis=1))

print(df1)
  Variation  Total id  Total start  Total step1  Total step2  Total end
0         A         3            2            2            2          3
1         B         2            0            2            1          0
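
The snippet above relies on `df` and `df1` from the question. As a self-contained sketch of the same approach, the version below rebuilds both and keeps Variation as the index; the names `counts`, `distinct_ids`, `ordered` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Variation': ['A'] * 5 + ['B'] * 3 + ['A'] * 4,
    'id': [11] * 4 + [12] + [15] * 2 + [17] + [20] * 4,
    'steps': ['start', 'step1', 'step2', 'end', 'end', 'step1',
              'step2', 'step1', 'start', 'step1', 'step2', 'end'],
})

# Row counts per step for each Variation (missing combinations become 0).
counts = df.pivot_table(index='Variation', columns='steps',
                        values='id', aggfunc='count', fill_value=0)

# Distinct ids per Variation, as a Series named 'id'.
distinct_ids = df.groupby('Variation')['id'].nunique()

# Put 'id' first, then the steps in order of first appearance.
ordered = ['id'] + df['steps'].unique().tolist()

result = (counts.join(distinct_ids)
                .reindex(columns=ordered)
                .add_prefix('Total '))
print(result)
```

Leaving Variation as the index, rather than calling reset_index, makes label-based lookups such as `result.loc['A', 'Total id']` possible.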


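How can I filter a value out of the pivot table? Say I want to pick the "Total end" value for Variation A and store it in a new variable. So my expected result would be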

new_var=3

@hk2 - the simplest way is to remove

.reset_index().rename_axis(None, axis=1)

from the second solution, so that Variation stays as the index, and then use

new_var = df1.loc['A', 'Total end']
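
As a runnable sketch of that lookup: the first variant keeps Variation as the index and reads the cell by label; the second assumes the index has already been reset and uses a boolean mask instead. The names `totals`, `flat` and `new_var_flat` are hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Variation': ['A'] * 5 + ['B'] * 3 + ['A'] * 4,
    'id': [11] * 4 + [12] + [15] * 2 + [17] + [20] * 4,
    'steps': ['start', 'step1', 'step2', 'end', 'end', 'step1',
              'step2', 'step1', 'start', 'step1', 'step2', 'end'],
})

counts = df.pivot_table(index='Variation', columns='steps',
                        values='id', aggfunc='count', fill_value=0)

# Variation stays as the index, so rows can be addressed by 'A' / 'B'.
totals = (counts.join(df.groupby('Variation')['id'].nunique())
                .add_prefix('Total '))

# Label-based lookup: row 'A', column 'Total end'.
new_var = totals.loc['A', 'Total end']

# If the index was already reset, select the row with a boolean mask.
flat = totals.reset_index()
new_var_flat = flat.loc[flat['Variation'] == 'A', 'Total end'].iloc[0]

print(new_var, new_var_flat)
```

Both values are NumPy integers; wrapping them in `int(...)` gives a plain Python int if that is needed.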
