Mixed aggregation and grouping in Python pandas


python pandas


I have a dataset called report that contains details of delivery drivers. "Pass" means they delivered on time; "Fail" means they did not.

Name|Outcome
A   |Pass
B   |Fail
C   |Pass
D   |Pass
A   |Fail
C   |Pass

What I want

Name|Pass|Fail|Total
A   |1   |1   |2
B   |0   |1   |1
C   |2   |0   |2
D   |1   |0   |1

I tried:

report.groupby(['Name','outcome']).agg(['count'])

But it does not give me the required output.

Thanks a lot

In [1]: import pandas as pd
   ...: from io import StringIO

In [2]: df_string = '''Name|Outcome
   ...: A|Pass
   ...: B|Fail
   ...: C|Pass
   ...: D|Pass
   ...: A|Fail
   ...: C|Pass'''

In [3]: report = pd.read_csv(StringIO(df_string), sep='|')

In [4]: report.assign(count=1).groupby(["Name", "Outcome"])["count"].sum().unstack().assign(Total=lambda df: df.sum(axis=1))
Out[4]:
Outcome  Fail  Pass  Total
Name
A         1.0   1.0    2.0
B         1.0   NaN    1.0
C         NaN   2.0    2.0
D         NaN   1.0    1.0

Now you can use the fillna(0) method to fill the NaN values.
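A minimal sketch of the groupby/unstack approach with the missing counts filled in. It assumes the six rows from the question, rebuilt inline as a DataFrame; the astype(int) cast is my own addition so the counts print as whole numbers, not something the answer above specifies.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real `report`).
report = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "A", "C"],
    "Outcome": ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"],
})

counts = (
    report.assign(count=1)
    .groupby(["Name", "Outcome"])["count"]
    .sum()
    .unstack()      # Outcome values become columns; missing pairs are NaN
    .fillna(0)      # replace the NaN entries with 0
    .astype(int)    # counts are whole numbers again
)
counts["Total"] = counts.sum(axis=1)
print(counts)
```

Adding Total after fillna(0) means the row sums are computed on integers, so the whole table stays integer typed.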

This is pd.crosstab with sum over axis=1:
df = pd.crosstab(df['Name'], df['Outcome'])
df['Total'] = df[['Fail', 'Pass']].sum(axis=1)


Or, to remove the columns axis name, use rename_axis:
df = pd.crosstab(df['Name'], df['Outcome']).reset_index().rename_axis(None, axis='columns')
df['Total'] = df[['Fail', 'Pass']].sum(axis=1)
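For reference, a self-contained sketch of the crosstab approach. The sample data is rebuilt from the question, the variable names report and table are mine, and reordering the columns to Name, Pass, Fail, Total is an assumption about the layout the asker wants.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real dataset).
report = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "A", "C"],
    "Outcome": ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"],
})

# Count occurrences of each Outcome per Name; absent pairs become 0.
table = pd.crosstab(report["Name"], report["Outcome"])
table["Total"] = table[["Fail", "Pass"]].sum(axis=1)

# Reorder columns, turn Name back into a column, drop the "Outcome" axis label.
table = (
    table[["Pass", "Fail", "Total"]]
    .reset_index()
    .rename_axis(None, axis="columns")
)
print(table)
```

Unlike the groupby/unstack route, crosstab fills absent combinations with 0 directly, so no fillna step is needed.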

Alternatively, use crosstab with the margins=True and margins_name parameters:
print (pd.crosstab(df['Name'], df['Outcome'], margins=True, margins_name='Total'))
Outcome  Fail  Pass  Total
Name                      
A           1     1      2
B           1     0      1
C           0     2      2
D           0     1      1
Total       2     4      6

Then remove the last row (the Total margin row) by position with .iloc[:-1].

One approach using pandas.get_dummies and groupby:
report = pd.get_dummies(df1, columns=['outcome']).groupby(['name'], as_index=False).sum().rename(columns={"outcome_Fail":"Fail", "outcome_Pass":"Pass"})

report["Total"] = report["Pass"] + report["Fail"]

print(report)

Output:
    name Fail Pass Total
0   A     1    1    2
1   B     1    0    1
2   C     0    2    2
3   D     0    1    1

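Your output is wrong. Why does B have 1 in Pass and 0 in Fail?
Thanks for pointing that out.. corrected.
I usually chain margins=True with iloc[:-1] to remove the row-level total, since you cannot specify an axis argument for margins.
@Datanovice - Exactly, added to the answer.
Thanks! Exactly what I was looking for. Never knew about the crosstab function in pandas. Thank you very much @Erfan!
Thank you @jezrael. Great optimized version of crosstab.
@NithinNampoothiry - yop, unfortunately it also adds a last Total row, so it needs to be removed.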
The crosstab version with the Total margin row removed by position:

df = pd.crosstab(df['Name'], df['Outcome'], margins=True, margins_name='Total').iloc[:-1]
print(df)
Outcome  Fail  Pass  Total
Name                      
A           1     1      2
B           1     0      1
C           0     2      2
D           0     1      1
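If relying on row position feels fragile, the margin row can also be dropped by its label. This is only a sketch of an alternative to .iloc[:-1], not what the answer above uses; the sample data is rebuilt from the question and the names report and table are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed to match the real dataset).
report = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "A", "C"],
    "Outcome": ["Pass", "Fail", "Pass", "Pass", "Fail", "Pass"],
})

table = pd.crosstab(
    report["Name"], report["Outcome"],
    margins=True, margins_name="Total",
)

# drop(index=...) removes only the row labelled "Total";
# the "Total" column added by margins=True is kept.
table = table.drop(index="Total")
print(table)
```

Dropping by label assumes no driver is actually named "Total"; if that could happen, pick a different margins_name.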
