Python 3.x: Pandas groupby and calculate the ratio of two columns (Python 3.x, Pandas, Pandas Groupby)

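I'm trying to use Pandas and groupby to calculate the ratio of two columns. In the example below, I want to calculate the ratio of staff Status per Department (number of staff with a given Status in the department / total number of staff in that department). For example, the Sales department has 3 staff in total, 2 of whom have Employee status, so the ratio is 2/3, or 66.67%. I managed to hack my way to this, but there must be a more elegant and simpler way to do it. How can I get the desired output below more efficiently?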
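Put as a formula, with n_{d,s} denoting the number of staff in department d whose Status is s and n_d the total staff in department d (notation introduced here for illustration, not part of the original question):

```latex
\mathrm{Status\_Ratio}(d, s) = 100 \cdot \frac{n_{d,s}}{n_d},
\qquad
\mathrm{Status\_Ratio}(\mathrm{Sales}, \mathrm{Employee}) = 100 \cdot \frac{2}{3} \approx 66.67
```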

Original DataFrame:

  Department    Name      Status
0      Sales    John    Employee
1      Sales   Steve    Employee
2      Sales    Sara  Contractor
3    Finance   Allen  Contractor
4  Marketing  Robert    Employee
5  Marketing    Lacy  Contractor

Code:

import pandas as pd

mydict = {
    'Name': ['John', 'Steve', 'Sara', 'Allen', 'Robert', 'Lacy'],
    'Department': ['Sales', 'Sales', 'Sales', 'Finance', 'Marketing', 'Marketing'],
    'Status': ['Employee', 'Employee', 'Contractor', 'Contractor', 'Employee', 'Contractor']
}

df = pd.DataFrame(mydict)

# Create column with total number of staff per Department
df['total_dept'] = df.groupby(['Department'])['Name'].transform('count')
print(df)
print('\n')


# Create column with Status ratio per department
# (this re-runs the groupby once for every row, which is why it is slow)
for k, v in df.iterrows():
    df.loc[k, 'Status_Ratio'] = (df.groupby(['Department', 'Status']).count().xs(v['Status'], level=1)['total_dept'][v['Department']] / v['total_dept']) * 100
print(df)
print('\n')

# Final groupby with Status Ratio. Size NOT needed
print(df.groupby(['Department', 'Status', 'Status_Ratio']).size())
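A possible vectorized sketch of the same idea, assuming the goal is just to fill the Status_Ratio column without the row-by-row loop: GroupBy.transform('count') broadcasts the per-(Department, Status) count back to every row, so the ratio becomes a single column division. The name status_count is introduced here for illustration only.

```python
import pandas as pd

mydict = {
    'Name': ['John', 'Steve', 'Sara', 'Allen', 'Robert', 'Lacy'],
    'Department': ['Sales', 'Sales', 'Sales', 'Finance', 'Marketing', 'Marketing'],
    'Status': ['Employee', 'Employee', 'Contractor', 'Contractor', 'Employee', 'Contractor'],
}
df = pd.DataFrame(mydict)

# Staff per department, broadcast back to every row (same as in the question)
df['total_dept'] = df.groupby('Department')['Name'].transform('count')

# Staff per (Department, Status) pair, also broadcast back to every row;
# dividing the two row-aligned columns replaces the iterrows() loop
status_count = df.groupby(['Department', 'Status'])['Name'].transform('count')
df['Status_Ratio'] = status_count / df['total_dept'] * 100

print(df)
```

Because each groupby runs once instead of once per row, this should scale much better than iterrows() on larger frames.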

Desired output:

Department  Status      Status_Ratio
Finance     Contractor  100.00
Marketing   Contractor  50.00
            Employee    50.00
Sales       Contractor  33.33 
            Employee    66.67
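If the aim is exactly the grouped result shown above (one value per Department/Status pair, rounded to two decimals) rather than a per-row column, one way to sketch it is to count the pairs with size() and divide by department totals taken from a second groupby on the Department index level. The names counts, totals and status_ratio are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Steve', 'Sara', 'Allen', 'Robert', 'Lacy'],
    'Department': ['Sales', 'Sales', 'Sales', 'Finance', 'Marketing', 'Marketing'],
    'Status': ['Employee', 'Employee', 'Contractor', 'Contractor', 'Employee', 'Contractor'],
})

# Number of staff in each (Department, Status) pair
counts = df.groupby(['Department', 'Status']).size()

# Department totals, repeated for every Status row of that department,
# so both Series share the same MultiIndex and divide element-wise
totals = counts.groupby(level='Department').transform('sum')

# Percentage per (Department, Status), rounded like the desired output
status_ratio = (counts / totals * 100).round(2).rename('Status_Ratio')
print(status_ratio)
```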

Try (using the original df):

df.groupby("Department")["Status"].value_counts(normalize=True).mul(100)
Output:

Department  Status    
Finance     Contractor    100.000000
Marketing   Contractor     50.000000
            Employee       50.000000
Sales       Employee       66.666667
            Contractor     33.333333
Name: Status, dtype: float64
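For comparison, a sketch that runs the answer's one-liner next to pd.crosstab with normalize='index', which gives the same percentages as a wide table (one row per Department, one column per Status, with 0 where a status does not occur in a department). The names long_form and wide_form are illustrative; newer pandas versions may also label the value_counts result differently from the `Name: Status` shown above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Steve', 'Sara', 'Allen', 'Robert', 'Lacy'],
    'Department': ['Sales', 'Sales', 'Sales', 'Finance', 'Marketing', 'Marketing'],
    'Status': ['Employee', 'Employee', 'Contractor', 'Contractor', 'Employee', 'Contractor'],
})

# Answer's approach: share of each Status within each Department (long form)
long_form = df.groupby('Department')['Status'].value_counts(normalize=True).mul(100)

# Alternative: same shares as a Department x Status table;
# normalize='index' divides every row by that row's total
wide_form = pd.crosstab(df['Department'], df['Status'], normalize='index').mul(100)

print(long_form)
print(wide_form)
```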

This is incredible! Works perfectly. Thanks for your help. I still have a long way to go before I can write code this efficient :).