
Python: statistics for a grouped DataFrame with pandas


I have a DataFrame that can essentially be grouped by two columns: Level_1 and Sub_level.

The data looks like this:

    Level_1    Sub_level   Value

0    Group A   A1          100
1    Group A   A2          200
2    Group A   A1          150
3    Group B   B1          100
4    Group B   B2          200
5    Group A   A1          200
6    Group A   A1          300
7    Group A   A1          400
8    Group B   B2          450
...
I want to get the frequency/count of each Sub_level relative to its corresponding Level_1, i.e.:

Level_1   Sub_level   Pct_of_total

Group A   A1          5 / 6  (as there are 6 Group A instances in 'Level_1', and 5 A1:s in 'Sub_level')
          A2          1 / 6 
Group B   B1          1 / 3  (as there are 3 Group B instances in 'Level_1', and 1 B1:s in 'Sub_level')
          B2          2 / 3
Of course, the fractions in the new column Pct_of_total should be expressed as percentages.

Any clues?

Thanks,

/N

I think you first need groupby + size on df, then groupby on the first index level (Level_1) with transform('sum'). Finally, divide with div:

df1 = df.groupby(['Level_1','Sub_level'])['Value'].size()
print (df1)
Level_1  Sub_level
Group A  A1           5
         A2           1
Group B  B1           1
         B2           2
Name: Value, dtype: int64

df2 = df1.groupby(level=0).transform('sum')
print (df2)
Level_1  Sub_level
Group A  A1           6
         A2           6
Group B  B1           3
         B2           3
Name: Value, dtype: int64

df3 = df1.div(df2).reset_index(name='Pct_of_total')
print (df3)
   Level_1 Sub_level  Pct_of_total
0  Group A        A1      0.833333
1  Group A        A2      0.166667
2  Group B        B1      0.333333
3  Group B        B2      0.666667
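
The question asks for percentages, while the result above holds fractions. As a minimal self-contained sketch, the same steps can be run end to end on the nine sample rows shown in the question and scaled to percent. The variable names counts, totals, pct and alt, the rounding to two decimals, and the value_counts cross-check are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample rows 0-8 copied from the question (the "..." rows are not known).
df = pd.DataFrame({
    "Level_1": ["Group A", "Group A", "Group A", "Group B", "Group B",
                "Group A", "Group A", "Group A", "Group B"],
    "Sub_level": ["A1", "A2", "A1", "B1", "B2", "A1", "A1", "A1", "B2"],
    "Value": [100, 200, 150, 100, 200, 200, 300, 400, 450],
})

# Step 1: number of rows per (Level_1, Sub_level) pair.
counts = df.groupby(["Level_1", "Sub_level"]).size()

# Step 2: total rows per Level_1, broadcast back onto every pair.
totals = counts.groupby(level="Level_1").transform("sum")

# Step 3: divide, scale to percent, and flatten the index into columns.
pct = (
    counts.div(totals)
    .mul(100)
    .round(2)  # rounding is an assumption; drop it to keep full precision
    .reset_index(name="Pct_of_total")
)
print(pct)

# Cross-check with a shorter route: normalized value counts within each group.
alt = (
    df.groupby("Level_1")["Sub_level"]
    .value_counts(normalize=True)
    .mul(100)
    .sort_index()
)
print(alt)
```

On this sample, Group A / A1 comes out at about 83.33 percent (5 of 6 rows) and Group B / B2 at about 66.67 percent (2 of 3 rows), matching the fractions requested in the question.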