Getting the frequency of items in a pandas column over given value intervals stored in another column


My DataFrame:

import numpy as np
import pandas as pd

class_lst = ["B","A","C","Z","H","K","O","W","L","R","M","Y","Q","X","X","G","G","G","G","G"]
value_lst = [1,0.999986,1,0.999358,0.999906,0.995292,0.998481,0.388307,0.99608,0.99829,1,0.087298,1,1,0.999993,1,1,1,1,1]

df = pd.DataFrame(
    {'class': class_lst,
     'val': value_lst
    })
For every interval of 'val' defined by these ranges:

ranges = np.arange(0.0, 1.1, 0.1)
I want to get the frequency of the 'val' items per class, like this:

class  range          frequency
A      (0, 0.10]      0
A      (0.10, 0.20]   0
A      (0.20, 0.30]   0
...
A      (0.90, 1.00]   1
G      (0, 0.10]      0
G      (0.10, 0.20]   0
G      (0.20, 0.30]   0
...
G      (0.80, 0.90]   0
G      (0.90, 1.00]   5
...
I tried

df.groupby(pd.cut(df.val, ranges)).count()
but the output looks like

            class  val
val                   
(0, 0.1]        1    1
(0.1, 0.2]      0    0
(0.2, 0.3]      0    0
(0.3, 0.4]      1    1
(0.4, 0.5]      0    0
(0.5, 0.6]      0    0
(0.6, 0.7]      0    0
(0.7, 0.8]      0    0
(0.8, 0.9]      0    0
(0.9, 1]       18   18

and it does not match what I expect.

This might be a good start:

df["range"] = pd.cut(df['val'], ranges)

       class       val       range
0      B  1.000000  (0.9, 1.0]
1      A  0.999986  (0.9, 1.0]
2      C  1.000000  (0.9, 1.0]
3      Z  0.999358  (0.9, 1.0]
4      H  0.999906  (0.9, 1.0]
5      K  0.995292  (0.9, 1.0]
6      O  0.998481  (0.9, 1.0]
7      W  0.388307  (0.3, 0.4]
8      L  0.996080  (0.9, 1.0]
9      R  0.998290  (0.9, 1.0]
10     M  1.000000  (0.9, 1.0]
11     Y  0.087298  (0.0, 0.1]
12     Q  1.000000  (0.9, 1.0]
13     X  1.000000  (0.9, 1.0]
14     X  0.999993  (0.9, 1.0]
15     G  1.000000  (0.9, 1.0]
16     G  1.000000  (0.9, 1.0]
17     G  1.000000  (0.9, 1.0]
18     G  1.000000  (0.9, 1.0]
19     G  1.000000  (0.9, 1.0]
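One caveat about the bins shown above: by default `pd.cut` builds right-closed intervals and leaves the lowest edge open, so a value of exactly 0.0 would not fall into `(0.0, 0.1]`. None of the values in the question is 0.0, so this does not change the result here. The sketch below uses a small made-up series (`vals` is purely illustrative, not data from the question) to show the difference when `include_lowest=True` is passed.

```python
import numpy as np
import pandas as pd

ranges = np.arange(0.0, 1.1, 0.1)

# Illustrative values only: the first one sits exactly on the lowest bin edge.
vals = pd.Series([0.0, 0.05, 1.0])

# Default behaviour: the first interval is (0.0, 0.1], so 0.0 is left unbinned (NaN).
default_bins = pd.cut(vals, ranges)

# include_lowest=True makes the first interval include its left edge.
closed_bins = pd.cut(vals, ranges, include_lowest=True)

print(default_bins.isna().tolist())
print(closed_bins.isna().tolist())
```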
Then

df.groupby(["class", "range"]).size()

    class  range     
A      (0.9, 1.0]    1
B      (0.9, 1.0]    1
C      (0.9, 1.0]    1
G      (0.9, 1.0]    5
H      (0.9, 1.0]    1
K      (0.9, 1.0]    1
L      (0.9, 1.0]    1
M      (0.9, 1.0]    1
O      (0.9, 1.0]    1
Q      (0.9, 1.0]    1
R      (0.9, 1.0]    1
W      (0.3, 0.4]    1
X      (0.9, 1.0]    2
Y      (0.0, 0.1]    1
Z      (0.9, 1.0]    1

This gives the correct bin for each class, together with its frequency.
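The output above lists only the bins that actually occur, while the question asks for every class and interval combination, zeros included. A minimal sketch of one way to get that, assuming a reasonably recent pandas version in which `groupby(..., observed=False)` keeps unobserved categories of a categorical key and `size()` fills them with 0. The variable name `result` is illustrative; the column name `frequency` is taken from the expected output in the question.

```python
import numpy as np
import pandas as pd

class_lst = ["B","A","C","Z","H","K","O","W","L","R","M","Y","Q","X","X","G","G","G","G","G"]
value_lst = [1,0.999986,1,0.999358,0.999906,0.995292,0.998481,0.388307,0.99608,0.99829,1,0.087298,1,1,0.999993,1,1,1,1,1]
df = pd.DataFrame({"class": class_lst, "val": value_lst})

ranges = np.arange(0.0, 1.1, 0.1)

# pd.cut returns a categorical column whose categories are all ten intervals.
df["range"] = pd.cut(df["val"], ranges)

# observed=False asks groupby to keep every interval category for every class,
# so class/interval pairs with no rows show up with a size of 0.
result = (
    df.groupby(["class", "range"], observed=False)
      .size()
      .reset_index(name="frequency")
)

print(result[result["class"] == "G"])
```

If only the non-empty bins are wanted, passing `observed=True` instead should give the shorter listing shown in the answer.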

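Why do you expect the frequency of (0, 0.10] to be zero? You have a 0.087298 in that range (index 11). Also, you may want to check the output of np.arange(0.0, 1.0, 0.1). It does not include 1.0.

@ayhan Yes, class "Y" has a frequency of 1 in (0,0.10), and zero in all the others.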
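The point about `np.arange` excluding its stop value can be checked directly. The sketch below compares it with `np.linspace`, which includes the endpoint by default and may be a less surprising way to build bin edges. The names `edges_arange` and `edges_linspace` are illustrative only.

```python
import numpy as np

# np.arange stops before the stop value, so the last edge here is 0.9, not 1.0.
edges_arange = np.arange(0.0, 1.0, 0.1)

# np.linspace includes the endpoint: 11 edges from 0.0 to 1.0 define 10 bins.
edges_linspace = np.linspace(0.0, 1.0, 11)

print(len(edges_arange), edges_arange.max())
print(len(edges_linspace), edges_linspace[-1])
```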