How to return the maximum value of each quantile bin instead of the quantile label (Python, pandas)

Tags: python, pandas


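I need to bin continuous data into an arbitrary number of quantiles. However, my application needs to return the maximum value of each quantile bin: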

import pandas as pd
import numpy as np

In [1]: s = pd.Series(np.random.randint(0,20,20)); s[:5]
Out[1]:
0     0
1    15
2     5
3    19
4    15

Suppose I create 5 quantiles using pandas.qcut:

In [2]: bins = pd.qcut(s,5); bins
Out[2]:
Categorical:
array([[0, 1.8], (9.8, 15.2], (1.8, 6.2], (15.2, 19], (9.8, 15.2],
       (1.8, 6.2], (6.2, 9.8], (6.2, 9.8], (15.2, 19], (9.8, 15.2],
       [0, 1.8], (6.2, 9.8], (1.8, 6.2], [0, 1.8], (9.8, 15.2], [0, 1.8],
       (15.2, 19], (15.2, 19], (6.2, 9.8], (1.8, 6.2]], dtype=object)
Levels (5): Index([[0, 1.8], (1.8, 6.2], (6.2, 9.8], (9.8, 15.2],
                   (15.2, 19]], dtype=object)

with these bin labels:

In [3]: bins.labels
Out[3]: array([0, 3, 1, 4, 3, 1, 2, 2, 4, 3, 0, 2, 1, 0, 3, 0, 4, 4, 2, 1])

Is there a way to return the upper bin edge that each value falls under, instead of the quantile number? Here is an example of my desired output:

    original  bin_max
0          0        1
1         15       15
2          5        5
3         19       19
4         15       15
5          2        5
6          7        9
7          7        9
8         16       19
9         12       15
10         0        1
11         8        9
12         5        5
13         1        1
14        11       15
15         1        1
16        18       19
17        16       19
18         9        9
19         3        5

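This is the solution I currently use, but grouping by the qcut result seems inefficient when the values I need are already present in the qcut labels: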

In [4]: s.groupby(pd.qcut(s,5)).transform(max)
Out[4]:
0      1
1     15
2      5
3     19
4     15
5      5

You can use retbins=True to get the bin edges back as a NumPy array:

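import pandas as pd
import numpy as np

np.random.seed(1)
s = pd.Series(np.random.randint(0,20,20))

# retbins=True also returns the array of bin edges (number of bins + 1 values)
categories, edges = pd.qcut(s, 5, retbins=True)
# .cat.codes holds each value's integer bin index
# (older pandas versions exposed this as categories.labels)
df = pd.DataFrame({'original':s,
                   'bin_max': edges[1:][categories.cat.codes]},
                  columns = ['original', 'bin_max'])
print(df)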

which yields

    original  bin_max
0          5      5.0
1         11     11.0
2         12     13.4
3          8      8.6
4          9     11.0
5         11     11.0
6          5      5.0
7         15     18.0
8          0      5.0
9         16     18.0
10         1      5.0
11        12     13.4
12         7      8.6
13        13     13.4
14         6      8.6
15        18     18.0
16         5      5.0
17        18     18.0
18        11     11.0
19        10     11.0
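
To see why indexing edges[1:] with the bin codes gives the upper edge, here is a minimal sketch on a small, hand-picked Series (the data and the variable names codes and upper are illustrative, not from the original answer). It assumes a pandas version where qcut on a Series returns a categorical Series, so the codes are read through .cat.codes:

```python
import pandas as pd

# Ten evenly spaced values, split into two quantile bins at the median.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

categories, edges = pd.qcut(s, 2, retbins=True)

# edges has one more entry than there are bins: [lowest, median, highest].
# Bin k spans (edges[k], edges[k + 1]], so its upper edge is edges[1:][k].
codes = categories.cat.codes.to_numpy()
upper = edges[1:][codes]

print(edges)   # the three bin edges
print(upper)   # the upper edge of the bin each value falls into
```

Every value in s is less than or equal to its entry in upper, which is what makes the edge usable as a per-bin ceiling.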


For me, labels=False worked better:

import pandas as pd
import numpy as np

np.random.seed(1)
s = pd.Series(np.random.randint(0,20,20))

categories, edges = pd.qcut(s, 5, retbins=True, labels=False)
df = pd.DataFrame({'original':s,
                   'bin_max': edges[1:][categories]},
                  columns = ['original', 'bin_max'])
print(df)
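
One thing worth noting: the bin edges returned by retbins are quantile values, so they need not be values that actually occur in the data, whereas the groupby approach from the question returns the largest observed value in each bin. The sketch below compares the two on a small illustrative Series (the data and the column names edge_max and observed_max are assumptions made for this example):

```python
import pandas as pd

s = pd.Series([0, 3, 4, 8, 9, 12, 15, 20])

# labels=False makes qcut return the integer bin index for each value.
codes, edges = pd.qcut(s, 2, labels=False, retbins=True)

# Upper bin edge: a quantile value, possibly not present in the data.
edge_max = edges[1:][codes.to_numpy()]

# Largest value actually observed in each bin (the question's approach).
observed_max = s.groupby(codes).transform("max")

df = pd.DataFrame({"original": s,
                   "edge_max": edge_max,
                   "observed_max": observed_max})
print(df)
```

If the application needs a value that really appears in the data, the groupby version is the one to use; if any upper bound for the bin will do, the edge lookup avoids the grouping step.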


Thanks! I had played around with retbins but didn't think to turn it on. This solution will work nicely.