Python: get the array indices of quartiles


python numpy


I use the following code to compute the quartiles of a given dataset:

#!/usr/bin/python

import numpy as np

series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]

p1 = 25
p2 = 50
p3 = 75

q1 = np.percentile(series,  p1)
q2 = np.percentile(series,  p2)
q3 = np.percentile(series,  p3)

print('percentile(' + str(p1) + '): ' + str(q1))
print('percentile(' + str(p2) + '): ' + str(q2))
print('percentile(' + str(p3) + '): ' + str(q3))

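The percentile function returns the quartile values, but I would also like to get the indices that mark the quartile boundaries. Is there a way to do this?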

Try this:

import numpy as np
import pandas as pd
series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]
thresholds = [25,50,75]
# One row per threshold, indexed by the percentile it represents
output = pd.DataFrame([np.percentile(series, x) for x in thresholds], index=thresholds, columns=['quartiles'])
print(output)

Putting the results in a DataFrame makes it easy to assign an index (here, the percentile thresholds) to each quartile.
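One caveat on this answer: the DataFrame index holds the percentile labels (25, 50, 75), not positions in series. A minimal sketch of adding positions as extra columns, assuming series is sorted, uses np.searchsorted. The column names first_index and end_index are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]
thresholds = [25, 50, 75]

# Quartile values, labelled by percentile as in the answer above
values = [np.percentile(series, x) for x in thresholds]
output = pd.DataFrame({"quartiles": values}, index=thresholds)

# Assumes `series` is sorted ascending:
# first position holding each value, and one past the last such position
output["first_index"] = np.searchsorted(series, values, side="left")
output["end_index"] = np.searchsorted(series, values, side="right")
print(output)
```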

Assuming the data is always sorted (thanks @juanpa.arrivillaga), you can use the rank() method of the pandas Series class.

rank() accepts several parameters. One of them is pct:

pct : boolean, default False

Computes percentage rank of data
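To see what pct does in isolation, here is a small sketch on a series without ties (the values 10 to 40 are made up for illustration). With the default method, pct=True simply divides each rank by the number of non-NaN values:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40])

# Plain ranks (default method="average"; there are no ties here)
ranks = s.rank()
# pct=True divides each rank by the number of non-NaN values
pct_ranks = s.rank(pct=True)
print(ranks.tolist(), pct_ranks.tolist())
```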

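There are different ways to compute the percentage rank; they are controlled by the method parameter:

method : {'average', 'min', 'max', 'first', 'dense'}

You need method 'max':

max: highest rank in group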
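Why 'max'? The sketch below compares four of the tie-breaking methods on the tied value 2 (positions 1 and 10) from the question's data, then checks that with method='max' the percentage rank of each element equals the fraction of values less than or equal to it, i.e. the empirical CDF. That property is what makes "first element with rank >= p" a sensible quartile boundary. 'dense' is left out because it ranks distinct values rather than elements.

```python
import pandas as pd

S = pd.Series([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8])

# Percentage rank of the first and last copy of the tied value 2
results = {}
for method in ["average", "min", "max", "first"]:
    pct = S.rank(method=method, pct=True)
    results[method] = (pct.iloc[1], pct.iloc[10])
    print(method, results[method])

# With method="max", pct rank == fraction of values <= each value (empirical CDF)
ecdf = S.apply(lambda v: (S <= v).mean())
max_matches_ecdf = bool((S.rank(method="max", pct=True) == ecdf).all())
print(max_matches_ecdf)
```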
Let's look at the output of rank() with these parameters:

import numpy as np
import pandas as pd
series = [1,2,2,2,2,2,2,2,2,2,2,5,5,6,7,8]
S = pd.Series(series)
percentage_rank = S.rank(method="max", pct=True)
print(percentage_rank)

This gives you the percentage rank of every entry in the series:
0     0.0625
1     0.6875
2     0.6875
3     0.6875
4     0.6875
5     0.6875
6     0.6875
7     0.6875
8     0.6875
9     0.6875
10    0.6875
11    0.8125
12    0.8125
13    0.8750
14    0.9375
15    1.0000
dtype: float64

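To retrieve the indices of the three percentiles, look for the first element in the series whose percentage rank is equal to or greater than the percentile you are interested in. The index of that element is the index you need:

index25 = S.index[percentage_rank >= 0.25][0]
index50 = S.index[percentage_rank >= 0.50][0]
index75 = S.index[percentage_rank >= 0.75][0]
print("25 percentile: index {}, value {}".format(index25, S[index25]))
print("50 percentile: index {}, value {}".format(index50, S[index50]))
print("75 percentile: index {}, value {}".format(index75, S[index75]))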

This gives you the following output:
25 percentile: index 1, value 2
50 percentile: index 1, value 2
75 percentile: index 11, value 5
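The same steps can be wrapped in a small function that handles any list of percentiles. This is a sketch under the answer's assumption that the data is sorted; percentile_indices is a hypothetical name, and percentiles above 100 would raise an IndexError because no rank reaches them.

```python
import pandas as pd


def percentile_indices(values, percentiles):
    """Hypothetical helper: for each percentile p (0-100), return the index of the
    first element whose max-method percentage rank is >= p / 100.
    Assumes `values` is sorted ascending."""
    s = pd.Series(values)
    pct_rank = s.rank(method="max", pct=True)
    return {p: int(s.index[pct_rank >= p / 100][0]) for p in percentiles}


series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]
idx = percentile_indices(series, [25, 50, 75])
print(idx)  # {25: 1, 50: 1, 75: 11}
```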

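Since the data is sorted, you can use np.searchsorted, which returns the index at which a value would have to be inserted to preserve the sort order. You can also specify the "side" on which the value is inserted: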
>>> np.searchsorted(series,q1)
1
>>> np.searchsorted(series,q1,side='right')
11
>>> np.searchsorted(series,q2)
1
>>> np.searchsorted(series,q3)
11
>>> np.searchsorted(series,q3,side='right')
13
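Calling np.searchsorted with both sides on all three quartiles at once gives a half-open range [left, right) of the positions holding each quartile value. This is a sketch that assumes series is sorted ascending, as searchsorted requires:

```python
import numpy as np

series = np.array([1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8])
quartiles = np.percentile(series, [25, 50, 75])

# Assumes `series` is sorted ascending; searchsorted is vectorised over `quartiles`
left = np.searchsorted(series, quartiles, side="left")
right = np.searchsorted(series, quartiles, side="right")

for q, lo, hi in zip(quartiles, left, right):
    # series[lo:hi] holds every element equal to q (empty if q falls between values)
    print(q, lo, hi, series[lo:hi])
```

If a quartile is interpolated between two distinct values, both sides return the same position. For example, with [1, 2, 3, 4] the 25th percentile is 1.75 and np.searchsorted returns 1 for either side.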

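Is the data always sorted? Otherwise the question doesn't make sense, unless I'm missing something. But if it is sorted, you can compute the indices directly.
@juanpa.arrivillaga Yes, the data is always sorted.
I'm not sure how this answers the question...
I'm not sure I understand the question, though...
@juanpa.arrivillaga I assumed the question was about structuring the output...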
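The remark that the indices can be computed directly for sorted data can be sketched as follows. This assumes NumPy's default linear interpolation (method='linear', named interpolation in NumPy versions before 1.22): the p-th percentile sits at fractional position p/100 * (n - 1) in the sorted data, so the indices it is interpolated between are the floor and ceiling of that position. Note that these are the positions the value is interpolated from, which is not the same as the first occurrence of the value returned by the approaches above.

```python
import math

import numpy as np

series = [1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 5, 5, 6, 7, 8]
n = len(series)

results = {}
for p in (25, 50, 75):
    # Linear method: fractional position of the p-th percentile in the sorted data
    pos = p / 100 * (n - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    # Interpolate between the two bracketing elements (equal when pos is whole)
    value = series[lo] + (pos - lo) * (series[hi] - series[lo])
    results[p] = (lo, hi, value)
    print(p, pos, (lo, hi), value, np.percentile(series, p))
```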