Python: map each value of a list to its weighted percentile
I want to compute the percentile of each value in a list (or numpy array), weighted by the weights in another list. For example, given some function f, I want:
x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
f(x, weights)
to return [20, 40, 70, 100].
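The expected output corresponds to what scipy calls the 'weak' definition, applied to weights: the percentage of the total weight carried by values less than or equal to the given value. Spelled out for this example (a formalization of the request, not part of the question itself):

$$
P(x_i) = 100 \cdot \frac{\sum_{j \,:\, x_j \le x_i} w_j}{\sum_{j} w_j},
\qquad \sum_j w_j = 2 + 2 + 3 + 3 = 10
$$

$$
P(1) = 100 \cdot \tfrac{2}{10} = 20,\quad
P(2) = 100 \cdot \tfrac{2+2}{10} = 40,\quad
P(3) = 100 \cdot \tfrac{2+2+3}{10} = 70,\quad
P(4) = 100 \cdot \tfrac{10}{10} = 100
$$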
For a single value (unweighted), I can use:
from scipy import stats
stats.percentileofscore(x, 3)
# 75.0
I can also compute it for every value with:
[stats.percentileofscore(x, a, 'rank') for a in x]
# [25.0, 50.0, 75.0, 100.0]
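For the unweighted case, the loop over `stats.percentileofscore` can be replaced by a single vectorized call. A minimal sketch, assuming the 'weak' definition (which coincides with 'rank' when there are no ties): sort once, then use `np.searchsorted` with `side="right"` to count how many values are <= each element. The function name `unweighted_percentiles` is hypothetical.

```python
import numpy as np

def unweighted_percentiles(x):
    """Hypothetical helper: 'weak' percentile of every element of x (no weights)."""
    arr = np.asarray(x)
    sorted_arr = np.sort(arr)
    # For each element, count how many values are <= it.
    counts = np.searchsorted(sorted_arr, arr, side="right")
    return 100 * counts / len(arr)

print(unweighted_percentiles([1, 2, 3, 4]).tolist())  # [25.0, 50.0, 75.0, 100.0]
```

Because the result is computed on the original array, it stays in the input order even when `x` is unsorted.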
and I can compute the weighted percentile of a single item with the following function:
import numpy as np

def weighted_percentile_of_score(x, weights, score, kind='weak'):
    npx = np.array(x)
    npw = np.array(weights)
    total = npw.sum()
    if kind == 'rank':  # Equivalent to 'weak' since we have weights.
        kind = 'weak'
    if kind in ['strict', 'mean']:
        # Share of the total weight carried by values strictly below score.
        strict = 100 * npw[npx < score].sum() / total
        if kind == 'strict':
            return strict
    if kind in ['weak', 'mean']:
        # Share of the total weight carried by values at or below score.
        weak = 100 * npw[npx <= score].sum() / total
        if kind == 'weak':
            return weak
    if kind == 'mean':
        return (strict + weak) / 2
    raise ValueError("kind must be 'rank', 'weak', 'strict' or 'mean'")
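For the question's data and score = 3, the three kinds in this function work out as follows: values below 3 carry weight 2 + 2, and values at or below 3 add the weight 3 of the value 3 itself.

$$
\begin{aligned}
\text{strict} &= 100 \cdot \frac{2 + 2}{10} = 40 \\
\text{weak}   &= 100 \cdot \frac{2 + 2 + 3}{10} = 70 \\
\text{mean}   &= \frac{40 + 70}{2} = 55
\end{aligned}
$$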
How can I do this (efficiently) for every item in the list?
This is not very efficient, but you can combine the methods listed in the question:
[weighted_percentile_of_score(x, weights, val) for val in x]
# [20.0, 40.0, 70.0, 100.0]
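The list comprehension above does an O(n) scan for every value (O(n²) overall) plus Python loop overhead. As a middle ground, the same comparisons can be done in one NumPy broadcasting step: build an n×n boolean matrix of `x_j <= x_i` and take a weighted row sum. This is still O(n²) in time and memory, so it only suits moderately sized inputs; the name `weighted_percentiles_broadcast` is hypothetical and only the 'weak' kind is shown.

```python
import numpy as np

def weighted_percentiles_broadcast(x, weights):
    """Hypothetical helper: 'weak' weighted percentile of every element of x."""
    xs = np.asarray(x)
    ws = np.asarray(weights, dtype=float)
    # le[i, j] is True when xs[j] <= xs[i]; shape (n, n).
    le = xs[None, :] <= xs[:, None]
    # Row i sums the weights of all values <= xs[i].
    return 100 * (le * ws).sum(axis=1) / ws.sum()

x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
print(weighted_percentiles_broadcast(x, weights).tolist())  # [20.0, 40.0, 70.0, 100.0]
```

Since each row is computed for the element at that position, the output keeps the input order and tied values get the same percentile.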
Alternatively, you can sort the array and then divide the cumulative sum (cumsum) of the weights by the total weight:
import numpy as np

def weighted_percentileofscore(values, weights=None, values_sorted=False):
    """ Similar to scipy.stats.percentileofscore, but supports weights.
    :param values: array-like with data.
    :param weights: array-like of the same length as `values`.
    :param values_sorted: bool, if True, then will avoid sorting of initial array.
    :return: numpy.array with percentiles of sorted array.
    """
    values = np.array(values)
    if weights is None:
        weights = np.ones(len(values))
    weights = np.array(weights)
    if not values_sorted:
        # Sort the values and carry the weights along in the same order.
        sorter = np.argsort(values)
        values = values[sorter]
        weights = weights[sorter]
    total_weight = weights.sum()
    # Running share of the total weight, in sorted order.
    return 100 * np.cumsum(weights) / total_weight
Check:
weighted_percentileofscore(x, weights)
# array([ 20.,  40.,  70., 100.])
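If you pass an unsorted array, the result is in sorted order and you have to map it back to the original order yourself, so it is best to sort the input first.
Because it needs only one sort (O(n log n)) and one cumulative sum, this should be much faster than computing each value separately, which is O(n²).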
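The sorted-cumsum answer returns percentiles in sorted order and gives tied values different percentiles. A minimal sketch of one way to address both while keeping the O(n log n) sort: compute the cumulative weights once on the sorted data, then use `np.searchsorted` on the original values to look up, for each element, the total weight of values <= it ('weak') or < it ('strict'). The function name `weighted_percentileofscore_all` and its `kind` parameter are hypothetical; the kind names only mimic scipy's `percentileofscore`, this is not a scipy API.

```python
import numpy as np

def weighted_percentileofscore_all(values, weights=None, kind="weak"):
    """Hypothetical helper: weighted percentile of every element, in original order.

    kind: 'weak' (weight of values <= v), 'strict' (weight of values < v),
    or 'mean' (average of the two), named after scipy's percentileofscore kinds.
    """
    values = np.asarray(values)
    if weights is None:
        weights = np.ones(len(values))
    weights = np.asarray(weights, dtype=float)

    sorter = np.argsort(values, kind="stable")
    sorted_values = values[sorter]
    # cum_weights[k] is the total weight of the k smallest values; cum_weights[0] == 0.
    cum_weights = np.concatenate(([0.0], np.cumsum(weights[sorter])))
    total = cum_weights[-1]

    # searchsorted counts sorted values <= v (side="right") or < v (side="left").
    weak = cum_weights[np.searchsorted(sorted_values, values, side="right")]
    strict = cum_weights[np.searchsorted(sorted_values, values, side="left")]

    if kind == "weak":
        result = weak
    elif kind == "strict":
        result = strict
    elif kind == "mean":
        result = (weak + strict) / 2
    else:
        raise ValueError(f"unknown kind: {kind!r}")
    return 100 * result / total

x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
print(weighted_percentileofscore_all(x, weights).tolist())  # [20.0, 40.0, 70.0, 100.0]
# Same data, shuffled: the output follows the input order.
print(weighted_percentileofscore_all([3, 1, 4, 2], [3, 2, 3, 2]).tolist())  # [70.0, 20.0, 100.0, 40.0]
```

The 'weak', 'strict' and 'mean' branches are intended to agree with the question's `weighted_percentile_of_score` for each element, but computed for all values at once.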