Python: map each value of a list to its weighted percentile

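I want to compute the percentile of each value in a list (or numpy array), weighted by the weights in another list. For example, given some function f, I want: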

x = [1, 2, 3, 4]
weights = [2, 2, 3, 3]
f(x, weights)
to produce
[20,40,70,100]

I can get the (unweighted) percentile of a single value with

from scipy import stats
stats.percentileofscore(x, 3)
# 75.0
and I can also compute it for every value with

[stats.percentileofscore(x, a, 'rank') for a in x]
# [25.0, 50.0, 75.0, 100.0]
Building on that, I can compute the weighted percentile of a single item with the following function:

import numpy as np

def weighted_percentile_of_score(x, weights, score, kind='weak'):
    npx = np.asarray(x)
    npw = np.asarray(weights)
    total = npw.sum()

    if kind == 'rank':  # Equivalent to 'weak' since we have weights.
        kind = 'weak'
    if kind not in ('strict', 'weak', 'mean'):
        raise ValueError("kind must be 'rank', 'weak', 'strict' or 'mean'")

    # Share of the total weight strictly below the score.
    strict = 100 * npw[npx < score].sum() / total
    # Share of the total weight at or below the score.
    weak = 100 * npw[npx <= score].sum() / total

    if kind == 'strict':
        return strict
    if kind == 'weak':
        return weak
    return (strict + weak) / 2

How can I do this (efficiently) for every item in the list?

This is not very efficient, but you can combine the approaches listed in the question:

[weighted_percentile_of_score(x, weights, val) for val in x]
# [20.0, 40.0, 70.0, 100.0]
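The list comprehension above rescans the whole array once per value. The same comparisons can be done in a single NumPy broadcast. Below is a minimal sketch; the name weighted_percentiles_broadcast is hypothetical (it is not from the question), and the approach still needs O(n²) memory, so it only suits moderately sized inputs.

```python
import numpy as np

def weighted_percentiles_broadcast(x, weights, kind="weak"):
    """Hypothetical helper: weighted percentile of every value in x."""
    npx = np.asarray(x)
    npw = np.asarray(weights, dtype=float)
    total = npw.sum()
    # Row i compares x[i] against every x[j]; the weights broadcast
    # along the columns, so each row sum is the weight counted for x[i].
    strict = 100 * ((npx[None, :] < npx[:, None]) * npw).sum(axis=1) / total
    weak = 100 * ((npx[None, :] <= npx[:, None]) * npw).sum(axis=1) / total
    if kind == "strict":
        return strict
    if kind in ("weak", "rank"):
        return weak
    if kind == "mean":
        return (strict + weak) / 2
    raise ValueError(f"unknown kind: {kind!r}")

# Gives the four values 20, 40, 70, 100 from the question.
print(weighted_percentiles_broadcast([1, 2, 3, 4], [2, 2, 3, 3]))
```

The result is returned in the same order as the input, and tied values get the same percentile.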
Adapting another answer, you can sort the array and then divide the cumsum of the weights by the total weight:

import numpy as np

def weighted_percentileofscore(values, weights=None, values_sorted=False):
    """ Similar to scipy.stats.percentileofscore, but supports weights.
    :param values: array-like with data.
    :param weights: array-like of the same length as `values`.
    :param values_sorted: bool, if True, then will avoid sorting of initial array.
    :return: numpy.array with percentiles of the sorted array.
    """
    values = np.array(values)
    if weights is None:
        weights = np.ones(len(values))
    weights = np.array(weights)

    if not values_sorted:
        # Sort the values and carry each weight along with its value.
        sorter = np.argsort(values)
        values = values[sorter]
        weights = weights[sorter]

    total_weight = weights.sum()
    return 100 * np.cumsum(weights) / total_weight
Check:

weighted_percentileofscore(x, weights)
# array([20., 40., 70., 100. ])
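If an unsorted array is passed in, the result comes back in sorted order and has to be mapped back to the original order, so it is best to sort the input first.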
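If sorting the input up front is not convenient, the mapping back to the original order can be done with np.unique and its return_inverse option. The following is only a sketch under that assumption; the function name weighted_percentile_per_value is hypothetical. Unlike a plain cumsum over sorted values, it also merges ties, so equal values receive the same percentile.

```python
import numpy as np

def weighted_percentile_per_value(values, weights=None):
    """Hypothetical helper: weighted percentile of each value, in input order."""
    values = np.asarray(values)
    if weights is None:
        weights = np.ones(len(values))
    weights = np.asarray(weights, dtype=float)

    # unique_vals is sorted; inverse[i] is the index of values[i] in it.
    unique_vals, inverse = np.unique(values, return_inverse=True)
    # Total weight carried by each distinct value (ties are merged here).
    weight_per_value = np.bincount(inverse, weights=weights)
    percentiles = 100 * np.cumsum(weight_per_value) / weights.sum()
    # Indexing with inverse restores the original order of the input.
    return percentiles[inverse]

# Unsorted input: the percentiles line up with the input positions.
print(weighted_percentile_per_value([3, 1, 4, 2], [3, 2, 3, 2]))
```

For the unsorted input above, the value 3 maps to 70, 1 to 20, 4 to 100 and 2 to 40.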

This should be much faster than computing each value individually.

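The speed claim can be checked with timeit. The sketch below is a rough comparison, not a rigorous benchmark: the helper names per_value_loop and sorted_cumsum, the array size of 2000 and the random data are all assumptions chosen for illustration, and the timings will vary by machine.

```python
import timeit
import numpy as np

def per_value_loop(x, w):
    # One full scan of the array per value: O(n^2) work overall.
    total = w.sum()
    return np.array([100 * w[x <= v].sum() / total for v in x])

def sorted_cumsum(x, w):
    # One sort plus one cumulative sum: O(n log n) work overall.
    order = np.argsort(x)
    return 100 * np.cumsum(w[order]) / w.sum()

rng = np.random.default_rng(0)
x = np.sort(rng.random(2000))  # pre-sorted so both outputs line up
w = rng.random(2000)

loop_time = timeit.timeit(lambda: per_value_loop(x, w), number=3)
cumsum_time = timeit.timeit(lambda: sorted_cumsum(x, w), number=3)
print(f"loop: {loop_time:.4f}s  cumsum: {cumsum_time:.4f}s")
```

Both functions return the same percentiles for this data; only the amount of work differs.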