Python: most efficient code for clipping the elements of a vector until a sum is reached

Say we have an integer vector that sums to S1. I want to take this vector and produce another vector that sums to S2.

Unfortunately, it is not obvious what kind of result you are interested in. But assuming you have an array of some length and you want to take the leading elements A[0:ix] so that their sum is close to S1, you can do:

import numpy as np

S1 = 5
A = np.array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])

# Cumulative sum; the first index where it reaches S1 marks the cut point.
B = np.cumsum(A)
ix = np.argmax(B >= S1) + 1
C = A[0:ix]
print("C = ", C); print("sum C = ", np.sum(C))

C =  [1 1 1 1 1]
sum C =  5
You can write the same thing in one line:

C = A[0:np.argmax(np.cumsum(A)>=S1)+1]
You are basically going Robin Hood here, cutting down the values above the global average until the global sum reaches the threshold. Using that idea, we start off with a baseline number and then loop, like so:

import numpy as np

def clip_until_sum(vec, total):
    # Get array version
    a = np.asarray(vec)

    # Nothing to clip if the sum is already within the budget
    if a.sum() <= total:
        return a

    # Baseline number
    b = int(total / float(len(a)))

    # Setup output: clip everything above the baseline down to it
    out = np.where(a > b, b, a)
    s = out.sum()

    # Loop to shift up values starting from the baseline until the total is reached
    while s < total:
        idx = np.flatnonzero(a > out)
        dss = total - s
        out[idx[max(0, len(idx) - dss):]] += 1
        s = out.sum()
    return out
Sample run:

In [875]: clip_until_sum([1,4,8,3,5,6], 12)
Out[875]: array([1, 2, 2, 2, 2, 3])
Runtime test and verification:

In [164]: np.random.seed(0)

# Assuming 10000 elems with max of 1000 and total as half of sum
In [165]: vec = np.random.randint(0, 1000, size=10000)

In [167]: total = vec.sum()//2

In [168]: np.allclose(clip_to_sum(vec, total), clip_until_sum(vec, total))
Out[168]: True

In [169]: %timeit clip_to_sum(vec, total)
1 loop, best of 3: 19.1 s per loop

In [170]: %timeit clip_until_sum(vec, total)
100 loops, best of 3: 2.8 ms per loop

# @Warren Weckesser's soln
In [171]: %timeit limit_sum1(vec, total)
1000 loops, best of 3: 733 µs per loop

You can modify the function to include the difference between the max and the second max elements. This uses some extra computation in each loop, but significantly reduces the total number of loops.

I have tested this function against your original one and it gives the same results. Admittedly, though, I had a hard time seeing any real speedup between the two.

import numpy as np

def clip_to_sum(vec, total):
    current_total = np.sum(vec)
    new_vec = np.array(vec)
    while current_total > total:
        # Largest element and the second largest element
        i = np.argmax(new_vec)
        d = np.partition(new_vec.flatten(), -2)[-2]
        diff = new_vec[i] - d
        if not (new_vec[i] == diff) and diff > 0:
            # Drop the maximum straight down to the second maximum
            new_vec[i] -= diff
            current_total -= diff
        else:
            # Otherwise fall back to decrementing by 1
            new_vec[i] -= 1
            current_total -= 1
    return new_vec
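For reference, a quick check on the small example from the question, assuming vec = [1, 4, 8, 3] and total = 10 (apparently the same case used in the sessions further down); the expected output is shown as a comment:

print(clip_to_sum([1, 4, 8, 3], 10))   # -> [1 3 3 3], which sums to 10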

Here are two functions that compute a clipped array. The first, limit_sum1, does not give exactly the same result as your function, because in effect it makes a different choice about which maximum value to decrease when the maximum occurs more than once in the input vector. That is, with vec = [4, 4, 4] and total = 11, there are three possible results: [3, 4, 4], [4, 3, 4] and [4, 4, 3]. Your function gives [3, 4, 4], while limit_sum1 gives [4, 4, 3].
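A minimal sketch of that tie-breaking difference, assuming the clip_to_sum definition above and the limit_sum1 definition given just below (expected outputs shown as comments):

vec = [4, 4, 4]
print(clip_to_sum(vec, 11))   # -> [3 4 4]  (the first maximum is reduced)
print(limit_sum1(vec, 11))    # -> [4 4 3]  (the last maximum is reduced)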

For very small input vectors, such as the example in the question, limit_sum2 is usually faster than limit_sum1, but neither is faster than clip_to_sum. For longer input vectors whose values span a wider range, both are faster than clip_to_sum, and for very long input vectors limit_sum1 is much faster. Timing examples are given below.

import numpy as np

def limit_sum1(vec, total):
    x = np.asarray(vec)
    # Amount by which the sum exceeds the target
    delta = x.sum() - total
    if delta <= 0:
        return x

    i = np.argsort(x)

    # j is the inverse of the sorting permutation i (with the order reversed,
    # so it indexes into the descending array y built below).
    j = np.empty_like(i)
    j[i] = np.arange(len(x))[::-1]

    # Sorted values with a leading 0, then flipped to descending order
    y = np.zeros(len(x)+1, dtype=int)
    y[1:] = x[i]

    d = np.diff(y)[::-1]
    y = y[::-1]

    # Cumulative cost of levelling the top 1, 2, 3, ... values down step by step
    wd = d * np.arange(1, len(d)+1)
    cs = wd.cumsum()

    # k is the number of levelling steps that fit entirely within delta
    k = np.searchsorted(cs, delta, side='right')
    if k > 0:
        y[:k] -= d[:k][::-1].cumsum()[::-1]
        delta = delta - cs[k-1]

    # Spread the remaining reduction evenly over the k+1 largest values
    q, r = divmod(delta, k+1)

    y[:k+1] -= q
    y[:r] -= 1

    # Undo the sorting to restore the original order
    x2 = y[j]
    return x2


def limit_sum2(vec, total):
    a = np.array(vec)
    while a.sum() > total:
        # All positions holding the current maximum, and the largest value below it
        amax = a.max()
        i = np.where(a == amax)[0]
        if len(i) < len(a):
            nextmax = a[a < amax].max()
        else:
            nextmax = 0
        # How much the sum drops if all current max values are clipped to nextmax
        clip_to_nextmax_delta = len(i)*(amax - nextmax)
        diff = a.sum() - total
        if clip_to_nextmax_delta > diff:
            # Clipping that far would overshoot; spread the remaining
            # reduction evenly over the current max values and stop.
            q, r = divmod(diff, len(i))
            a[i] -= q
            a[i[:r]] -= 1
            break
        else:
            # Clip all the current max values to nextmax.
            a[i] = nextmax
    return a
limit_sum1, limit_sum2 and clip_to_sum all give the same result:

In [1389]: limit_sum1(vec, total=10)
Out[1389]: array([1, 3, 3, 3])

In [1390]: limit_sum2(vec, total=10)
Out[1390]: array([1, 3, 3, 3])

In [1391]: clip_to_sum(vec, total=10)
Out[1391]: array([1, 3, 3, 3])
With this small vector, clip_to_sum is faster:

In [1392]: %timeit limit_sum1(vec, total=10)
33.1 µs ± 272 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [1393]: %timeit limit_sum2(vec, total=10)
24.6 µs ± 138 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [1394]: %timeit clip_to_sum(vec, total=10)
15.6 µs ± 44.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Let's try a longer vector containing larger values:

In [1405]: np.random.seed(1729)

In [1406]: vec = np.random.randint(0, 100, size=50)

In [1407]: vec
Out[1407]: 
array([13, 37, 21, 67, 13, 89, 59, 35, 65, 91, 36, 73, 93, 83, 43, 86, 44,
       19, 51, 76, 12, 26, 43,  0, 42, 53, 30, 65,  3, 65, 37, 68, 64, 87,
       91,  4, 70, 10, 50, 40, 34, 32, 13,  7, 93, 79, 16, 98,  1, 35])

In [1408]: vec.sum()
Out[1408]: 2362
Here is the result from each function:

In [1409]: limit_sum1(vec, total=1500)
Out[1409]: 
array([13, 37, 21, 38, 13, 38, 38, 35, 38, 38, 36, 38, 38, 38, 38, 38, 38,
       19, 38, 38, 12, 26, 38,  0, 39, 38, 30, 38,  3, 38, 37, 38, 38, 38,
       38,  4, 38, 10, 38, 39, 34, 32, 13,  7, 38, 38, 16, 38,  1, 35])

In [1410]: limit_sum2(vec, total=1500)
Out[1410]: 
array([13, 37, 21, 38, 13, 38, 38, 35, 38, 38, 36, 38, 38, 38, 38, 38, 38,
       19, 38, 38, 12, 26, 38,  0, 38, 38, 30, 38,  3, 38, 37, 38, 38, 38,
       38,  4, 38, 10, 38, 38, 34, 32, 13,  7, 38, 39, 16, 39,  1, 35])

In [1411]: clip_to_sum(vec, total=1500)
Out[1411]: 
array([13, 37, 21, 38, 13, 38, 38, 35, 38, 38, 36, 38, 38, 38, 38, 38, 38,
       19, 38, 38, 12, 26, 38,  0, 38, 38, 30, 38,  3, 38, 37, 38, 38, 38,
       38,  4, 38, 10, 38, 38, 34, 32, 13,  7, 38, 39, 16, 39,  1, 35])
This time, limit_sum1 is the fastest, by a wide margin:

In [1413]: %timeit limit_sum1(vec, total=1500)
34.9 µs ± 257 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [1414]: %timeit limit_sum2(vec, total=1500)
272 µs ± 2.12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [1415]: %timeit clip_to_sum(vec, total=1500)
1.74 ms ± 7.08 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Comments:

What is clip_by_sum? Using clip_to_sum([1, 4, 8, 3], total=10) I get 10.

What are the characteristics of your typical vec and total? That is, typically, what is len(vec), what is the range of the values in vec, and what is a typical value of total?

@Divakar - that was a mistake, I have corrected the function now.

@Warren There are no particular characteristics, other than that in some cases the runtime should not be extremely slow, for example clip_to_sum([2, int(1e10)], total=1e5). The point of this question is not to clip the length of the vector but to clip the size of the largest elements; see the example in the original question.

This works well - better for vectors with large values. But when you have a long vector with many similar values it still looks a bit inefficient, since it does a fresh argmax and partial sort on every iteration... One problem is that if the total exceeds the sum of the vector, the while loop never terminates, i.e. clip_to_sum([1, 4, 8, 3], 20) never stops. It also looks like the runtime of this method scales with the size of the vector, since we are just adding 1 each loop.

@Peter What is the expected output of clip_to_sum([1, 4, 8, 3], 20)? If it is [1, 4, 8, 3], we can simply take the sum and check it against the total at the start to short-circuit that case.

@Peter Yes, if there are lots of large numbers with little variation, the runtime goes up.

clip_to_sum([1, 4, 8, 3], 20) should be [1, 4, 8, 3], so this is a simple fix - it just needs a check before the loop.

@Peter Right, I think this Warren guy got it.
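Following up on that last point, a minimal sketch of the fix discussed in the comments - checking the total before entering the loop - assuming the clip_to_sum definition from this thread (clip_to_sum_safe is a hypothetical wrapper name):

import numpy as np

def clip_to_sum_safe(vec, total):
    # Short-circuit when no clipping is needed, so e.g.
    # clip_to_sum_safe([1, 4, 8, 3], 20) returns [1, 4, 8, 3]
    # instead of looping forever.
    a = np.asarray(vec)
    if a.sum() <= total:
        return a.copy()
    return clip_to_sum(a, total)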