Python numpy function to bin an array

Suppose I have an array L=[1,0,5,1] and I want to put it into two bins; what I want out is Lbin=[1,6]. Similarly, suppose L=[1,3,5,2,6,7] and I want to put it into three bins; then I want out Lbin=[4,7,13].

If b is the number of bins and we assume b divides len(L), is there a numpy function to do this?

My array L will be large, and I have many of them, so I need a linear-time solution.

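Divakar's answer is great. In addition:

Is there a simple way to handle the case where b does not divide len(L), so that the last bin has fewer elements? For example, L=[1,0,5,1,4] and b=2 would give [6,5].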

We can simply reshape the array so that each group becomes a row, then sum each row to get the desired output, like so -

np.reshape(L,(num_bins,-1)).sum(1)
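A minimal self-contained check of the one-liner above on the two examples from the question. The helper name bin_sums_reshape is hypothetical; the sketch assumes len(L) is divisible by num_bins, and shows that otherwise np.reshape raises a ValueError.

```python
import numpy as np

def bin_sums_reshape(L, num_bins):
    # Hypothetical helper: each row of the reshaped array is one bin.
    # Requires len(L) to be divisible by num_bins.
    return np.reshape(L, (num_bins, -1)).sum(1)

print(bin_sums_reshape([1, 0, 5, 1], 2))        # → [1 6]
print(bin_sums_reshape([1, 3, 5, 2, 6, 7], 3))  # → [ 4  7 13]

try:
    bin_sums_reshape([1, 0, 5, 1, 4], 2)
except ValueError:
    print("length 5 cannot be reshaped into 2 equal rows")
```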

For arrays whose length isn't necessarily divisible by the number of bins -

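import numpy as np

def sum_groups(L, num_bins):
    n  = len(L)
    # Size of each full group: ceil(n / num_bins)
    grp_len = int(np.ceil(n/float(num_bins)))
    # 1 if the last bin is a shorter leftover group, 0 otherwise
    b = int(n%num_bins!=0)
    # Number of elements that go into the full-size groups
    lim = grp_len*(num_bins-b)
    # Reshape the full-size part into rows of grp_len and sum each row
    p0 = np.reshape(L[:lim],(-1,grp_len)).sum(1)

    if b!=0:
        # Sum the remaining elements as the last, shorter bin
        p1 = np.sum(L[lim:])
        return np.r_[p0,p1]
    else:
        return p0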

Bringing in np.einsum for cases where the binned sums stay within the precision of the input array's dtype -

def sum_groups_einsum(L, num_bins):
    n  = len(L)
    grp_len = int(np.ceil(n/float(num_bins)))
    b = int(n%num_bins!=0)
    lim = grp_len*(num_bins-b)
    # 'ij->i' sums each row, like .sum(1), but accumulates in the input dtype
    p0 = np.einsum('ij->i',np.reshape(L[:lim],(-1,grp_len)))

    if b!=0:
        # 'i->' sums all remaining elements into a scalar
        p1 = np.einsum('i->',L[lim:])
        return np.r_[p0,p1]
    else:
        return p0
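To illustrate the precision caveat, here is a small sketch with an int8 array chosen purely for illustration. np.sum promotes small integer types to the platform integer before summing, whereas np.einsum accumulates in the input dtype unless you pass its dtype argument, so a binned sum that doesn't fit in the input dtype wraps around.

```python
import numpy as np

# Illustrative input: two values whose sum (200) does not fit in int8 (max 127)
a = np.array([100, 100], dtype=np.int8)

s_sum = a.sum()                                  # promoted to the platform integer
s_ein = np.einsum('i->', a)                      # stays int8, so the result wraps around
s_ein64 = np.einsum('i->', a, dtype=np.int64)    # force a wider accumulator

print(s_sum, s_sum.dtype)
print(s_ein, s_ein.dtype)
print(s_ein64, s_ein64.dtype)
```

If the bin sums might overflow, either keep np.sum or pass a wider dtype to np.einsum.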

Benchmarking follows -

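To cover the case where the array length isn't divisible by the number of bins, let's use an input array with a few extra elements (10000012 is not a multiple of 20) -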

In [406]: # Setup
     ...: np.random.seed(0)
     ...: L = np.random.randint(0,high = 6, size = 10000012)
     ...: b = 20

In [407]: %timeit sum_groups(L, num_bins=b)
     ...: %timeit sum_groups_einsum(L, num_bins=b)
     ...: %timeit np.array([t.sum() for t in np.array_split(L, b)])
     ...: %timeit np.add.reduceat(L, np.linspace(0.5, L.size+0.5, b, False, dtype=int))
100 loops, best of 3: 6.45 ms per loop
100 loops, best of 3: 6.05 ms per loop
100 loops, best of 3: 6.45 ms per loop
100 loops, best of 3: 6.51 ms per loop

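Over a few more runs, the timings of the first and the last two were very similar, while the second one, using einsum, was slightly faster than the others.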

You can use np.add.reduceat:

>>> np.add.reduceat(L, np.linspace(0, L.size, nbin, False, dtype=int))
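For reference, np.add.reduceat(a, indices) sums the slices a[indices[i]:indices[i+1]] (the last one runs to the end of the array), so each index marks the start of a bin. A small sketch on a toy array chosen for illustration, including the documented special case where an index is not smaller than the next one:

```python
import numpy as np

a = np.arange(8)  # [0 1 2 3 4 5 6 7]

# Bins start at 0, 4 and 5: sums of a[0:4], a[4:5] and a[5:]
r1 = np.add.reduceat(a, [0, 4, 5])
print(r1)  # → [ 6  4 18]

# If indices[i] >= indices[i+1], the result is just a[indices[i]],
# so an "empty" bin does not come out as zero
r2 = np.add.reduceat(a, [0, 4, 4])
print(r2)  # → [ 6  4 22]
```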

It rounds the bin edges differently from your example, though:

>>> L = np.array([1,0,5,1,4])
>>> nbin = 2
>>> np.add.reduceat(L, np.linspace(0, L.size, nbin, False, dtype=int))
array([ 1, 10])

To get your rounding:

>>> np.add.reduceat(L, np.linspace(0.5, L.size+0.5, nbin, False, dtype=int))
array([6, 5])
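The difference comes from where the bin start indices land. Printing them for the 5-element example with nbin = 2 makes it visible: without the offset, linspace produces 0.0 and 2.5, which become 0 and 2; shifting by 0.5 gives 0.5 and 3.0, which become 0 and 3, so the first bin takes three elements.

```python
import numpy as np

L = np.array([1, 0, 5, 1, 4])
nbin = 2

plain = np.linspace(0, L.size, nbin, False, dtype=int)            # 0.0, 2.5 -> 0, 2
shifted = np.linspace(0.5, L.size + 0.5, nbin, False, dtype=int)  # 0.5, 3.0 -> 0, 3

print(plain, np.add.reduceat(L, plain))      # → [0 2] [ 1 10]
print(shifted, np.add.reduceat(L, shifted))  # → [0 3] [6 5]
```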

For better performance, we can avoid linspace and use integer arithmetic instead:

>>> np.add.reduceat(L, np.arange(nbin//2, L.size * nbin, L.size) // nbin)
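The integer expression computes the start of bin i as (i*n + nbin//2) // nbin, i.e. i*n/nbin rounded half up, using only integer operations. Below is a possible wrapper around it, sketched under the assumption that every bin should be non-empty; the name binned_sum and the nbin <= len(L) check are hypothetical additions, not part of the answer. The check matters because reduceat does not return 0 for an empty bin.

```python
import numpy as np

def binned_sum(L, nbin):
    # Hypothetical wrapper around the integer-arithmetic bin starts above
    L = np.asarray(L)
    n = L.size
    if not 1 <= nbin <= n:
        # With nbin > n some bins would be empty, and reduceat would not give 0 for them
        raise ValueError("nbin must be between 1 and len(L)")
    # Start of bin i is i * n / nbin rounded half up, computed with integers only
    starts = np.arange(nbin // 2, n * nbin, n) // nbin
    return np.add.reduceat(L, starts)

print(binned_sum([1, 0, 5, 1], 2))        # → [1 6]
print(binned_sum([1, 3, 5, 2, 6, 7], 3))  # → [ 4  7 13]
print(binned_sum([1, 0, 5, 1, 4], 2))     # → [6 5]
```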

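It's worth mentioning that the reshape-based solutions don't always give the same result as the others; in fact, in quite a few cases reshaping doesn't work at all. Example: … elements, … groups. This requires groups of … and … elements, … of each. Clearly, this cannot be achieved by reshaping.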
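A quick way to see the difference is to sum an all-ones array, so that each bin sum equals the size of that bin. The sizes used below (10 elements in 4 bins, and 7 elements in 5 bins) are arbitrary illustrative choices, and sum_groups is a condensed copy of the reshape-based function from the first answer.

```python
import numpy as np

def sum_groups(L, num_bins):
    # Condensed copy of the reshape-based function from the first answer
    n = len(L)
    grp_len = int(np.ceil(n / float(num_bins)))
    b = int(n % num_bins != 0)
    lim = grp_len * (num_bins - b)
    p0 = np.reshape(L[:lim], (-1, grp_len)).sum(1)
    return np.r_[p0, np.sum(L[lim:])] if b else p0

def bin_sizes(n, nbin):
    # All-ones input: each bin sum equals the number of elements in that bin
    L = np.ones(n, dtype=int)
    sizes = {
        "array_split": np.array([t.sum() for t in np.array_split(L, nbin)]),
        "reduceat": np.add.reduceat(L, np.arange(nbin // 2, n * nbin, n) // nbin),
    }
    try:
        sizes["reshape"] = sum_groups(L, nbin)
    except ValueError as exc:
        # Raised when L[:lim] cannot be reshaped into rows of grp_len
        sizes["reshape"] = "fails: %s" % exc
    return sizes

print(bin_sizes(10, 4))  # reshape gives 3, 3, 3, 1; the other two spread the sizes differently
print(bin_sizes(7, 5))   # reshape raises ValueError here
```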

Performance comparison (10 bins, element count not a multiple of 10):

Benchmark code:

import perfplot
import numpy as np

def sg_reshape(args):
    L, num_bins = args
    n  = len(L)
    grp_len = int(np.ceil(n/float(num_bins)))
    b = int(n%num_bins!=0)
    lim = grp_len*(num_bins-b)
    p0 = np.reshape(L[:lim],(-1,grp_len)).sum(1)

    if b!=0:
        p1 = np.sum(L[lim:])
        return np.r_[p0,p1]
    else:
        return p0

def sg_einsum(args):
    L, num_bins = args
    n  = len(L)
    grp_len = int(np.ceil(n/float(num_bins)))
    b = int(n%num_bins!=0)
    lim = grp_len*(num_bins-b)
    p0 = np.einsum('ij->i',np.reshape(L[:lim],(-1,grp_len)))

    if b!=0:
        p1 = np.sum(L[lim:])
        return np.r_[p0,p1]
    else:
        return p0

def sg_addred(args):
    L, nbin = args
    return np.add.reduceat(L, np.linspace(0.5, L.size+0.5, nbin, False, dtype=int))

def sg_intarith(args):
    L, nbin = args
    return np.add.reduceat(L, np.arange(nbin//2, L.size * nbin, L.size) // nbin)

def sg_arrsplit(args):
    L, b = args
    return np.array([t.sum() for t in np.array_split(L, b)])

perfplot.save('cho10.png',
              setup=lambda n: (np.random.randint(0, 9, (n,)), 10),
              n_range=[2**k for k in range(8, 23)],
              kernels=[
                  sg_reshape,
                  sg_einsum,
                  sg_addred,
                  sg_intarith,
                  sg_arrsplit,
              ],
              logx=True,
              logy=True,
              xlabel='#elements',
              equality_check=None)

The following works:

np.array([t.sum() for t in np.array_split(L, b)])

As you mention, if you know that L splits evenly into b bins, you can replace array_split with the split function.
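A small sketch of that difference on the question's examples. np.array_split lets the chunks differ in size (the first len(L) % b chunks get one extra element), while np.split raises a ValueError when the array can't be divided equally. The helper name bin_sums and its equal flag are hypothetical.

```python
import numpy as np

def bin_sums(L, b, equal=False):
    # Hypothetical helper: np.split requires equal chunks, np.array_split does not
    splitter = np.split if equal else np.array_split
    return np.array([t.sum() for t in splitter(np.asarray(L), b)])

print(bin_sums([1, 3, 5, 2, 6, 7], 3, equal=True))  # → [ 4  7 13]
print(bin_sums([1, 0, 5, 1, 4], 2))                 # → [6 5]

try:
    bin_sums([1, 0, 5, 1, 4], 2, equal=True)
except ValueError as exc:
    print("np.split:", exc)
```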

Here are some benchmarks with b=100 and L=randint(0,100,1000), and with b=3 and L=randint(0,10000).

Depending on your data, Divakar's reshape-based answer seems to be the best approach.
