Random-size chunking of a numpy array in Python

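I want to split an array of indices into chunks of random size (picked from a limited range of possible sizes), with the chunks also shuffled among themselves. I tried the following approach, which I found, but it focuses on chunks of equal size: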

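import random
import numpy as np

a = np.arange(1, 100)

def chunk(xs, n):  # split xs into n shuffled chunks of (nearly) equal size
    ys = list(xs)
    random.shuffle(ys)
    size = len(ys) // n
    leftovers = ys[size*n:]
    c = -1  # so the second loop starts at 0 when there are no leftovers
    for c, xtra in enumerate(leftovers):
        yield ys[c*size:(c+1)*size] + [xtra]
    for c in range(c+1, n):  # range replaces the Python 2 xrange
        yield ys[c*size:(c+1)*size]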
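In other words, how can I change the function above so that it produces a random number of chunks, shuffled among themselves, whose sizes are drawn at random from some range, e.g. [5-10]?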

This will work:

from itertools import chain
import numpy as np

a = np.arange(1, 100)
def chunk(xs, nlow, nhigh, shuffle=True):
    xs = np.asarray(xs)
    if shuffle:
        # shuffle, if you want
        xs = xs.copy()
        np.random.shuffle(xs)

    # get at least enough random chunk sizes in the specified range, ie nlow <= n <= nhigh
    ns = np.random.randint(nlow, nhigh+1, size=xs.size//nlow)
    # add up the chunk sizes to get the indices at which we'll slice up the input array
    ixs = np.add.accumulate(ns)
    # truncate ixs so that its contents are all valid indices with respect to xs
    ixs = ixs[:np.searchsorted(ixs, xs.size)]

    # yield slices from the input array
    for start,end in zip(chain([None], ixs), chain(ixs, [None])):
        yield xs[start:end]

list(chunk(a, 5, 10))
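
To see how the cut indices come about, here is a small sketch that uses hand-picked sizes instead of random ones, so the result is reproducible. The values in `ns` and `n_items` are made up for illustration only.

```python
import numpy as np

n_items = 20                      # hypothetical input length
ns = np.array([5, 7, 6, 10])      # hand-picked chunk sizes, all within [5, 10]

# running totals of the sizes are the positions where the array gets cut
ixs = np.add.accumulate(ns)       # [5, 12, 18, 28]

# keep only the cut positions that fall strictly inside the array
cut = ixs[:np.searchsorted(ixs, n_items)]   # [5, 12, 18]

chunks = np.split(np.arange(n_items), cut)
sizes = [len(c) for c in chunks]
print(sizes)                      # [5, 7, 6, 2]
```

Note that the final chunk here has only 2 elements, which is below the lower bound of 5 that the sizes were drawn from.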
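Edit: My original answer did not put a lower bound on the size of the last chunk, so it can sometimes be smaller than specified (though never larger). As far as I know, there is no direct way to handle this. In general, however, you can remove an unwanted region from a random distribution by rejecting any samples that fall in that region. In other words, you can make sure the last chunk is large enough by discarding any proposed chunking in which it is too small: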

def getIxs(xsize, nlow, nhigh):
    # get at least enough random chunk sizes in the specified range, ie nlow <= n <= nhigh
    ns = np.random.randint(nlow, nhigh+1, size=xsize//nlow)

    # add up the chunk sizes to get the indices at which we'll slice up the input array
    ixs = np.add.accumulate(ns)

    # truncate ixs so that its contents are all valid indices with respect to xs
    ixs = ixs[:np.searchsorted(ixs, xsize)]

    return ixs

def chunk(xs, nlow, nhigh):
    xs = np.asarray(xs)

    ixs = getIxs(xs.size, nlow, nhigh)

    # rerun getIxs until the size of the final chunk is large enough
    while (xs.size - ixs[-1]) < nlow:
        ixs = getIxs(xs.size, nlow, nhigh)

    # yield slices from the input array
    for start,end in zip(chain([None], ixs), chain(ixs, [None])):
        yield xs[start:end]
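
The retry loop above assumes that a valid chunking exists. With an input shorter than `nlow`, `ixs` is empty and `ixs[-1]` raises an IndexError, and for some combinations (for example 11 items with `nlow = nhigh = 5`) no draw is ever accepted. The following is a minimal sketch of the same rejection idea with a feasibility check added up front. The names `is_feasible`, `chunk_checked`, `seed` and `max_tries` are hypothetical, and returning a list via `np.split` instead of a generator is my own choice, not part of the answer.

```python
import numpy as np

def is_feasible(total, nlow, nhigh):
    """True if total can be written as a sum of sizes in [nlow, nhigh]."""
    if total <= 0 or not (1 <= nlow <= nhigh):
        return False
    k_min = -(-total // nhigh)   # ceil(total / nhigh): fewest chunks possible
    k_max = total // nlow        # most chunks possible
    return k_min <= k_max

def chunk_checked(xs, nlow, nhigh, seed=None, max_tries=10_000):
    """Rejection-based chunking that fails loudly instead of looping forever."""
    xs = np.asarray(xs)
    if not is_feasible(xs.size, nlow, nhigh):
        raise ValueError("no chunking with sizes in [nlow, nhigh] exists")
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        # propose sizes, turn them into cut positions, drop those past the end
        ns = rng.integers(nlow, nhigh + 1, size=xs.size // nlow)
        ixs = np.cumsum(ns)
        ixs = ixs[:np.searchsorted(ixs, xs.size)]
        last = xs.size - (ixs[-1] if ixs.size else 0)
        if nlow <= last <= nhigh:      # accept only if the final chunk is in range
            return np.split(xs, ixs)
    raise RuntimeError("no acceptable chunking found within max_tries")

chunks = chunk_checked(np.arange(1, 100), 5, 10, seed=0)
print([len(c) for c in chunks])
```

Shuffle the input before calling it if the chunks should contain shuffled elements, as in the first version of `chunk`.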
You can use np.split(array, indices).
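
For reference, a short sketch of what `np.split` does when it is given a list of cut positions rather than a number of sections. The array and the positions are arbitrary example values.

```python
import numpy as np

a = np.arange(10)

# cut positions [3, 7] give three pieces: a[:3], a[3:7], a[7:]
parts = np.split(a, [3, 7])
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]

# a cut position of 0 yields an empty first piece
parts0 = np.split(a, [0, 4])
print([len(p) for p in parts0])      # [0, 4, 6]
```

The sizes of the pieces are the differences between consecutive cut positions, so constraining the chunk sizes to a range means constraining those differences.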


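Comments:

Your question is incomplete. What is the range? How is it computed?

Suppose the range of possible sizes is 5-10. It is not really relevant, but I will add it to the question.

In that case you may or may not satisfy the constraint on the number of chunks n, right?

Yes, exactly. The relevant variable is the constraint on the chunk size, which should be picked at random within a given range, e.g. [5-10]!

The option shuffle=False is exactly what I wanted. Thanks! :)

@Garini Yes, I wasn't sure whether you intended the shuffle or whether it was cruft carried over from the copied code, so I made it optional.

There is just one flaw I noticed: the last partition can randomly violate the size limit. Is there a way to avoid that?

@Garini By design (more or less), the final chunk can be smaller than the lower bound. There is no efficient way around this, but you can use a rejection-based approach to get what you want. I have posted some code for it in an edit above.

In my case, I want to keep the split sizes within a limited range without imposing the number of splits. Also, I need the numbers within each split to be ordered. In that case k should be a random number too, but I'm not sure.

What do you mean by ordered numbers within the splits? Should the splits be sorted even if the original array is not?
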
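import random
import numpy as np

a = np.arange(100)
np.random.shuffle(a)
# pick between 1 and 9 distinct cut positions; starting at 1 avoids an empty first chunk
k = int(np.random.randint(low=1, high=10))
ind = sorted(random.sample(range(1, len(a)), k=k))
np.split(a, ind)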

Example output (it differs on every run):

[array([41, 19, 85, 51, 34]),
 array([71, 27]),
 array([36, 16, 18, 74, 43, 96, 45, 97, 54, 75, 89, 48, 33, 32, 63, 98,  5,
        80, 30, 17, 86, 14, 67]),
 array([ 9, 70, 84, 99, 39]),
 array([59, 20, 78, 61, 49, 37, 93]),
 array([ 1, 79, 81, 69, 40, 42, 29,  8,  3, 68, 87, 66,  4, 21, 91, 92, 31]),
 array([83, 15, 56,  2, 64, 95, 12,  0, 90, 77, 57, 60, 38, 76, 94, 22, 24,
         6, 46, 65, 50, 62, 28, 44, 73, 13, 26, 72,  7, 53, 82, 47, 58, 35,
        52, 25, 88, 11, 10, 55, 23])]