Random-sized chunks of a numpy array in Python
I want to split an array of indices into chunks of random size (picked from a limited range of possible sizes), with the chunks also shuffled among each other. I tried the following approach that I found, but it is geared toward equal-sized chunks:
import random
import numpy as np

a = np.arange(1, 100)

def chunk(xs, n):
    # chunk the array xs into n parts
    ys = list(xs)
    random.shuffle(ys)
    size = len(ys) // n
    leftovers = ys[size*n:]
    c = -1  # so the second loop starts at 0 when there are no leftovers
    for c, xtra in enumerate(leftovers):
        yield ys[c*size:(c+1)*size] + [xtra]
    for c in range(c+1, n):
        yield ys[c*size:(c+1)*size]
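In other words, how can I change the function above so that it produces some number of chunks (a random number, shuffled among each other) whose sizes are drawn at random from a given range, e.g. [5-10]?
This will work: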
from itertools import chain
import numpy as np

a = np.arange(1, 100)

def chunk(xs, nlow, nhigh, shuffle=True):
    xs = np.asarray(xs)
    if shuffle:
        # shuffle, if you want
        xs = xs.copy()
        np.random.shuffle(xs)
    # get at least enough random chunk sizes in the specified range, ie nlow <= n <= nhigh
    ns = np.random.randint(nlow, nhigh+1, size=xs.size//nlow)
    # add up the chunk sizes to get the indices at which we'll slice up the input array
    ixs = np.add.accumulate(ns)
    # truncate ixs so that its contents are all valid indices with respect to xs
    ixs = ixs[:np.searchsorted(ixs, xs.size)]
    # yield slices from the input array
    for start, end in zip(chain([None], ixs), chain(ixs, [None])):
        yield xs[start:end]

list(chunk(a, 5, 10))
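To make the cumulative-sum step in the answer easier to follow, here is a small deterministic sketch. It uses fixed chunk sizes instead of random ones, and the helper name cut_points is made up for illustration. It also uses np.split for brevity where the answer uses zip and chain.

```python
import numpy as np

def cut_points(sizes, total):
    """Turn chunk sizes into slice boundaries that fall strictly inside `total`."""
    # running totals of the sizes are the positions where each chunk ends
    ixs = np.add.accumulate(np.asarray(sizes))
    # keep only the boundaries smaller than the array length
    return ixs[:np.searchsorted(ixs, total)]

xs = np.arange(15)
ixs = cut_points([5, 7, 6], xs.size)   # running totals are 5, 12, 18; 18 is dropped
pieces = np.split(xs, ixs)
print(ixs.tolist())                    # [5, 12]
print([len(p) for p in pieces])        # [5, 7, 3]
```

The last piece is whatever remains after the final boundary, which is why it can end up shorter than the requested minimum.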
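Edit
My original answer did not put a lower bound on the size of the last chunk, so it could sometimes be smaller than specified (though never larger). As far as I know there is no direct way to handle this. In general, however, you can remove an unwanted region from a random distribution by rejecting any samples that fall in it. In other words, you can make sure the last chunk is large enough by throwing out any proposed chunking in which it is too small: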
def getIxs(xsize, nlow, nhigh):
    # get at least enough random chunk sizes in the specified range, ie nlow <= n <= nhigh
    ns = np.random.randint(nlow, nhigh+1, size=xsize//nlow)
    # add up the chunk sizes to get the indices at which we'll slice up the input array
    ixs = np.add.accumulate(ns)
    # truncate ixs so that its contents are all valid indices with respect to xs
    ixs = ixs[:np.searchsorted(ixs, xsize)]
    return ixs

def chunk(xs, nlow, nhigh):
    xs = np.asarray(xs)
    ixs = getIxs(xs.size, nlow, nhigh)
    # rerun getIxs until the size of the final chunk is large enough
    while (xs.size - ixs[-1]) < nlow:
        ixs = getIxs(xs.size, nlow, nhigh)
    # yield slices from the input array
    for start, end in zip(chain([None], ixs), chain(ixs, [None])):
        yield xs[start:end]
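The rejection loop above keeps retrying until it succeeds, so it would not terminate for inputs where no valid chunking exists (for example, 11 elements with every chunk required to be exactly 5 long). Below is a rough sketch of the same rejection idea with a retry limit. The names chunk_bounded, random_cut_points, max_tries and seed are assumptions introduced here, not part of the original answer, and it uses np.random.default_rng and np.split rather than the calls in the answer.

```python
import numpy as np

def random_cut_points(xsize, nlow, nhigh, rng):
    # draw enough random sizes in [nlow, nhigh] and turn them into boundaries
    ns = rng.integers(nlow, nhigh + 1, size=xsize // nlow)
    ixs = np.cumsum(ns)
    # keep only boundaries that fall strictly inside the array
    return ixs[:np.searchsorted(ixs, xsize)]

def chunk_bounded(xs, nlow, nhigh, max_tries=1000, seed=None):
    """Rejection sampling with a retry limit (hypothetical helper)."""
    xs = np.asarray(xs)
    rng = np.random.default_rng(seed)
    for _ in range(max_tries):
        ixs = random_cut_points(xs.size, nlow, nhigh, rng)
        # size of the final chunk; the whole array if there are no boundaries
        last = xs.size - (ixs[-1] if ixs.size else 0)
        # the upper-bound test is only a safeguard; the lower bound is what gets rejected
        if nlow <= last <= nhigh:
            return np.split(xs, ixs)
    raise ValueError("no valid chunking found within max_tries")

chunks = chunk_bounded(np.arange(1, 100), 5, 10, seed=0)
print([len(c) for c in chunks])
```

Returning a list instead of yielding keeps the sketch short; raising after max_tries is one possible way to signal an infeasible request.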
You can use np.split(array, indices).
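Your question is incomplete. What is the range? How is it computed?
Suppose the range of possible sizes is 5-10. It isn't really relevant, but I'll add it to the question.
In that case you may or may not be able to satisfy the constraint of n chunks, right?
Yes, exactly. The relevant variable is the constraint on the chunk size, which should be picked at random within a specific range, e.g. [5-10]!
The option shuffle=False is exactly what I wanted. Thanks! :)
@Garini Yes, I wasn't sure whether you meant to use the shuffle or whether it was cruft carried over in the copied code.
So I went with it, with just one flaw that I noticed: the last partition can randomly break the size limit. Is there a way to avoid that?
@Garini By design (more or less), the final chunk can be smaller than the lower bound. There is no efficient way to fix this, but you can use a rejection-based approach to get what you want. I've posted some relevant code in an edit above.
In my case, I want to keep the size of the splits within a limited range without imposing the number of splits. Also, I need ordered numbers within the splits.
In that case k should also be a random number, but I'm not sure what you mean by ordered numbers within the splits. Do you want the splits sorted even if the original array is unsorted?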
import random
import numpy as np

a = np.arange(100)
np.random.shuffle(a)
# pick a random number (1 to 9) of distinct split points and sort them
ind = sorted(random.sample(range(len(a)), k=int(np.random.randint(low=1, high=10))))
np.split(a, ind)
[array([41, 19, 85, 51, 34]),
array([71, 27]),
array([36, 16, 18, 74, 43, 96, 45, 97, 54, 75, 89, 48, 33, 32, 63, 98, 5,
80, 30, 17, 86, 14, 67]),
array([ 9, 70, 84, 99, 39]),
array([59, 20, 78, 61, 49, 37, 93]),
array([ 1, 79, 81, 69, 40, 42, 29, 8, 3, 68, 87, 66, 4, 21, 91, 92, 31]),
array([83, 15, 56, 2, 64, 95, 12, 0, 90, 77, 57, 60, 38, 76, 94, 22, 24,
6, 46, 65, 50, 62, 28, 44, 73, 13, 26, 72, 7, 53, 82, 47, 58, 35,
52, 25, 88, 11, 10, 55, 23])]
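The question also asks for the chunks to be shuffled among each other, and one comment asks for the numbers inside each split to stay ordered. One possible reading is to leave the array unshuffled, split it, and then permute only the order of the chunks. The sketch below illustrates that reading. The helper name shuffle_chunk_order and the fixed split points are assumptions made for the example.

```python
import numpy as np

def shuffle_chunk_order(chunks, seed=None):
    """Return the chunks in a random order, leaving each chunk's contents untouched."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(chunks))
    return [chunks[i] for i in order]

# fixed split points for illustration; in practice they would come from random sizes
chunks = np.split(np.arange(20), [5, 12])
shuffled = shuffle_chunk_order(chunks, seed=1)
for c in shuffled:
    print(c)
```

Each printed chunk is still internally sorted; only the position of whole chunks changes.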