Python: efficient one-liner to subsample the `i`th `n` elements out of every `m` elements in a list


python arrays python-2.7


I'm looking for a memory- and CPU-efficient one-liner to subsample `n` elements out of every `m` elements in a list. So far I have:

sb = [11,12,21,22,31,32]*4 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel (sample) size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size

[i for l in [sb[j+ci-1:j+ci-1+cs] for j
    in [x*fs+ci-1 for x in xrange(len(sb)/fs)]] for i in l]

Out: [11, 12, 11, 12, 11, 12, 11, 12]
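A quick check of the offset arithmetic, written for Python 3 (`range` and `//` instead of `xrange` and `/`): inside the one-liner, `j` already includes `ci-1` and the slice adds `ci-1` again, so each sample starts at `x*fs + 2*(ci-1)`. A channel actually starts at `x*fs + (ci-1)*cs`, and the two agree only when `cs == 2`. The function names `question_version` and `channel_version` are hypothetical, introduced just for this comparison.

```python
def question_version(sb, ci, cs, nc):
    # Python 3 port of the question's one-liner (xrange -> range, / -> //)
    fs = nc * cs
    return [i for l in [sb[j + ci - 1:j + ci - 1 + cs]
                        for j in [x * fs + ci - 1 for x in range(len(sb) // fs)]]
            for i in l]


def channel_version(sb, ci, cs, nc):
    # Same structure, but each sample starts at x*fs + (ci - 1)*cs
    fs = nc * cs
    return [i for l in [sb[j:j + cs]
                        for j in [x * fs + (ci - 1) * cs for x in range(len(sb) // fs)]]
            for i in l]


# cs == 2: 2*(ci - 1) == (ci - 1)*cs, so both versions agree
sb2 = [11, 12, 21, 22, 31, 32] * 4
print(question_version(sb2, 2, 2, 3))  # [21, 22, 21, 22, 21, 22, 21, 22]
print(channel_version(sb2, 2, 2, 3))   # [21, 22, 21, 22, 21, 22, 21, 22]

# cs == 3 (two channels of three elements): the question's offsets drift
sb3 = [11, 12, 13, 21, 22, 23] * 2
print(question_version(sb3, 2, 3, 2))  # [13, 21, 22, 13, 21, 22]
print(channel_version(sb3, 2, 3, 2))   # [21, 22, 23, 21, 22, 23]
```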

Breaking it down: I'm creating a list of sample lists and then flattening it with

[i for l in [...] for i in l]
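For reference, a minimal Python 3 illustration of that flattening idiom: the `for` clauses of a nested comprehension read left to right like nested loops, and `itertools.chain.from_iterable` from the standard library produces the same flat list. The list `nested` is made-up sample data.

```python
import itertools

nested = [[11, 12], [11, 12], [11, 12], [11, 12]]  # made-up list of samples

# Read the clauses left to right, like nested loops:
#     for l in nested:
#         for i in l:
#             result.append(i)
flat_comp = [i for l in nested for i in l]

# Standard-library equivalent; it walks the sublists lazily
flat_chain = list(itertools.chain.from_iterable(nested))

print(flat_comp)                # [11, 12, 11, 12, 11, 12, 11, 12]
print(flat_chain == flat_comp)  # True
```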

Alternatively, not as a one-liner but easier to read, I could do:

os = []
for i in [sb[j+ci-1:j+ci-1+cs] for j in [x*fs+ci-1 for x in xrange(len(sb)/fs)]]: os = os+i
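A sketch of the same loop in Python 3 that avoids two pitfalls, assuming the channel offset `(ci - 1) * cs`: `os = os + i` builds a brand-new list on every iteration (quadratic copying overall), and the name `os` shadows the standard-library module. `list.extend` appends in place instead.

```python
sb = [11, 12, 21, 22, 31, 32] * 4
ci, cs, nc = 1, 2, 3  # 1-indexed channel, channel size, channels per frame
fs = nc * cs

out = []  # avoid the name `os`, which shadows the standard-library module
for start in range((ci - 1) * cs, len(sb), fs):
    # extend() appends in place; `out = out + chunk` would copy the whole list each time
    out.extend(sb[start:start + cs])

print(out)  # [11, 12, 11, 12, 11, 12, 11, 12]
```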

Both solutions look far too complicated compared with, for example, the special case where `cs=1`:

sb[ci-1::fs]

Can you help me come up with a decent solution?
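One way to stay close to the `sb[ci-1::fs]` special case, sketched in Python 3 under the assumption that `sb` holds whole frames (a trailing partial frame is ignored): each of the `cs` positions within a sample is itself a plain strided slice, and extended slice assignment can interleave those slices back into sample order. The name `demux_slices` is hypothetical.

```python
def demux_slices(sb, ci, cs, nc):
    # ci is 1-indexed, as in the question
    fs = nc * cs
    start = (ci - 1) * cs
    nframes = len(sb) // fs          # assumption: only whole frames are used
    end = nframes * fs
    out = [None] * (nframes * cs)
    for o in range(cs):
        # element o of every sample is a plain strided slice of the stream,
        # and extended slice assignment drops it into every cs-th output slot
        out[o::cs] = sb[start + o:end:fs]
    return out


sb = [11, 12, 21, 22, 31, 32] * 4
print(demux_slices(sb, 2, 2, 3))  # [21, 22, 21, 22, 21, 22, 21, 22]

# with cs == 1 it matches the special case sb[ci-1::fs]
print(demux_slices(sb, 3, 1, 6) == sb[3 - 1::6])  # True
```

The loop runs only `cs` times, and each iteration is a C-level slice copy rather than per-element Python work.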

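I moved most of the indexing into the `range()` calculation. That is faster than building a separate index list for the sublists; see the timings below: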

sb = [11,12,21,22,31,32]*4 #stream buffer, e.g. 4 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size

for ci in range(1,4):
    print [x for y in [sb[x:x+cs] for x in range((ci-1)*cs,len(sb),fs)] for x in y]

Output:

[11, 12, 11, 12, 11, 12, 11, 12]
[21, 22, 21, 22, 21, 22, 21, 22]
[31, 32, 31, 32, 31, 32, 31, 32]

I moved most of the work into the `range()` call that produces the sublists; what remains is simply flattening the sublists into one list:

range((ci-1)*cs,len(sb), fs)
         |         |     |________  frame size, range will use steps the size of the frame
         |         |______________  till end of data
         |________________________  starting at (ci-1) * channel size   

for ci = 1 it starts at 0,   6,12,18,....
for ci = 2 it starts at 2,   8,14,....
for ci = 3 it starts at 4,  10,...  
for ci = 4 it starts at 6,  ...  
    and increases by fs = 6 until end of data. The list comp then gets a sublist of len cs
    and the rest of the list-comp flattens it down from list of list to a simpler list
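The start offsets listed above can be checked directly in Python 3. This also shows why `ci` has to stay within `1..nc`: with `nc = 3`, `ci = 4` starts at offset `fs`, which is just channel 1 of the second frame.

```python
sb = [11, 12, 21, 22, 31, 32] * 4
cs, nc = 2, 3
fs = nc * cs

starts_by_channel = {ci: list(range((ci - 1) * cs, len(sb), fs)) for ci in range(1, 4)}
for ci, starts in sorted(starts_by_channel.items()):
    print(ci, starts)
# 1 [0, 6, 12, 18]
# 2 [2, 8, 14, 20]
# 3 [4, 10, 16, 22]

# ci = 4 is out of range for nc = 3: it starts at fs, i.e. channel 1 of frame 2
print(list(range((4 - 1) * cs, len(sb), fs)))  # [6, 12, 18]
print(sb[6:6 + cs])                             # [11, 12]
```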

Timings:

import timeit

print timeit.timeit(stmt='''
sb = [11,12,21,22,31,32]*4*5 #stream buffer: 20 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
for ci in range(1,4):
    [x for y in [sb[x:x+cs] for x in range((ci-1)*cs,len(sb),fs)] for x in y] 

''', setup='pass', number=10000)  #  0.588474035263

print timeit.timeit(stmt='''
sb = [11,12,21,22,31,32]*4*5 #stream buffer: 20 identical frames
ci = 1 #1-indexed channel index
cs = 2 #channel size
nc = 3 #number of channels in each frame
fs = nc*cs #frame size
for ci in range(1,4):
    [i for l in [sb[j+ci-1:j+ci-1+cs] for j in [x*fs+ci-1 for x in xrange(len(sb)/fs)]] for i in l] 

''', setup='pass', number=10000)   # 0.734045982361
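A Python 3 variant of the timing, assuming the goal is to time only the extraction: the data set-up moves into `timeit`'s `setup` argument, so it runs once per `timeit()` call instead of on every repetition (the comments below describe the same adjustment). The absolute numbers depend on the machine and interpreter, so none are quoted here.

```python
import timeit

setup = """
sb = [11, 12, 21, 22, 31, 32] * 4 * 5  # 20 identical frames
cs = 2
nc = 3
fs = nc * cs
"""

range_based = """
for ci in range(1, 4):
    [x for y in [sb[x:x + cs] for x in range((ci - 1) * cs, len(sb), fs)] for x in y]
"""

index_list_based = """
for ci in range(1, 4):
    [i for l in [sb[j + ci - 1:j + ci - 1 + cs]
                 for j in [x * fs + ci - 1 for x in range(len(sb) // fs)]] for i in l]
"""

# Only the statements are timed; the set-up runs once per timeit() call
t_range = timeit.timeit(range_based, setup=setup, number=10000)
t_index = timeit.timeit(index_list_based, setup=setup, number=10000)
print("range-based:      %.3f s" % t_range)
print("index-list-based: %.3f s" % t_index)
```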

Code:

sb = [11,12,21,22,31,32] * 4
ci = 0
cs = 2
nc = 3
fs = cs * nc
# channel ci (0-based) starts at element ci*cs of every frame
result = list(sum(zip(*[sb[i::fs] for i in range(ci*cs, (ci+1)*cs)]), ()))

Output:

[11, 12, 11, 12, 11, 12, 11, 12]

I'd suggest making `ci` a 0-based index to match Python's indexing, but if you insist on 1-based, updating the code is simple: just replace every `ci` with `ci-1`.

It's essentially the same as your original approach, just slightly cleaner, and it extends to different values of `ci`, `cs` and `nc`.
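A sketch of that zip-over-strided-slices idea wrapped in a function, assuming `ci` is a 0-based channel index: the channel's first element then sits at offset `ci * cs` of every frame. It also swaps `sum(..., ())` for `itertools.chain.from_iterable`, since summing tuples copies the growing tuple at every step. The name `demux_zip` is hypothetical.

```python
import itertools


def demux_zip(sb, ci, cs, nc):
    # ci is 0-based here; element o of each sample sits at ci*cs + o in every frame
    fs = nc * cs
    strided = [sb[i::fs] for i in range(ci * cs, (ci + 1) * cs)]
    # zip() regroups the cs strided slices sample by sample (it stops at the
    # shortest slice); chain.from_iterable flattens them without the repeated
    # tuple concatenation that sum(..., ()) performs
    return list(itertools.chain.from_iterable(zip(*strided)))


sb = [11, 12, 21, 22, 31, 32] * 4
for ci in range(3):
    print(ci, demux_zip(sb, ci, cs=2, nc=3))
# 0 [11, 12, 11, 12, 11, 12, 11, 12]
# 1 [21, 22, 21, 22, 21, 22, 21, 22]
# 2 [31, 32, 31, 32, 31, 32, 31, 32]
```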

The following seems fairly easy to understand to me (and fairly efficient too):

I suggest using clearer variable names instead of comments, and not using a one-liner.

Given

import itertools as it 


stream = [11, 12, 21, 22, 31, 32] * 4
ch_idx = 1
ch_size = 2 
num_chs = 3

Code

Using `grouper`:

As a one-liner, it looks like this:

list(it.chain.from_iterable(*it.islice(zip(*grouper(grouper(stream, ch_size), num_chs)), ch_idx - 1, ch_idx)))
# [11, 12, 11, 12, 11, 12, 11, 12]

Details

The `grouper` recipe is implemented as follows:

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)
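To see what the one-liner does, here is the pipeline broken into named steps (Python 3, since the recipe uses `itertools.zip_longest`); the intermediate names `samples`, `frames` and `channels` are made up for illustration. With a 1-indexed `ch_idx`, the wanted channel is `channels[ch_idx - 1]`. Note that `grouper` pads an incomplete last chunk with `fillvalue` (`None` by default), so the input is assumed to hold whole frames.

```python
import itertools as it


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return it.zip_longest(*args, fillvalue=fillvalue)


stream = [11, 12, 21, 22, 31, 32] * 4
ch_size, num_chs = 2, 3

samples = list(grouper(stream, ch_size))   # (11, 12), (21, 22), (31, 32), (11, 12), ...
frames = list(grouper(samples, num_chs))   # 4 frames of 3 samples each
channels = list(zip(*frames))              # 3 channels of 4 samples each

print(channels[0])                                # ((11, 12), (11, 12), (11, 12), (11, 12))
print(list(it.chain.from_iterable(channels[1])))  # [21, 22, 21, 22, 21, 22, 21, 22]
```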

See also a third-party library that provides pre-implemented recipes.

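Do you need random sampling?

random.sample([11,12,21,22,31,32]*4, 8)

would draw 8 random samples from the list without repeating any sampled index; your solution looks more like linear every-nth sublist sampling.

@PatrickArtner: No. I need to extract one channel from a stream (channel demultiplexing). The stream is laid out over a list of elements. Each channel sample occupies `cs` consecutive elements. Each frame of the stream contains `nc` consecutive samples, one per channel. I want to extract every sample of a particular channel from the stream into a new list, preserving their order, of course.

Are you trying to get the first n numbers of each identical frame from the stream buffer, which in this case is 2 elements from each of the 4 identical frames?

@Idlehands: More or less. Yes, in this case it's 2 elements from each of the 4 identical frames, taken from the selected channel. (Sorry about the first comment; I misunderstood what you wrote.)

I like your way better because it's a bit more readable, but it's essentially the same solution, right? It's still too convoluted to gain any performance. Is there a simpler way?

@NotGaeL Essentially I got rid of one explicit list by doing the index math in the highly optimized `range()` call, so it "should" perform better. You need to measure with your data to be sure.

@NotGaeL I bumped it up to *20 and ran it 10000 times: my version is almost 20% faster, which is more than I expected.

It still pains me because it's hard to read, but it's definitely a big improvement. Thanks for your help!

If you remove the expensive data setup from the timed statement, the timings change to 0.281064033508 vs 0.385651111603, about 72.88% of your solution's time.

This doesn't produce that output, and it's not a single list but a list of lists.

OK, let me take a look. I'm still trying to work out what `ci` is supposed to be. That's why my comment asked whether you're only looking for n elements within a given frame size.

`ci` is the 1-indexed channel index: if `ci==1` and `cs==2`, you take the first and second elements of every frame of `nc*cs` elements in the list. If `ci==2` and `cs==2`, you take the third and fourth elements of each frame, and so on...

Use

list(it.chain(*results))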

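I'm also looking into itertools; if you're familiar with it, it's clearer, so it's a good option. Still, I was hoping for a simpler solution from a library that helps you with iteration. Is this such a rare problem I'm trying to solve?

@NotGaeL Can you describe the simplicity you're looking for?

@NotGaeL: You may not realize it, but the argument passed to `chain.from_iterable()` is what's called a generator expression, which means it is also evaluated iteratively. That's part of what I meant by the code being fairly efficient. The main point is that Python does some things iteratively without needing `itertools` to do them. It sounds like you're hoping some existing library function will do all the specialized work you want.

@pylang:

sb[ci-1::fs]

@martineau: I was hoping a library specialized in iteration tools, such as itertools, would provide tools that simplify some specialized iteration tasks (iterate and select `n` elements out of every `m` elements). It's such a simple task that even natural language can state it in one sentence. I don't want the discussion to descend into sarcasm; I mean that this is exactly what you'd expect from an entire library dedicated to facilitating iteration over elements.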
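For what it's worth, the standard library can express "take `n` out of every `m`" lazily by combining two existing tools: `itertools.cycle` repeats a boolean mask of length `m`, and `itertools.compress` keeps only the items whose mask entry is true. This is a sketch, not a dedicated library function; the helper name `take_n_of_every_m` and its `offset` parameter are hypothetical. Both `compress` and `cycle` are available in Python 2.7 and 3.

```python
import itertools


def take_n_of_every_m(iterable, n, m, offset=0):
    # Mask of length m: True for positions offset .. offset+n-1 of each block
    mask = [offset <= k < offset + n for k in range(m)]
    # cycle() repeats the mask forever; compress() stops when the data runs out
    return itertools.compress(iterable, itertools.cycle(mask))


sb = [11, 12, 21, 22, 31, 32] * 4
ci, cs, nc = 2, 2, 3  # 1-indexed channel, as in the question
print(list(take_n_of_every_m(sb, cs, nc * cs, (ci - 1) * cs)))
# [21, 22, 21, 22, 21, 22, 21, 22]
```

Because it needs only the mask, it also works on inputs that are not lists, such as a generator reading samples from a file, with extra memory proportional to `m` only.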