Python: Find the index of the n'th item in a list


I want to find the index of the n'th occurrence of an item in a list, e.g.

x=[False,True,True,False,True,False,True,False,False,False,True,False,True]

What is the index of the n'th True? If I want the fifth occurrence (the fourth, if zero-indexed), the answer is 10.

I came up with:

indargs = [ i for i,a in enumerate(x) if a ]
indargs[n]

Note that x.index returns the first occurrence, or the first occurrence after some point, so as far as I can tell it is not a solution.
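To make that limitation concrete, here is a small sketch of what x.index does with and without its optional start argument. The names first and second are only for illustration.

```python
x = [False, True, True, False, True, False, True,
     False, False, False, True, False, True]

# index(value) returns the position of the first match only.
first = x.index(True)

# index(value, start) returns the first match at or after `start`,
# so reaching the n'th match means chaining n calls by hand.
second = x.index(True, first + 1)

print(first, second)
```

Each call finds only one occurrence, so on its own it does not answer the question.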

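There is also a numpy solution for cases like the one above, e.g. using cumsum and where, but I would like to know whether there is a numpy-free way to solve the problem.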

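I'm concerned about performance, since I first ran into this while implementing a Sieve of Eratosthenes for a problem, but it is a more general question that I have run into in other situations as well.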

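EDIT: I got a lot of great answers, so I decided to run some performance tests. Below are timeit execution times (in seconds) for searching lists of len nelements for the 4000'th/1000'th True. The lists are random True/False. The source code is linked below; it is a bit messy. I used short or modified versions of the posters' names to describe the functions, except for listcomp, which is the simple list comprehension above.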

True Test (100'th True in a list containing True/False)
         nelements      eyquem_occur eyquem_occurrence            graddy            taymon          listcomp       hettinger26         hettinger
             3000:          0.007824          0.031117          0.002144          0.007694          0.026908          0.003563          0.003563
            10000:          0.018424          0.103049          0.002233          0.018063          0.088245          0.003610          0.003769
            50000:          0.078383          0.515265          0.002140          0.078074          0.442630          0.003719          0.003608
           100000:          0.152804          1.054196          0.002129          0.152691          0.903827          0.003741          0.003769
           200000:          0.303084          2.123534          0.002212          0.301918          1.837870          0.003522          0.003601
True Test (1000'th True in a list containing True/False)
         nelements      eyquem_occur eyquem_occurrence            graddy            taymon          listcomp       hettinger26         hettinger
             3000:          0.038461          0.031358          0.024167          0.039277          0.026640          0.035283          0.034482
            10000:          0.049063          0.103241          0.024120          0.049383          0.088688          0.035515          0.034700
            50000:          0.108860          0.516037          0.023956          0.109546          0.442078          0.035269          0.035373
           100000:          0.183568          1.049817          0.024228          0.184406          0.906709          0.035135          0.036027
           200000:          0.333501          2.141629          0.024239          0.333908          1.826397          0.034879          0.036551
True Test (20000'th True in a list containing True/False)
         nelements      eyquem_occur eyquem_occurrence            graddy            taymon          listcomp       hettinger26         hettinger
             3000:          0.004520          0.004439          0.036853          0.004458          0.026900          0.053460          0.053734
            10000:          0.014925          0.014715          0.126084          0.014864          0.088470          0.177792          0.177716
            50000:          0.766154          0.515107          0.499068          0.781289          0.443654          0.707134          0.711072
           100000:          0.837363          1.051426          0.501842          0.862350          0.903189          0.707552          0.706808
           200000:          0.991740          2.124445          0.498408          1.008187          1.839797          0.715844          0.709063
Number Test (750'th 0 in a list containing 0-9)
         nelements      eyquem_occur eyquem_occurrence            graddy            taymon          listcomp       hettinger26         hettinger
             3000:          0.026996          0.026887          0.015494          0.030343          0.022417          0.026557          0.026236
            10000:          0.037887          0.089267          0.015839          0.040519          0.074941          0.026525          0.027057
            50000:          0.097777          0.445236          0.015396          0.101242          0.371496          0.025945          0.026156
           100000:          0.173794          0.905993          0.015409          0.176317          0.762155          0.026215          0.026871
           200000:          0.324930          1.847375          0.015506          0.327957          1.536012          0.027390          0.026657

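Hettinger's itertools solution is almost always the best. taymon's and graddy's solutions are next best in most situations, though the list comprehension approach can be better for short arrays when you want the n'th instance with a high n, or for lists that contain fewer than n occurrences. If there is a chance that there are fewer than n occurrences, an initial count check saves time. Also, graddy's is more efficient when searching for numbers than for True/False; it is not clear why. eyquem's solutions are essentially equivalent to the others, with slightly more or less overhead: eyquem_occur is approximately the same as taymon's solution, while eyquem_occurrence is similar to listcomp.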
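For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a timeit harness. It is not the benchmark code used for the tables above; the function name index_loop, the list size, the seed and the repeat count are all assumptions chosen for illustration.

```python
import random
import timeit

def listcomp(x, n):
    # The simple list comprehension from the question (n is zero-based).
    return [i for i, a in enumerate(x) if a][n]

def index_loop(x, n):
    # Repeated list.index calls (n is zero-based, so n + 1 calls are needed).
    i = -1
    for _ in range(n + 1):
        i = x.index(True, i + 1)
    return i

random.seed(0)  # fixed seed so every run times the same list
x = [random.random() < 0.5 for _ in range(10000)]
n = 99  # zero-based: the 100'th True

# Check that both candidates agree before timing them.
assert listcomp(x, n) == index_loop(x, n)

for func in (listcomp, index_loop):
    seconds = timeit.timeit(lambda: func(x, n), number=100)
    print(f"{func.__name__:>12}: {seconds:.6f}")
```

Absolute numbers depend on the machine and the Python version, so only the relative ordering is meaningful.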

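I can't say for certain that this is the fastest way, but I imagine it would be pretty good:

i = -1
# n is the number of occurrences to step through;
# list.index raises ValueError if x holds fewer than n matches
for j in range(n):  # xrange on Python 2
    i = x.index(True, i + 1)

After the loop, the answer is i.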
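The loop above raises ValueError when the list holds fewer than n matches. A minimal sketch of a guarded variant follows, using the list.count check mentioned in the question's edit; the function name nth_index_checked and the choice of -1 as the "not found" value are assumptions for illustration.

```python
def nth_index_checked(x, value, n):
    """Return the index of the n'th (1-based) occurrence of value, or -1."""
    # list.count is a single pass in C; bail out early if matches are too few.
    if n < 1 or x.count(value) < n:
        return -1
    i = -1
    for _ in range(n):
        i = x.index(value, i + 1)
    return i

x = [False, True, True, False, True, False, True,
     False, False, False, True, False, True]
print(nth_index_checked(x, True, 5))   # 10
print(nth_index_checked(x, True, 50))  # -1
```

The count check costs one extra pass over the list, which pays off mainly when a missing n'th occurrence is a realistic case.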

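If efficiency is a concern, I think it is better to iterate in the usual way: a plain loop can stop as soon as the N-th occurrence is found, whereas a list comprehension always takes O(L), where L is the length of the list.

Example: consider a very large list in which you want the first occurrence (N = 1). Obviously it is best to stop as soon as you find it.

def find_nth(L, N):
    # Return the index of the N-th (1-based) truthy element of L, or None.
    count = 0
    for index, i in enumerate(L):
        if i:
            count = count + 1
            if count == N:
                return index
    return None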

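If you care about performance, first check whether an algorithmic optimization is available. For example, if you call this function many times on the same values, you may want to cache previous computations (e.g. once you have found the 50th occurrence of an element, you can find any earlier occurrence in O(1) time).

Otherwise, you want to make sure your technique works on (lazy) iterators.

The most elegant and best-performing way I can think of to implement it is: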

def indexOfNthOccurrence(N, element, stream):
    """for N>0, returns index or None"""
    seen = 0
    for i,x in enumerate(stream):
        if x==element:
            seen += 1
            if seen==N:
                return i

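(If you really care about the performance difference between enumerate and other techniques, you will need to resort to profiling, especially with the numpy functions, which may drop down to C.)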
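As a quick illustration of the lazy-iterator point, the sketch below repeats the function from this answer and feeds it an infinite generator. The stream multiples_of_three is a made-up example input.

```python
from itertools import count

def indexOfNthOccurrence(N, element, stream):
    """for N>0, returns index or None"""
    seen = 0
    for i, x in enumerate(stream):
        if x == element:
            seen += 1
            if seen == N:
                return i

# An infinite lazy stream: True at every multiple of 3, False elsewhere.
multiples_of_three = (k % 3 == 0 for k in count())

# Matches occur at indices 0, 3, 6, 9, ... so the 4th True is at index 9.
print(indexOfNthOccurrence(4, True, multiples_of_three))
```

Because the function returns as soon as the N-th match is seen, it terminates even though the stream never ends. A list-building approach would not.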

To preprocess the entire stream and support O(1) queries:

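from collections import defaultdict

YOUR_LIST = [3, 2, 3, 2, 5, 5, 1]  # example input
cache = defaultdict(list)
for i, elem in enumerate(YOUR_LIST):
    cache[elem].append(i)  # record every position at which elem occurs

# e.g. [3,2,3,2,5,5,1]
#       0 1 2 3 4 5 6
# cache: {3:[0,2], 1:[6], 2:[1,3], 5:[4,5]}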
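Once the cache is built, each query is a dictionary lookup followed by a list index. The sketch below shows one way to wrap that; the helper names build_index and nth_from_cache are hypothetical, as is the choice to return None for a missing occurrence.

```python
from collections import defaultdict

def build_index(items):
    """Map each element to the list of positions where it occurs."""
    cache = defaultdict(list)
    for i, elem in enumerate(items):
        cache[elem].append(i)
    return cache

def nth_from_cache(cache, elem, n):
    """Return the index of the n'th (1-based) occurrence, or None."""
    # .get avoids creating an empty entry for elements that never occur.
    positions = cache.get(elem, [])
    return positions[n - 1] if 1 <= n <= len(positions) else None

cache = build_index([3, 2, 3, 2, 5, 5, 1])
print(nth_from_cache(cache, 3, 2))  # 2
print(nth_from_cache(cache, 5, 1))  # 4
print(nth_from_cache(cache, 1, 2))  # None
```

The preprocessing pass is O(L) in time and memory, so this only pays off when the same list is queried repeatedly.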

The answer from @Taymon using list.index is great.

FWIW, here is a functional approach using itertools. It works with any iterable input, not just lists:

>>> from itertools import compress, count, islice
>>> from functools import partial
>>> from operator import eq

>>> def nth_item(n, item, iterable):
...     # map is lazy on Python 3; on Python 2 use itertools.imap instead
...     indices = compress(count(), map(partial(eq, item), iterable))
...     return next(islice(indices, n, None), -1)

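The example is nice because it shows how to combine Python's functional toolset effectively. Note that once the pipeline is set up, there are no trips around Python's eval loop: everything gets done at C speed, with a tiny memory footprint, with lazy evaluation, with no variable assignments, and with separately testable components. In other words, it is everything functional programmers dream about :-)

Sample run: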

>>> x = [False,True,True,False,True,False,True,False,False,False,True,False,True]
>>> nth_item(50, True, x)
-1
>>> nth_item(0, True, x)
1
>>> nth_item(1, True, x)
2
>>> nth_item(2, True, x)
4
>>> nth_item(3, True, x)
6
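To see what each stage of that pipeline produces, the sketch below runs the stages one at a time on a short list and materializes the intermediate results. Turning them into lists is only for display; the real pipeline stays lazy. The variable names matches and indices are illustrative.

```python
from functools import partial
from itertools import compress, count, islice
from operator import eq

x = [False, True, True, False, True]

# Stage 1: one boolean per element, True where the element equals the item.
matches = list(map(partial(eq, True), x))
print(matches)   # [False, True, True, False, True]

# Stage 2: compress keeps the counter values whose selector is true,
# which are exactly the indices of the matches.
indices = list(compress(count(), matches))
print(indices)   # [1, 2, 4]

# Stage 3: islice skips the first n indices and next takes the following one,
# falling back to -1 when the matches run out.
print(next(islice(indices, 2, None), -1))   # 4
print(next(islice(indices, 3, None), -1))   # -1
```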

Note: here Z is the n'th occurrence.

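A solution that first creates a list object and returns element nth-1 of that list: function occurence()

And, I think, a solution that also fulfils functional programmers' dreams by using generators, because I love them: function occur()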

S = 'stackoverflow.com is a fantastic amazing site'
print('object S is string %r' % S)
print("indexes of 'a' in S :", [indx for indx, elem in enumerate(S) if elem == 'a'])

def occurence(itrbl, x, nth):
    # Build the full list of matching indexes, then pick the nth (1-based) one.
    return [indx for indx, elem in enumerate(itrbl)
            if elem == x][nth-1] if x in itrbl \
           else None

def occur(itrbl, x, nth):
    # Lazy version: stops as soon as the nth (1-based) match is reached.
    return next(i for pos, i in enumerate(indx for indx, elem in enumerate(itrbl)
                                          if elem == x)
                if pos == nth-1) if x in itrbl \
            else None

print("\noccurence(S,'a',4th) ==", occurence(S, 'a', 4))
print("\noccur(S,'a',4th) ==", occur(S, 'a', 4))

Result

object S is string 'stackoverflow.com is a fantastic amazing site'
indexes of 'a' in S : [2, 21, 24, 27, 33, 35]

occurence(S,'a',4th) == 27

occur(S,'a',4th) == 27

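The second solution looks complex, but it really isn't. It does not need to run through the whole iterable: the process stops as soon as the wanted occurrence is found.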
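One caveat: the x in itrbl guard is safe for sequences such as strings and lists, but on a one-shot iterator the membership test consumes items up to and including the first match, so the enumeration that follows starts too late. Below is a sketch of a generator-based variant without that guard. The name occur_lazy is hypothetical, and nth is assumed to be at least 1.

```python
from itertools import islice

def occur_lazy(itrbl, x, nth):
    """Index of the nth (1-based) occurrence of x in any iterable, else None."""
    matches = (indx for indx, elem in enumerate(itrbl) if elem == x)
    # islice skips nth-1 matches lazily; next stops at the first one after that.
    return next(islice(matches, nth - 1, None), None)

S = 'stackoverflow.com is a fantastic amazing site'
print(occur_lazy(S, 'a', 4))        # 27
print(occur_lazy(iter(S), 'a', 4))  # 27, also fine for a one-shot iterator
print(occur_lazy(S, 'a', 7))        # None: S holds only six occurrences
```

The default argument of next replaces both the membership test and the error that an exhausted generator would otherwise raise.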

Here is another way to find the nth occurrence of x in a list itrbl:
def nthoccur(nth,x,itrbl):
    count,index = 0,0
    while count < nth:
        if index > len(itrbl) - 1:
            return None
        elif itrbl[index] == x:
            count += 1
            index += 1
        else:
            index += 1
    return index - 1

Here is one way to do it.

For the example above:
x=[False,True,True,False,True,False,True,False,False,False,True,False,True]

We can define a function find_index:
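def find_index(lst, value, n):
    # Collect the index of every element equal to value,
    # then return entry n of that list (n is zero-based).
    c = []
    for i, element in enumerate(lst):
        if element == value:
            c.append(i)
    return c[n]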

If we apply the function:
nth_index = find_index(x, True, 4)
print(nth_index)

The result is:
10

I think this should work:
def get_nth_occurrence_of_specific_term(my_list, term, n):
    assert type(n) is int and n > 0
    start = -1
    for i in range(n):
        if term not in my_list[start + 1:]:
            return -1
        start = my_list.index(term, start + 1)
    return start
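Note that my_list[start + 1:] copies the tail of the list on every pass, and the in test scans that copy before index scans the list again. A sketch of a variant that relies on the ValueError raised by list.index instead is shown below. The name nth_occurrence is hypothetical, and raising ValueError for a bad n (rather than using assert) is a design choice of this sketch.

```python
def nth_occurrence(my_list, term, n):
    """Return the index of the n'th (1-based) occurrence of term, or -1."""
    if not isinstance(n, int) or n < 1:
        raise ValueError("n must be a positive integer")
    start = -1
    for _ in range(n):
        try:
            # Search in place from start + 1: no slice, no separate `in` scan.
            start = my_list.index(term, start + 1)
        except ValueError:
            return -1
    return start

x = [False, True, True, False, True, False, True,
     False, False, False, True, False, True]
print(nth_occurrence(x, True, 5))  # 10
print(nth_occurrence(x, True, 7))  # -1
```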

You can use next with enumerate and a generator expression. itertools.islice lets you slice an iterable as needed.
from itertools import islice

x = [False,True,True,False,True,False,True,False,False,False,True,False,True]

def get_nth_index(L, val, n):
    """return index of nth instance where value in list equals val"""
    return next(islice((i for i, j in enumerate(L) if j == val), n-1, n), -1)

res = get_nth_index(x, True, 3)  # 4

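If the iterator is exhausted, i.e. the nth occurrence of the specified value does not exist, next can return a default value, in this instance -1.

You can use itertools.count:

from itertools import count

x = [False, True, True, False, True, False, True, False, False, False, True, False, True]


def nth_index(n, item, iterable):
    counter = count(1)
    return next((i for i, e in enumerate(iterable) if e == item and next(counter) == n), -1)


print(nth_index(3, True, x))

Output

4

The idea is that, because of the short-circuit nature of e == item and next(counter) == n, the expression next(counter) == n is only evaluated when e == item, so you only count the elements equal to item.

EDIT: My earlier comment assumed you were asking a different question, not one about syntax. Sorry. I'm not a Python whiz, but it seems you should be able to count off the desired number of occurrences with a for loop, incrementing a counter each time. Wrap that in a while loop. (+1 for the written answer, even though it is unfinished.)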
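The short-circuit behaviour can be checked directly. The sketch below evaluates the same condition for every element and then reports how many times the counter was actually advanced. The helper trace is hypothetical and exists only for this demonstration.

```python
from itertools import count

def trace(items, item, n):
    """Return the per-element results of the condition and the number
    of times the counter was advanced while evaluating them."""
    counter = count(1)
    hits = [e == item and next(counter) == n for e in items]
    # The counter's next value is one more than the number of earlier calls.
    return hits, next(counter) - 1

hits, advanced = trace(['a', 'b', 'a', 'c', 'a'], 'a', 2)
print(hits)      # [False, False, True, False, False]
print(advanced)  # 3: the counter moved once per 'a', never for 'b' or 'c'
```

Since the counter only moves on matches, comparing it with n identifies the n'th occurrence without a separate count variable.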