Python: best way to find the first non-repeating character in a string

python algorithm

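For a string like aabccbdcbe, what is the most space- and time-efficient way to find the first non-repeating character?

Here the answer is d. It strikes me that this can be done in two ways:

For each index i, I loop i-1 times and check whether that character ever occurs again. But this is inefficient: the cost of this method grows as O(N^2), where N is the length of the string.

Another, possibly better, way is to build a tree or some other data structure that lets me order the characters by weight (number of occurrences). Building the structure should take a single pass of length N over the string, so the total is O(N) + O(time to build the tree or other structure).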
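For reference, the first (quadratic) approach can be sketched roughly as follows. This is only an illustration of the idea, not code from the question; the function name first_unique_bruteforce is made up here.

```python
def first_unique_bruteforce(s):
    """Return the first character of s that occurs exactly once, or None."""
    for i, ch in enumerate(s):
        # Compare against every other position: O(N) work per character,
        # hence O(N^2) overall.
        if all(ch != other for j, other in enumerate(s) if j != i):
            return ch
    return None


print(first_unique_bruteforce("aabccbdcbe"))  # d
```

It returns None when every character repeats, which is one possible convention for the "no answer" case.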

A list comprehension gives you the characters that occur exactly once, in the order in which they appear:

In [61]: s = 'aabccbdcbe'

In [62]: [a for a in s if s.count(a) == 1]
Out[62]: ['d', 'e']

Then just return the first entry of that list:

In [63]: [a for a in s if s.count(a) == 1][0]
Out[63]: 'd'

If you only need the first entry, a generator works as well:

In [69]: next(a for a in s if s.count(a) == 1)
Out[69]: 'd'
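Both forms above fail when no character is unique: indexing with [0] raises IndexError and next() raises StopIteration. A small sketch of one way to handle that, using the optional default argument of the built-in next() (returning None here is a choice made for this example):

```python
s = "aabccbdcbe"

# The second argument to next() is returned when the generator is exhausted.
first = next((a for a in s if s.count(a) == 1), None)
print(first)  # d

t = "aabb"
missing = next((a for a in t if t.count(a) == 1), None)
print(missing)  # None
```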

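From the way the problem is stated, it is clear that you need an O(n) solution, which means going through the list only once. Every solution that uses some form of count is wrong in that respect, because count traverses the list again inside that operation. So you need to keep track of the counts yourself.

If the string contains only characters, you don't need to worry about storage: just use the character as the key in a dict. The value in that dict is the index of the character in the string s. At the end, we have to work out which one comes first by computing the minimum of the dictionary's values. That is an O(n) operation on a list that is (probably) shorter than the first one.

The total is still O(c*n), and therefore O(n).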
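A minimal sketch of the idea described above, assuming "first" means smallest index and that None is an acceptable result when nothing is unique. The names first_unique_by_index, seen and first_index are illustrative only.

```python
def first_unique_by_index(s):
    seen = set()       # every character met so far
    first_index = {}   # character -> index, only for characters met exactly once
    for i, ch in enumerate(s):
        if ch in seen:
            # Second or later occurrence: it can no longer be the answer.
            first_index.pop(ch, None)
        else:
            seen.add(ch)
            first_index[ch] = i
    if not first_index:
        return None
    # One pass over the surviving candidates to find the smallest index.
    return min(first_index, key=first_index.get)


print(first_unique_by_index("aabccbdcbe"))  # d
```

The min() call runs once, after the loop, over at most as many entries as there are distinct characters.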

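I think that removing the repeated characters from the string can reduce the number of operations significantly. For example:

s = "aabccbdcbe"
while s != "":
    slen0 = len(s)
    ch = s[0]
    # drop every occurrence of the current first character
    s = s.replace(ch, "")
    slen1 = len(s)
    # exactly one character removed: ch occurred only once
    if slen1 == slen0-1:
        print(ch)
        break
else:
    print("No answer")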
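The same loop can be wrapped in a function so that it is easier to reuse and check. This is a sketch under the assumption that returning None is fine when there is no unique character; the name first_unique_by_replace is hypothetical.

```python
def first_unique_by_replace(s):
    while s:
        ch = s[0]
        remaining = s.replace(ch, "")
        # If only one character disappeared, ch occurred exactly once.
        if len(remaining) == len(s) - 1:
            return ch
        # Otherwise ch was a repeat; continue with it stripped out.
        s = remaining
    return None


print(first_unique_by_replace("aabccbdcbe"))  # d
```

Each iteration removes one distinct character entirely, so the loop runs at most once per distinct character in the string.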

Here is an approach that uses a set of good characters and a set of bad characters (those that occur more than once):
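One way such an approach might look is sketched below. This is not the answerer's code: only the names good and bad come from the text, while the function name and the None return value are assumptions made for this example.

```python
def first_unique_good_bad(s):
    good = set()  # characters seen exactly once so far
    bad = set()   # characters seen more than once
    for ch in s:
        if ch in bad:
            continue
        if ch in good:
            # Second occurrence: move it from good to bad.
            good.remove(ch)
            bad.add(ch)
        else:
            good.add(ch)
    # Sets are unordered, so scan the string again to find the first good one.
    for ch in s:
        if ch in good:
            return ch
    return None


print(first_unique_good_bad("aabccbdcbe"))  # d
```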

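I compared it with the LC (list comprehension) approach: beyond roughly 50 characters, the good/bad set approach becomes faster. Comparison of this method with LC: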

collections.Counter counts efficiently (*) and collections.OrderedDict remembers the order in which items were first seen. Let's use multiple inheritance to combine these benefits:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

def first_unique(iterable):
    c = OrderedCounter(iterable)
    # items come back in the order in which they were first seen
    for item, count in c.items():
        if count == 1:
            return item

print(first_unique('aabccbdcbe'))
# d
print(first_unique('abccbdcbe'))
# a

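Counter uses its superclass dict to store the counts. Defining class OrderedCounter(Counter, OrderedDict) inserts OrderedDict between Counter and dict in the method resolution order, which adds the ability to remember insertion order.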
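The method resolution order can be inspected directly, which makes the explanation above easy to check. A small sketch (on Python 3.7 and later a plain dict, and therefore a plain Counter, already preserves insertion order, so the subclass matters mainly on older versions):

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    pass


# OrderedDict sits between Counter and dict in the lookup chain.
mro_names = [cls.__name__ for cls in OrderedCounter.__mro__]
print(mro_names)
# ['OrderedCounter', 'Counter', 'OrderedDict', 'dict', 'object']

# Keys come back in the order in which they were first seen.
print(list(OrderedCounter("bca")))  # ['b', 'c', 'a']
```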

(*) It is O(n) and efficient in that sense, but it is not the fastest solution, as the benchmarks show.

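The speed of the search depends on several factors:

the length of the string
the position before which no character occurs exactly once
the size of the string after that position
the number of distinct characters in the string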

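In the code below, I first define a string s, with the help of random.choice() and a set of one-time characters named unik, from two strings s1 and s2 that I concatenate (s1+s2), where:

s1 is a string of length nwo that contains no character occurring exactly once
s2 is a string of length nwi that contains the characters occurring exactly once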
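The code that builds s is not shown, so here is a purely hypothetical sketch of how such a test string could be produced. Only the names s1, s2, nwo, nwi and unik come from the text; the function, its parameters, the seed and the choice of "ekprw" (the letters that appear in the result listings) are assumptions.

```python
import random
import string


def build_test_string(nwo, nwi, unik="ekprw", seed=0):
    """Return (s1, s2): s1 has no once-only character, s2 holds each unik char once."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    others = [c for c in string.ascii_lowercase if c not in unik]

    def without_singletons(n):
        # Draw n characters, then drop any that ended up occurring exactly once.
        drawn = [rng.choice(others) for _ in range(n)]
        return [c for c in drawn if drawn.count(c) > 1]

    s1 = without_singletons(nwo)
    s2 = without_singletons(nwi - len(unik))
    for ch in unik:
        # Insert each once-only character at a random position of s2.
        s2.insert(rng.randrange(len(s2) + 1), ch)
    return "".join(s1), "".join(s2)


s1, s2 = build_test_string(nwo=300, nwi=50)
s = s1 + s2
print(len(s1), len(s2))
```

Because once-only characters are dropped after drawing, s1 can come out shorter than nwo.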

Then comes the benchmark. By varying the values of nwo and nwi, we can see the effect on speed:

### benchmark of three solutions #################

# time.clock() was removed in Python 3.8; perf_counter() replaces it
from time import perf_counter as clock

# s is the test string s1+s2 described above

# Janne Karila
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    pass
te = clock()
c = OrderedCounter(s)
rjk = next(item for item, count in c.items() if count == 1)
tf = clock()-te
print('Janne Karila  %.5f    found: %s' % (tf,rjk))

# eyquem
te = clock()
candidates = set(s)
li = []
for x in s:
    if x in candidates:
        # first occurrence: keep x as a possible answer
        li.append(x)
        candidates.remove(x)
    elif x in li:
        # repeated occurrence: x is no longer a possible answer
        li.remove(x)
rey = li[0]
tf = clock()-te
print('eyquem        %.5f    found: %s' % (tf,rey))

# TyrantWave
te = clock()
rty = next(a for a in s if s.count(a) == 1)
tf = clock()-te
print('TyrantWave    %.5f    found: %s' % (tf,rty))
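The timings above are single measurements, which can be noisy for such short runs. As a possible refinement, not part of the original benchmark, the standard timeit module can repeat a measurement many times. The sample string and the function name first_unique_count below are assumptions for illustration.

```python
import timeit


def first_unique_count(s):
    # The s.count()-based generator, wrapped in a function so it can be timed.
    return next((a for a in s if s.count(a) == 1), None)


sample = "aabccb" * 50 + "d" + "cbe"

# Five runs of 100 calls each; the minimum is usually the most stable figure.
timings = timeit.repeat(lambda: first_unique_count(sample), number=100, repeat=5)
print("best of 5: %.6f s per 100 calls" % min(timings))
```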

Some results

With an empty s1, nwo=0 and nwi=50:

s1 == ''     len(s2) == 50
   -         s2.count(e) == 1
   -         s2.count(k) == 1
   -         s2.count(p) == 1
   -         s2.count(r) == 1
   -         s2.count(w) == 1
len of s  == 50

Janne Karila  0.00077    found: e
eyquem        0.00013    found: e
TyrantWave    0.00005    found: e

TyrantWave's solution is faster here, because the first once-only character is found quickly, near the start of the string.

With nwo=300 and nwi=50
(below, s1 has 401 characters, because characters that occur only once are not kept while s1 is being built; see the function without())

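This time TyrantWave's solution takes longer than mine, because it has to count the occurrences of every character in the first part, that is, in s1, which contains no once-only characters (they are all in the second part, s2).
However, for my solution to take less time, nwo needs to be noticeably larger than nwi.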

With nwo=300 and nwi=5000

s1 : 240 chars             s2 : 5000 chars
s1.count(e) == 0           s2.count(e) == 1
s1.count(k) == 0           s2.count(k) == 1
s1.count(p) == 0           s2.count(p) == 1
s1.count(r) == 0           s2.count(r) == 1
s1.count(w) == 0           s2.count(w) == 1
s1.count(others)>1 True
len of s  == 5240

Janne Karila  0.01510    found: p
eyquem        0.00534    found: p
TyrantWave    0.00294    found: p

If the length of s2 is increased, TyrantWave's solution comes out ahead again.

Draw whatever conclusions you like.

EDIT: Roman's good idea
I added Roman's solution to the benchmark, and it wins!

I also made a few tiny modifications to improve his solution.

# Roman Fursenko
srf = s[:]
te = clock()
while srf != "":
    slen0 = len(srf)
    ch = srf[0]
    srf = srf.replace(ch, "")
    slen1 = len(srf)
    if slen1 == slen0-1:
        rrf = ch
        break
else:
    rrf = "No answer"
tf = clock()-te
print('Roman Fursenko %.6f    found: %s' % (tf,rrf))

# Roman Fursenko improved
srf = s[:]
te = clock()
# truthiness test instead of comparing identity with a string literal
while srf:
    slen0 = len(srf)
    # remember the character before replace() removes it from srf
    ch = srf[0]
    srf = srf.replace(ch, "")
    if len(srf) == slen0-1:
        rrf = ch
        break
else:
    rrf = "No answer"
tf = clock()-te
print('Roman improved %.6f    found: %s' % (tf,rrf))

print('\nindex of %s in the string :  %d' % (rty,s.index(rrf)))

The results:

Thanks to Roman's code, I learned something about s.replace(): for reasons I don't know, it is a very fast method.

EDIT 2
Oin's solution performs worst:
# Oin
from operator import itemgetter
seen = set()
only_appear_once = dict()
te = clock()
for i, x in enumerate(s):
  if x in seen and x in only_appear_once:
    only_appear_once.pop(x)
  else:
    # note: a character met for the third time lands here and is stored again
    seen.add(x)
    only_appear_once[x] = i
  # note: as timed, min() sits inside the loop, so it runs once per character
  fco = min(only_appear_once.items(),key=itemgetter(1))[0]
tf = clock()-te
print('Oin            %.7f    found: %s' % (tf,fco))

Results
s1 == ''     len(s2) == 50
Oin            0.0007124    found: e
Janne Karila   0.0008057    found: e
eyquem         0.0001252    found: e
TyrantWave     0.0000712    found: e
Roman Fursenko 0.0000335    found: e
Roman improved 0.0000335    found: e

index of e in the string :  2


s1 : 237 chars             s2 : 50 chars
Oin            0.0029783    found: k
Janne Karila   0.0014714    found: k
eyquem         0.0002889    found: k
TyrantWave     0.0005598    found: k
Roman Fursenko 0.0001458    found: k
Roman improved 0.0001372    found: k

index of k in the string :  246


s1 : 236 chars             s2 : 5000 chars
Oin            0.0801739    found: e
Janne Karila   0.0155715    found: e
eyquem         0.0044623    found: e
TyrantWave     0.0027548    found: e
Roman Fursenko 0.0007255    found: e
Roman improved 0.0007199    found: e

index of e in the string :  244

Here is a very simple O(n) solution:
def fn(s):
  order = []
  counts = {}
  for x in s:
    if x in counts:
      counts[x] += 1
    else:
      counts[x] = 1 
      order.append(x)
  for x in order:
    if counts[x] == 1:
      return x
  return None

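We loop through the string once. When we meet a new character, we store it in counts with a value of 1 and append it to order. When we meet a character we have seen before, we increment its value in counts. Finally, we loop through order until we find a character whose value in counts is 1, and return it.
You don't need the list; you can do it over s (if s ...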
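The remark above is cut off, but one plausible reading is that the second loop can run over s itself instead of over a separate order list. A sketch under that assumption (the name fn_no_order is made up for this example):

```python
def fn_no_order(s):
    counts = {}
    for x in s:
        # dict.get supplies 0 for characters not seen yet
        counts[x] = counts.get(x, 0) + 1
    # Second pass over the string itself, so no separate order list is needed.
    for x in s:
        if counts[x] == 1:
            return x
    return None


print(fn_no_order("aabccbdcbe"))  # d
```

The trade-off is that the second loop visits every character of s rather than every distinct character, which is still O(n).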

Benchmark results including Roman Fursenko's two variants:

s1 == ''     len(s2) == 50
   -         s2.count(e) == 1
   -         s2.count(k) == 1
   -         s2.count(p) == 1
   -         s2.count(r) == 1
   -         s2.count(w) == 1
len of s  == 50

Janne Karila   0.0032538    found: r
eyquem         0.0001249    found: r
TyrantWave     0.0000534    found: r
Roman Fursenko 0.0000299    found: r
Roman improved 0.0000263    found: r

index of r in the string :  1

s1 == ''     len(s2) == 50
   -         s2.count(e) == 1
   -         s2.count(k) == 0
   -         s2.count(p) == 1
   -         s2.count(r) == 1
   -         s2.count(w) == 1
len of s  == 50

Janne Karila   0.0008183    found: a
eyquem         0.0001285    found: a
TyrantWave     0.0000550    found: a
Roman Fursenko 0.0000433    found: a
Roman improved 0.0000391    found: a

index of a in the string :  4

s1 : 240 chars             s2 : 50 chars
s1.count(e) == 0           s2.count(e) == 1
s1.count(k) == 0           s2.count(k) == 0
s1.count(p) == 0           s2.count(p) == 1
s1.count(r) == 0           s2.count(r) == 1
s1.count(w) == 0           s2.count(w) == 1
s1.count(others)>1 True
len of s  == 290

Janne Karila   0.0016390    found: e
eyquem         0.0002956    found: e
TyrantWave     0.0004112    found: e
Roman Fursenko 0.0001428    found: e
Roman improved 0.0001277    found: e

index of e in the string :  242

s1 : 241 chars             s2 : 5000 chars
s1.count(e) == 0           s2.count(e) == 1
s1.count(k) == 0           s2.count(k) == 1
s1.count(p) == 0           s2.count(p) == 1
s1.count(r) == 0           s2.count(r) == 1
s1.count(w) == 0           s2.count(w) == 1
s1.count(others)>1 True
len of s  == 5241

Janne Karila   0.0148231    found: r
eyquem         0.0053283    found: r
TyrantWave     0.0030166    found: r
Roman Fursenko 0.0007414    found: r
Roman improved 0.0007230    found: r

index of r in the string :  250
