Python 查找字符串中第一个非重复字符的最佳方法
对于像Python 查找字符串中第一个非重复字符的最佳方法,python,algorithm,Python,Algorithm,对于像aabccbe这样的字符串,找到第一个非重复字符的最佳空间和时间效率解决方案是什么 这里的答案是d。因此,让我印象深刻的是,这可以通过两种方式实现: 对于每个索引,我循环i-1次,并检查该字符是否再次出现。但这是无效的:此方法的增长为O(N^2),其中N是字符串的长度 另一个可能的好方法是,如果我可以形成一棵树或任何其他ds,这样我就可以根据权重(出现次数)对角色进行排序。这可能只需要一个长度为N的循环通过字符串来形成结构。这就是O(N)+O(构建树或任何ds的时间) 如果字符只出现一次,
aabccbe
这样的字符串,找到第一个非重复字符的最佳空间和时间效率解决方案是什么
这里的答案是d。因此,让我印象深刻的是,这可以通过两种方式实现:
如果字符只出现一次,列表理解将按其出现的顺序为您提供字符:
In [61]: s = 'aabccbdcbe'
In [62]: [a for a in s if s.count(a) == 1]
Out[62]: ['d', 'e']
然后只需返回此字段的第一个条目:
In [63]: [a for a in s if s.count(a) == 1][0]
Out[63]: 'd'
如果只需要第一个条目,生成器也可以工作:
In [69]: (a for a in s if s.count(a) == 1).next()
Out[69]: 'd'
所以从问题的定义来看,很明显你需要一个O(n)的解决方案,这意味着只需要浏览列表一次。所有使用计数形式的解决方案都是错误的,因为它们在该操作中再次遍历列表。因此,你需要自己记录计数 如果该字符串中只有字符,则无需担心存储问题,只需将该字符用作dict中的键即可。该dict中的值将是字符串s中字符的索引。最后,我们必须通过计算字典值的最小值来确定哪一个是第一个。这是一个比第一个短(可能)的列表上的O(n)操作 总数仍然是O(c*n),因此是O(n)
我认为从字符串中删除重复字符可以显著减少操作数量。例如:
s = "aabccbdcbe"
while s != "":
slen0 = len(s)
ch = s[0]
s = s.replace(ch, "")
slen1 = len(s)
if slen1 == slen0-1:
print ch
break;
else:
print "No answer"
下面是一种使用
good
字符集和bad
字符集(出现多次)的方法:
我将其与LC方法进行了比较,大约50个字符左右,good
和bad
集合方法变得更快。此方法与LC的比较:
collections.Counter
有效计数(*)和collections.OrderedDict
记住项目首次出现的顺序。让我们使用多重继承来结合这些好处:
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
pass
def first_unique(iterable):
c = OrderedCounter(iterable)
for item, count in c.iteritems():
if count == 1:
return item
print first_unique('aabccbdcbe')
#d
print first_unique('abccbdcbe')
#a
计数器
使用其超类dict
存储计数。定义类OrderedCounter(Counter,OrderedDict)
以方法解析顺序在计数器
和dict
之间插入OrderedDict
,增加了记忆插入顺序的能力
(*)从这个意义上讲,这是O(n)和有效的,但不是最快的解决方案,正如基准测试所示。搜索速度取决于几个因素:
- 绳子的长度
- 在此之前没有一次性出现字符的位置
- 此位置后字符串的大小
- 字符串中出现的不同字符数
s
借助于
random.choice()
和一组名为unik
的一次性字符,从我连接的两个字符串
s1
和s2
:s1+s2
其中:
是长度为s1
的字符串,其中没有任何一次性出现的字符nwo
是长度为s2
的字符串,其中包含一次出现的字符nwi
通过改变数值
nwo
和nwi
我们可以看到对速度的影响:
### benchmark of three solutions #################
from time import clock
# Janne Karila
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
pass
te = clock()
c = OrderedCounter(s)
rjk = (item for item, count in c.iteritems() if count == 1).next()
tf = clock()-te
print 'Janne Karila %.5f found: %s' % (tf,rjk)
# eyquem
te = clock()
candidates = set(s)
li = []
for x in s:
if x in candidates:
li.append(x)
candidates.remove(x)
elif x in li:
li.remove(x)
rey = li[0]
tf = clock()-te
print 'eyquem %.5f found: %s' % (tf,rey)
# TyrantWave
te = clock()
rty = (a for a in s if s.count(a) == 1).next()
tf = clock()-te
print 'TyrantWave %.5f found: %s' % (tf,rty)
一些结果
对于长度为空的s1
,nwo=0且nwi=50:
s1 == '' len(s2) == 50
- s2.count(e) == 1
- s2.count(k) == 1
- s2.count(p) == 1
- s2.count(r) == 1
- s2.count(w) == 1
len of s == 50
Janne Karila 0.00077 found: e
eyquem 0.00013 found: e
TyrantWave 0.00005 found: e
TyrantWave的解决方案速度更快,因为在字符串的第一个位置可以快速找到第一个出现的字符
nwo=300和nwi=50时(以下为
s1
的401个字符,因为在构建s1
的过程中,不会保留一次出现的字符,请参见函数,不带()
这次TyrantWave的解决方案比我的解决方案要长,因为它必须计算s
第一部分中所有字符的出现次数,也就是说在s1
中没有一次性出现的字符(它们在第二部分s2
)但是,为了使我的解决方案的时间更短,
nwo
需要明显大于nwi
nwo=300和nwi=5000时
s1 : 240 chars s2 : 5000 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 1
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 5240
Janne Karila 0.01510 found: p
eyquem 0.00534 found: p
TyrantWave 0.00294 found: p
如果提高了s2
的长度,那么泰兰德瓦夫的解决方案也会更好
得出你想要的结论
编辑
罗马的好主意我在基准测试中添加了Roman的解决方案,它赢了 我还做了一些微小的修改来改进他的解决方案
# Roman Fursenko
srf = s[:]
te = clock()
while srf != "":
slen0 = len(srf)
ch = srf[0]
srf = srf.replace(ch, "")
slen1 = len(srf)
if slen1 == slen0-1:
rrf = ch
break
else:
rrf = "No answer"
tf = clock()-te
print 'Roman Fursenko %.6f found: %s' % (tf,rrf)
# Roman Fursenko improved
srf = s[:]
te = clock()
while not(srf is ""):
slen0 = len(srf)
srf = srf.replace(srf[0], "")
if len(srf) == slen0-1:
rrf = ch
break
else:
rrf = "No answer"
tf = clock()-te
print 'Roman improved %.6f found: %s' % (tf,rrf)
print '\nindex of %s in the string : %d' % (rty,s.index(rrf))
结果是:
>
多亏了罗马法典,我学到了一些东西:s.replace()
但是,我不知道是什么原因,这是一种非常快速的方法
编辑2
Oin的解决方案最差:
# Oin
from operator import itemgetter
seen = set()
only_appear_once = dict()
te = clock()
for i, x in enumerate(s):
if x in seen and x in only_appear_once:
only_appear_once.pop(x)
else:
seen.add(x)
only_appear_once[x] = i
fco = min(only_appear_once.items(),key=itemgetter(1))[0]
tf = clock()-te
print 'Oin %.7f found: %s' % (tf,fco)
结果
s1 == '' len(s2) == 50
Oin 0.0007124 found: e
Janne Karila 0.0008057 found: e
eyquem 0.0001252 found: e
TyrantWave 0.0000712 found: e
Roman Fursenko 0.0000335 found: e
Roman improved 0.0000335 found: e
index of e in the string : 2
s1 : 237 chars s2 : 50 chars
Oin 0.0029783 found: k
Janne Karila 0.0014714 found: k
eyquem 0.0002889 found: k
TyrantWave 0.0005598 found: k
Roman Fursenko 0.0001458 found: k
Roman improved 0.0001372 found: k
index of k in the string : 246
s1 : 236 chars s2 : 5000 chars
Oin 0.0801739 found: e
Janne Karila 0.0155715 found: e
eyquem 0.0044623 found: e
TyrantWave 0.0027548 found: e
Roman Fursenko 0.0007255 found: e
Roman improved 0.0007199 found: e
index of e in the string : 244
这里有一个非常简单的O(n)
解决方案:
def fn(s):
order = []
counts = {}
for x in s:
if x in counts:
counts[x] += 1
else:
counts[x] = 1
order.append(x)
for x in order:
if counts[x] == 1:
return x
return None
我们把绳子绕了一圈。当我们遇到一个新字符时,我们将其存储在counts
中,值为1
,并将其附加到order
。当我们遇到以前见过的字符时,我们会在计数中增加它的值。最后,我们在order
中循环,直到在counts
中找到一个值为1
的字符并返回它。您不需要列表;您可以在s中执行(如果s
s1 : 240 chars s2 : 5000 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 1
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 5240
Janne Karila 0.01510 found: p
eyquem 0.00534 found: p
TyrantWave 0.00294 found: p
# Roman Fursenko
srf = s[:]
te = clock()
while srf != "":
slen0 = len(srf)
ch = srf[0]
srf = srf.replace(ch, "")
slen1 = len(srf)
if slen1 == slen0-1:
rrf = ch
break
else:
rrf = "No answer"
tf = clock()-te
print 'Roman Fursenko %.6f found: %s' % (tf,rrf)
# Roman Fursenko improved
srf = s[:]
te = clock()
while not(srf is ""):
slen0 = len(srf)
srf = srf.replace(srf[0], "")
if len(srf) == slen0-1:
rrf = ch
break
else:
rrf = "No answer"
tf = clock()-te
print 'Roman improved %.6f found: %s' % (tf,rrf)
print '\nindex of %s in the string : %d' % (rty,s.index(rrf))
s1 == '' len(s2) == 50
- s2.count(e) == 1
- s2.count(k) == 1
- s2.count(p) == 1
- s2.count(r) == 1
- s2.count(w) == 1
len of s == 50
Janne Karila 0.0032538 found: r
eyquem 0.0001249 found: r
TyrantWave 0.0000534 found: r
Roman Fursenko 0.0000299 found: r
Roman improved 0.0000263 found: r
index of r in the string : 1
s1 == '' len(s2) == 50
- s2.count(e) == 1
- s2.count(k) == 0
- s2.count(p) == 1
- s2.count(r) == 1
- s2.count(w) == 1
len of s == 50
Janne Karila 0.0008183 found: a
eyquem 0.0001285 found: a
TyrantWave 0.0000550 found: a
Roman Fursenko 0.0000433 found: a
Roman improved 0.0000391 found: a
index of a in the string : 4
s1 : 240 chars s2 : 50 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 0
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 290
Janne Karila 0.0016390 found: e
eyquem 0.0002956 found: e
TyrantWave 0.0004112 found: e
Roman Fursenko 0.0001428 found: e
Roman improved 0.0001277 found: e
index of e in the string : 242
s1 : 241 chars s2 : 5000 chars
s1.count(e) == 0 s2.count(e) == 1
s1.count(k) == 0 s2.count(k) == 1
s1.count(p) == 0 s2.count(p) == 1
s1.count(r) == 0 s2.count(r) == 1
s1.count(w) == 0 s2.count(w) == 1
s1.count(others)>1 True
len of s == 5241
Janne Karila 0.0148231 found: r
eyquem 0.0053283 found: r
TyrantWave 0.0030166 found: r
Roman Fursenko 0.0007414 found: r
Roman improved 0.0007230 found: r
index of r in the string : 250
# Oin
from operator import itemgetter
seen = set()
only_appear_once = dict()
te = clock()
for i, x in enumerate(s):
if x in seen and x in only_appear_once:
only_appear_once.pop(x)
else:
seen.add(x)
only_appear_once[x] = i
fco = min(only_appear_once.items(),key=itemgetter(1))[0]
tf = clock()-te
print 'Oin %.7f found: %s' % (tf,fco)
s1 == '' len(s2) == 50
Oin 0.0007124 found: e
Janne Karila 0.0008057 found: e
eyquem 0.0001252 found: e
TyrantWave 0.0000712 found: e
Roman Fursenko 0.0000335 found: e
Roman improved 0.0000335 found: e
index of e in the string : 2
s1 : 237 chars s2 : 50 chars
Oin 0.0029783 found: k
Janne Karila 0.0014714 found: k
eyquem 0.0002889 found: k
TyrantWave 0.0005598 found: k
Roman Fursenko 0.0001458 found: k
Roman improved 0.0001372 found: k
index of k in the string : 246
s1 : 236 chars s2 : 5000 chars
Oin 0.0801739 found: e
Janne Karila 0.0155715 found: e
eyquem 0.0044623 found: e
TyrantWave 0.0027548 found: e
Roman Fursenko 0.0007255 found: e
Roman improved 0.0007199 found: e
index of e in the string : 244
def fn(s):
order = []
counts = {}
for x in s:
if x in counts:
counts[x] += 1
else:
counts[x] = 1
order.append(x)
for x in order:
if counts[x] == 1:
return x
return None