How can I efficiently compute prefix sums of character frequencies in a string in Python?


Say I have a string

s = 'AAABBBCAB'

How can I efficiently compute the prefix sum of the frequency of each character in the string, i.e.:

psum = [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, {'A': 4, 'B': 4, 'C': 1}]

Here is one option:

from collections import Counter

c = Counter()
s = 'AAABBBCAB'

psum = []
for char in s:
    c.update(char)
    psum.append(dict(c))

# [{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, {'A': 3, 'B': 2}, 
#  {'A': 3, 'B': 3}, {'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1},
#  {'A': 4, 'B': 4, 'C': 1}]

I use a Counter to keep a "running sum" and append a copy of the current counts to the list psum. This way, I iterate over the string s only once.
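The copy matters because the running Counter is mutated in place. As a small sketch (the names aliased and snapshots are only illustrative), appending c itself stores the same object nine times, while appending dict(c) stores the counts as they were at each step:

```python
from collections import Counter

s = 'AAABBBCAB'

# Appending the running Counter itself: every entry is the same object.
c = Counter()
aliased = []
for char in s:
    c.update(char)
    aliased.append(c)

# Appending a snapshot: every entry keeps the counts seen so far.
c = Counter()
snapshots = []
for char in s:
    c.update(char)
    snapshots.append(dict(c))

print(aliased[0] is aliased[-1])  # all entries alias the final counts
print(snapshots[0], snapshots[-1])
```

Each snapshot costs time proportional to the number of distinct characters seen so far, so the loop stays cheap when the alphabet is small.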

If you would rather have Counter objects in the result, you can change the last line to

psum.append(c.copy())

in order to get

[Counter({'A': 1}), Counter({'A': 2}), ...
 Counter({'A': 4, 'B': 4, 'C': 1})]

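The same result can also be achieved another way (the approach was first proposed in another answer; I just avoid map and use a generator expression instead).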
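The snippet for this variant is not shown here. One way to get the same list with a generator expression, assuming the approach meant is itertools.accumulate over single-character Counters (an assumption, not confirmed by the text above), might look like this:

```python
from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'

# accumulate adds the items pairwise; Counter + Counter returns a new
# Counter, so every element of the result is an independent object.
psum = list(accumulate(Counter(char) for char in s))

print(psum[3])
print(psum[-1])
```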

Just for completeness (since there is no "pure dict" answer here yet): if you do not want to use Counter or defaultdict, you can also use:

c = {}
s = 'AAABBBCAB'

psum = []
for char in s:
    c[char] = c.get(char, 0) + 1
    psum.append(c.copy())
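Whether dict.get or defaultdict is faster for the counting step is easy to measure on your own machine. The following is only a rough sketch with timeit; the function names and the enlarged input are illustrative, the per-step copy is left out so that only the counting is timed, and no particular numbers are claimed:

```python
import timeit
from collections import defaultdict

s = 'AAABBBCAB' * 1000  # longer input so the timing is measurable


def with_get(text):
    """Count characters using a plain dict and dict.get."""
    counts = {}
    for char in text:
        counts[char] = counts.get(char, 0) + 1
    return counts


def with_defaultdict(text):
    """Count characters using defaultdict(int)."""
    counts = defaultdict(int)
    for char in text:
        counts[char] += 1
    return counts


for fn in (with_get, with_defaultdict):
    seconds = timeit.timeit(lambda: fn(s), number=100)
    print(f'{fn.__name__}: {seconds:.4f} s')
```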

Although defaultdict usually performs better than dict.get(key, default), the simplest way is to use the Counter object from collections:

from collections import Counter

s = 'AAABBBCAB'

[dict(Counter(s[:i])) for i in range(1, len(s) + 1)]

This yields:

[{'A': 1},  {'A': 2},  {'A': 3},  {'A': 3, 'B': 1},  {'A': 3, 'B': 2},
{'A': 3, 'B': 3},  {'A': 3, 'B': 3, 'C': 1},  {'A': 4, 'B': 3, 'C': 1},
{'A': 4, 'B': 4, 'C': 1}]

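You can do this in one line, which gives you a list of Counter objects. Now, to get the frequencies of any substring of s in O(1) time with respect to the substring's length, simply subtract the counters, e.g.: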

>>> psum[6] - psum[1]  # get frequencies for s[2:7]
Counter({'B': 3, 'A': 1, 'C': 1})
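The subtraction trick can be wrapped in a small helper. This is a sketch under two assumptions: the list is built with itertools.accumulate, and an empty Counter is prepended (the initial keyword, Python 3.8 or later) so that psum[k] covers s[:k]. That shifts the indices by one compared with the session above. substring_counts is a hypothetical name.

```python
from collections import Counter
from itertools import accumulate

s = 'AAABBBCAB'

# psum[k] holds the counts for s[:k]; psum[0] is the empty Counter.
# The `initial` keyword needs Python 3.8 or later.
psum = list(accumulate(map(Counter, s), initial=Counter()))


def substring_counts(i, j):
    """Character counts of s[i:j], computed from two prefix entries."""
    return psum[j] - psum[i]


print(substring_counts(2, 7))  # counts for s[2:7] == 'ABBBC'
```

Counter subtraction drops entries whose result is zero or negative, which is what you want here because prefix counts never decrease. Each subtraction costs time proportional to the number of distinct characters, not to the length of the substring.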

Actually, you don't even need a Counter; a defaultdict is enough:

from collections import defaultdict

c = defaultdict(int)
s = 'AAABBBCAB'

psum = []

# Iterate over the characters
for char in s:
    # Update the count for this character
    c[char] += 1
    # Append a snapshot of the counts to the output list
    psum.append(dict(c))

print(psum)

The output looks like this:

[{'A': 1}, {'A': 2}, {'A': 3}, {'A': 3, 'B': 1}, 
{'A': 3, 'B': 2}, {'A': 3, 'B': 3}, 
{'A': 3, 'B': 3, 'C': 1}, {'A': 4, 'B': 3, 'C': 1}, 
{'A': 4, 'B': 4, 'C': 1}]

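In Python 3.8 and later, you can use a list comprehension with an assignment expression (aka the "walrus operator"), for example [c := c + Counter(x) for x in s] starting from an empty Counter c.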

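Note that Counter is a subclass of dict, so there is little reason to replace a Counter with a plain dict.

I agree, but it matches the output the user specified more closely. I would keep the Counter objects myself, since they have useful functions beyond being a dict.

This is an elegant one-liner, so +1, but it is quadratic rather than linear. I suspect hiro protagonist's similar solution is more efficient.

In the end, do you want a single dict, or a list of dicts, one for each character as it is read?

@Vanjith I want a running counter of character frequencies.

We don't even need a Counter here, a simple defaultdict will do @hiro protagonist, check my answer below!

What makes you say defaultdict is "simpler" than Counter? Simpler in what way?

@DeveshKumarSingh they are both subclasses of dict; the data structure of a Counter is no more complicated than that of a dict. Or am I missing something?

@DeveshKumarSingh, these considerations are wrong. I have pointed out the difference in time performance, but the OP should make the decision themselves.

@DeveshKumarSingh: your answer came later than this one. It has exactly the same structure with a slightly different type, and the same complexity, but its output is more verbose. You should not advertise here.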

>>> from collections import Counter
>>> s = 'AAABBBCAB'
>>> c = Counter()
>>> [c := c + Counter(x) for x in s]
[Counter({'A': 1}), Counter({'A': 2}), Counter({'A': 3}), Counter({'A': 3, 'B': 1}), Counter({'A': 3, 'B': 2}), Counter({'A': 3, 'B': 3}), Counter({'A': 3, 'B': 3, 'C': 1}), Counter({'A': 4, 'B': 3, 'C': 1}), Counter({'A': 4, 'B': 4, 'C': 1})]
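The walrus version needs no explicit copy because c + Counter(x) builds a new Counter at every step instead of mutating the old one. A short sketch that checks this and converts the result to plain dicts (as_dicts is an illustrative name; Python 3.8 or later is assumed):

```python
from collections import Counter

s = 'AAABBBCAB'

c = Counter()
# Each step rebinds c to a freshly created Counter.
psum = [c := c + Counter(x) for x in s]

# All entries are distinct objects, so later steps cannot change earlier ones.
print(len({id(p) for p in psum}) == len(s))

# Convert to plain dicts if that output format is preferred.
as_dicts = [dict(p) for p in psum]
print(as_dicts[3])
```

Building a new Counter per character costs about the same as copying one, so this is a matter of style rather than speed.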