Is using "not in" faster than using "in" in Python 3?


python python-3.x


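Suppose we are solving a simple word-count problem. We have a list, and we want to count how many times each word appears in it. Which of these patterns is faster?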

book_title =  ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_count_dict = {}

Pattern 1 / Pattern 2

You can time both patterns and compare the results:

import timeit

def pattern1():
  title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
  counts = {}
  for word in title:
    if word in counts:
      counts[word] += 1
    else:
      counts[word] = 1

def pattern2():
  title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
  counts = {}
  for word in title:
    if word not in counts:
      counts[word] = 1
    else:
      counts[word] += 1

sample1 = [timeit.timeit(pattern1, number=10000) for _ in range(10)]
sample2 = [timeit.timeit(pattern2, number=10000) for _ in range(10)]

print(sum(sample1) / len(sample1))
# 0.01713230140048836
print(sum(sample2) / len(sample2))
# 0.017954919600015273

As we can see, the difference is negligible.
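Averaged timings like the ones above can be skewed by background load. As a minimal sketch (the helper names `count_with_in`, `count_with_not_in` and `best_time` are made up for illustration), the same comparison can take the minimum over several `timeit.repeat` runs, which tends to be less noisy than the mean:

```python
import timeit

WORDS = ['great', 'expectations', 'the', 'adventures', 'of', 'sherlock',
         'holmes', 'the', 'great', 'gasby', 'hamlet', 'adventures', 'of',
         'huckleberry', 'fin']

def count_with_in(words):
    # Pattern 1: test for presence first
    counts = {}
    for word in words:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts

def count_with_not_in(words):
    # Pattern 2: test for absence first
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 1
        else:
            counts[word] += 1
    return counts

def best_time(func, repeat=5, number=2000):
    # The fastest of several runs is the one least disturbed by other processes
    return min(timeit.repeat(lambda: func(WORDS), repeat=repeat, number=number))

best_in = best_time(count_with_in)
best_not_in = best_time(count_with_not_in)
print(f"in: {best_in:.6f}s  not in: {best_not_in:.6f}s")
```

The absolute numbers depend on the machine and the Python build; only the relative gap between the two is of interest.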

Basically, they cost the same. From the documentation:

The operator not in is defined to have the inverse truth value of in.
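To make that definition concrete, here is a small sketch with a hypothetical `Shelf` class (not from the question) that counts how often its `__contains__` method is called. Both operators go through the same method; `not in` only negates the result:

```python
class Shelf:
    """Hypothetical container that counts membership tests."""

    def __init__(self, titles):
        self._titles = set(titles)
        self.lookups = 0

    def __contains__(self, title):
        self.lookups += 1
        return title in self._titles

shelf = Shelf(['hamlet', 'great expectations'])
present = 'hamlet' in shelf       # calls Shelf.__contains__
absent = 'hamlet' not in shelf    # calls Shelf.__contains__, then negates
print(present, absent, shelf.lookups)
# True False 2
```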


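Six of one, half a dozen of the other. They should be roughly equal: in computational terms, the not operation is almost negligible (it is about the cheapest operation there is), and the in operation on a hash table such as a dictionary runs in constant time on average (the hash is either there or it is not). If we were dealing with a list, the lookup would run in linear time, but there would still be no difference between in and not in.
So basically, use whichever one makes your code easier to understand.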
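The constant-time versus linear-time point can be illustrated without a stopwatch by counting equality comparisons. This is a sketch that assumes CPython's behaviour; `CountingKey` and `comparisons_for` are names invented for the example:

```python
class CountingKey:
    """Hashable wrapper that counts how often __eq__ is invoked."""
    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value

def comparisons_for(container, probe):
    # Reset the counter, run one membership test, report (found, comparisons)
    CountingKey.eq_calls = 0
    found = probe in container
    return found, CountingKey.eq_calls

n = 1000
keys = [CountingKey(i) for i in range(n)]
as_list = keys
as_dict = dict.fromkeys(keys, 0)

hit = CountingKey(n - 1)   # equal to the last stored key, but a new object
miss = CountingKey(n)      # equal to no stored key

print("list hit :", comparisons_for(as_list, hit))
print("dict hit :", comparisons_for(as_dict, hit))
print("list miss:", comparisons_for(as_list, miss))
print("dict miss:", comparisons_for(as_dict, miss))
```

The list has to compare against every element until it finds a match, while the dict jumps to the right slot via the hash. Writing `probe not in container` instead would perform exactly the same comparisons and then flip the result.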

That said, have you considered using collections.Counter, a data structure designed specifically for this purpose?

import collections
book_title = ['great', 'expectations','the', 'adventures', 'of', 'sherlock','holmes','the','great','gasby','hamlet','adventures','of','huckleberry','fin']
word_counts = collections.Counter(book_title)
print(word_counts)
# Counter({'great': 2, 'the': 2, 'adventures': 2, 'of': 2, 'expectations': 1, 'sherlock': 1, 'holmes': 1, 'gasby': 1, 'hamlet': 1, 'huckleberry': 1, 'fin': 1})

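If needed, you can convert a collections.Counter to a dict; in fact, collections.Counter is a subclass of dict. It even has an .update() method designed specifically to work with other counters: if you add another book title, just feed it into a Counter and then .update() the original counter with it.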
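A short sketch of that workflow, using two made-up word lists: `Counter.update()` adds the incoming counts to the existing ones rather than overwriting them, and `dict()` produces a plain dictionary when one is required.

```python
from collections import Counter

word_counts = Counter(['great', 'expectations', 'the', 'great', 'gasby'])
more_counts = Counter(['the', 'adventures', 'of', 'sherlock', 'holmes'])

# update() adds counts together; dict.update() would replace them instead
word_counts.update(more_counts)

# Counter is a dict subclass, so converting is a plain copy
plain = dict(word_counts)

# Missing keys read as 0 on a Counter instead of raising KeyError
print(plain['the'], plain['great'], word_counts['missing'])
# 2 2 0
```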
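They cost about the same. Think of the not in operator as the in operator applied first, followed by a logical not applied to that result (which costs almost nothing).
To confirm, here is a small experiment you can use to measure execution time: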
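from time import time

book_title = ['hi']*100000 + ['there']*10000
word_count_dict1 = {}
word_count_dict2 = {}

# Pattern 1: check with "in"
start = time()
for word in book_title:
  if word in word_count_dict1:
    word_count_dict1[word] += 1
  else:
    word_count_dict1[word] = 1
print(time() - start)

# Pattern 2: check with "not in"
start = time()
for word in book_title:
  if word not in word_count_dict2:
    word_count_dict2[word] = 1
  else:
    word_count_dict2[word] += 1
print(time() - start)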

Output (may vary for you):
0.021044015884399414
0.02713179588317871
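Using the dis module, you can look at the bytecode generated for each method.
Running the code through the disassembler, the only difference I can see is between in and not in, where the bytecode differs as follows:
COMPARE_OP 7 (not in)
or
COMPARE_OP 6 (in)

followed by POP_JUMP_IF_FALSE (i.e. continue to the next instruction for this condition).
In summary, both methods appear to have the same number of instructions regardless of whether the comparison returns true or false, so they most likely execute at the same speed.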
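The disassembly comparison can be reproduced with a sketch like the following; the function names `uses_in` and `uses_not_in` are invented for the example. Note that the opcode names depend on the interpreter version: the `COMPARE_OP 6` / `COMPARE_OP 7` pair quoted above comes from older releases, while CPython 3.9 and later emit a single `CONTAINS_OP` whose argument is 0 for `in` and 1 for `not in`.

```python
import dis

def uses_in(word, counts):
    return word in counts

def uses_not_in(word, counts):
    return word not in counts

ins_in = list(dis.get_instructions(uses_in))
ins_not_in = list(dis.get_instructions(uses_not_in))

# Pair the instructions up and keep only the positions where they differ
differing = [(a, b) for a, b in zip(ins_in, ins_not_in)
             if (a.opname, a.arg) != (b.opname, b.arg)]

print(len(ins_in), len(ins_not_in))
for a, b in differing:
    print(a.opname, a.arg, 'vs', b.opname, b.arg)
```

Both functions should compile to the same number of instructions, with the membership test as the only differing position.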

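There may be some underlying optimization closer to the CPU instructions that makes one approach or the other faster, but I consider that territory beyond the scope of this question. If that is the case, I believe a simple measurement of execution time over a larger list would show which one is faster.
Beneath the Python bytecode, the execution speed of the two instructions may vary between Python versions, builds, operating systems, or architectures. You might be able to make small changes to the Python source code to make one instruction or the other execute faster.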
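Shouldn't they be the same? Both have to search the hash map, right?
No. They should cost the same.
I believe using try-except would be faster, because I read that somewhere before. Not sure, though.
Can you give example pattern1 and pattern2 functions that apply to this problem, along with sample output? Otherwise this answer is too broad and not very useful to the OP.
I can do that, @jayelm. But his pattern2 uses a variable named word_counter, which is not defined in the question, so I am not sure whether word_counter is an empty dictionary.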