Optimizing/speeding up a simple bit of Python code

python performance optimization time

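I have a list of customers, and I want to return a sorted list of the customers that make up at least 5% of the entries in the original list. The code below works, but I need to optimize it. Unfortunately, I can't find a way to make it significantly faster. Any suggestions?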

def mostActive(customers):
    unique_customers = set(customers)
    count = len(customers)
    result = []
    for customer in unique_customers:
        if customers.count(customer) / count >= .05:
            result.append(customer)
    return sorted(result)

Here is one possible solution:

from collections import Counter

def mostActive(customers):
    return sorted(customer
                  for customer, count in Counter(customers).items()
                  if count / len(customers) >= .05)

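Counting the occurrences of each element with collections.Counter improves performance significantly: the list is scanned only once, in linear time, instead of once per unique customer, so the counting pass is O(n). The sort then only sees the customers that pass the threshold. Each of them accounts for at least n/20 entries, so at most 20 customers can pass, and sorting the result is negligible.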
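
For comparison with the question's code, here is a minimal sketch that keeps the asker's explicit loop but feeds it from one Counter pass instead of calling list.count per customer. The function name most_active_counted and the sample data are hypothetical; the threshold is written as the integer test count * 20 >= n, which is the same 5% rule without float division.

```python
from collections import Counter


def most_active_counted(customers):
    """The question's loop, but fed by a single Counter pass instead of list.count."""
    n = len(customers)
    counts = Counter(customers)  # one O(n) pass builds {customer: occurrences}
    result = []
    for customer, count in counts.items():
        # count / n >= 0.05 rewritten as count * 20 >= n (pure integer comparison)
        if count * 20 >= n:
            result.append(customer)
    # Every kept customer holds at least n / 20 entries, so at most 20 can pass
    # and this sort never sees more than 20 items.
    return sorted(result)


# Hypothetical data: 100 entries, "bob" 6%, "alice" 5%, 89 one-off ids at 1% each.
customers = ["bob"] * 6 + ["alice"] * 5 + [f"c{i}" for i in range(89)]
print(most_active_counted(customers))  # ['alice', 'bob']
```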

Try the following:

sorted(set(name for name in customers if customers.count(name) / len(customers) >= 0.05))

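When it comes to performance, testing is key. Here are the run times on my machine for the original code, some of the code from the other answers, and a NumPy-based answer. All of it runs on the same data: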

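import numpy as np
from collections import Counter
import time

random_data = np.random.randint(0, 100, [100, 100000])

# Original code
def mostActive(customers):
    unique_customers = set(customers)
    count = len(customers)
    result = []
    for customer in unique_customers:
        # had to add .tolist() to make it compatible with the data
        if customers.tolist().count(customer) / count >= .05:
            result.append(customer)
    return sorted(result)

start = time.time()
for i in range(100):
    _ = mostActive(random_data[i])
end = time.time()
print(f'Avg time: {(end - start) * 10} ms')  # would be /100*1000 -> simplified to *10
# Avg time: 1394.4847583770752 ms

# Sorted + Counter
def mostActive(customers):
    return sorted(customer
                  for customer, count in Counter(customers).items()
                  if count / len(customers) >= .05)

start = time.time()
for i in range(100):
    _ = mostActive(random_data[i])
end = time.time()
print(f'Avg time: {(end - start) * 10} ms')
# Avg time: 16.061179637909936 ms

# NumPy
start = time.time()
for i in range(100):
    unique_elements, counts = np.unique(random_data[i], return_counts=True)
    active = sorted(unique_elements[counts >= 0.05 * len(random_data[i])])
end = time.time()
print(f'Avg time: {(end - start) * 10} ms')
# Avg time: 3.5660386085510254 ms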
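The benchmark above times its loops by hand with time.time(). A sketch of the same comparison using the standard-library timeit module is below; it also checks that both versions return the same result before timing them. The function names, the skewed test data (10 "heavy" ids plus 1,000 "light" ids), and the run count are illustrative assumptions, not part of the original benchmark.

```python
import random
import timeit
from collections import Counter


def most_active_quadratic(customers):
    # The question's approach: list.count rescans the whole list once per
    # unique customer, so the cost grows roughly with n * (number of unique ids).
    n = len(customers)
    return sorted(c for c in set(customers) if customers.count(c) / n >= .05)


def most_active_counter(customers):
    # Counter-based approach: one O(n) counting pass, then a small filter.
    n = len(customers)
    return sorted(c for c, k in Counter(customers).items() if k / n >= .05)


# Hypothetical test data: 6,000 draws from 10 heavy ids plus
# 4,000 draws from 1,000 light ids, shuffled together.
random.seed(0)
data = [random.randrange(10) for _ in range(6_000)]
data += [random.randrange(1_000) for _ in range(4_000)]
random.shuffle(data)

# Both versions must agree before their timings mean anything.
assert most_active_quadratic(data) == most_active_counter(data)

RUNS = 3
for fn in (most_active_quadratic, most_active_counter):
    total = timeit.timeit(lambda: fn(data), number=RUNS)
    print(f"{fn.__name__}: {total / RUNS * 1000:.2f} ms per call")
```

timeit takes care of the repeat loop and uses a high-resolution timer, so the per-call average is simply the total divided by the run count.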

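Unsurprisingly, the NumPy-only solution is the fastest (about 3.6 ms per call, versus about 16 ms for the Counter version and about 1.4 s for the original), thanks to its underlying high-performance C implementation.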
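
The NumPy benchmark runs inline inside a loop. Below is a sketch of the same idea wrapped as a reusable function; the name most_active_np and the sample data are hypothetical. np.unique already returns the distinct values in sorted order, so the sketch needs no separate sorted() call, and the 5% test is again written as the integer comparison count * 20 >= n.

```python
import numpy as np


def most_active_np(customers):
    """NumPy variant of the same 5% filter, sketched as a reusable function."""
    arr = np.asarray(customers)
    # np.unique returns the distinct values already sorted, plus how often each occurs
    unique_ids, counts = np.unique(arr, return_counts=True)
    # Boolean mask: keep ids with count / n >= 0.05, written as count * 20 >= n
    return unique_ids[counts * 20 >= arr.size].tolist()


customers = ["bob"] * 6 + ["alice"] * 5 + [f"c{i}" for i in range(89)]
print(most_active_np(customers))  # ['alice', 'bob']
```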

This does not improve performance, because you still call count for every element. The complexity is at least quadratic.

Try a list comprehension. It will make your code simpler.