Fastest way to replace characters in a list in Python


python algorithm list


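I'm interested in finding the fastest way to iterate over a list and replace a character in the innermost lists. I'm building a list of lists from a CSV file in Python.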

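The Bing Ads API sends me a huge report, but every percentage is represented as "20.00%" rather than "20.00". This means I can't insert each row into the database as-is, because "20.00%" can't be converted to a numeric type on SQL Server.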
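As a minimal sketch of what a cleaned value could become, assuming the rate column always holds a plain number with an optional trailing `%` (as in "20.00%"), a single percentage string can be turned into a `decimal.Decimal`, which keeps the exact digits for a SQL Server `decimal`/`numeric` column. `parse_percent` is a hypothetical helper, not part of the Bing Ads API:

```python
from decimal import Decimal


def parse_percent(text: str) -> Decimal:
    """Hypothetical helper: turn a report value such as '20.00%' into Decimal('20.00').

    Assumes the value is a plain number, optionally followed by '%'.
    The result keeps the percentage scale (20.00, not 0.20).
    """
    return Decimal(text.strip().rstrip('%'))


print(parse_percent('20.00%'))  # 20.00
print(parse_percent('7.5'))     # 7.5
```

Whether to keep the percentage scale or divide by 100 depends on how the target column is defined; the sketch keeps the value as reported.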

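So far my solution has been a list comprehension nested inside a list comprehension. I wrote a small script to test how fast it runs; compared with just loading the list it performs acceptably (roughly 2x the runtime), but I'm curious whether there's a faster way.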

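Note: every record in the report has a rate, and therefore a percentage. So each record has to be visited once, and each rate has to be visited once (is this what causes the 2x slowdown?).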

Either way, as these reports keep growing, I'd like a faster solution.

import time
import csv

def getRecords1():
    # newline='' is what the csv docs recommend; the old 'rU' mode was removed in Python 3.11.
    with open('report.csv', newline='', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        # Skip the report preamble. The last header line holds the column names and
        # starts with 'GregorianDate', so stop right after consuming that row.
        for row in reader:
            if row and row[0] == 'GregorianDate':
                break
        recordList = list(reader)
    return recordList

def getRecords2():
    with open('report.csv', newline='', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        # Skip the preamble up to and including the 'GregorianDate' column-header row.
        for row in reader:
            if row and row[0] == 'GregorianDate':
                break
        recordList = list(reader)
    # Build a new list for every row, stripping '%' from every field.
    data = [[field.replace('%', '') for field in record] for record in recordList]
    # Return the cleaned rows.
    return data

def getRecords3():
    data = []
    with open('report.csv', newline='', encoding='utf-8-sig') as records:
        reader = csv.reader(records)
        # Skip the preamble up to and including the 'GregorianDate' column-header row.
        for row in reader:
            if row and row[0] == 'GregorianDate':
                break
        for row in reader:
            # Column 10 holds the rate; strip '%' from that one field only.
            row[10] = row[10].replace('%', '')
            data.append(row)
    return data

def main():
    # perf_counter is the recommended clock for measuring elapsed time.
    t0 = time.perf_counter()
    for _ in range(2000):
        getRecords1()
    t1 = time.perf_counter()
    print("Get records normally takes " + str(t1 - t0))

    t0 = time.perf_counter()
    for _ in range(2000):
        getRecords2()
    t1 = time.perf_counter()
    print("Using nested list comprehension takes " + str(t1 - t0))

    t0 = time.perf_counter()
    for _ in range(2000):
        getRecords3()
    t1 = time.perf_counter()
    print("Modifying row as it's read takes " + str(t1 - t0))


if __name__ == '__main__':
    main()
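A possible refinement of the row-at-a-time idea, sketched under assumptions: it finds the header row by its first cell `GregorianDate` (as the code above does), then strips `%` only from the columns named in `percent_columns` instead of relying on a hard-coded index such as `row[10]`. The column name `Ctr` and the helper `read_report` are hypothetical; substitute the real rate column header from the report. It accepts any iterable of lines, so an `io.StringIO` stands in for the file here.

```python
import csv
import io
from typing import Iterable, Iterator, List


def read_report(lines: Iterable[str], percent_columns: Iterable[str]) -> Iterator[List[str]]:
    """Hypothetical helper: yield data rows with '%' removed from the named columns."""
    reader = csv.reader(lines)
    # Skip preamble lines until the column-header row, whose first cell is 'GregorianDate'.
    for header in reader:
        if header and header[0] == 'GregorianDate':
            break
    else:
        raise ValueError("header row starting with 'GregorianDate' not found")
    # Resolve column names to indexes once, not once per row.
    indexes = [header.index(name) for name in percent_columns]
    for row in reader:
        for i in indexes:
            row[i] = row[i].replace('%', '')
        yield row


sample = io.StringIO(
    'Report Name: demo\n'
    'GregorianDate,Clicks,Ctr\n'
    '2017-01-01,10,20.00%\n'
    '2017-01-02,4,7.50%\n'
)
rows = list(read_report(sample, ['Ctr']))
print(rows)  # [['2017-01-01', '10', '20.00'], ['2017-01-02', '4', '7.50']]
```

Because it is a generator, rows can be inserted into the database as they are produced instead of first materialising the whole report. If the real report ends with a footer line shorter than the header, such rows would need to be skipped as well.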

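Edit: I added a third function, getRecords3(), which is the fastest implementation I've come across so far. Running the program gives the following output: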

Get records normally takes 30.61197066307068
Using nested list comprehension takes 60.81756520271301
Modifying row as it's read takes 43.761850357055664

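So we've gone from an approach that was about 2x slower than plain loading (60.8 s vs 30.6 s) to one that is roughly 1.4x slower (43.8 s). Thanks, everyone!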
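The timings above include opening and parsing the file 2000 times, which dilutes the difference between the replacement strategies themselves. A minimal sketch for isolating just the replacement step, using `timeit` on synthetic in-memory rows; the row count, column count and rate position (index 10, as in `getRecords3`) are assumptions, and absolute numbers will vary by machine:

```python
import timeit

# Synthetic rectangular data (assumption): 10,000 rows x 12 columns, rate in column 10.
N_ROWS, N_COLS, RATE_COL = 10_000, 12, 10


def make_rows():
    rows = []
    for r in range(N_ROWS):
        row = [str(r * c) for c in range(N_COLS)]
        row[RATE_COL] = '20.00%'
        rows.append(row)
    return rows


def nested_comprehension(rows):
    # Calls str.replace on every field and builds a new list for every row.
    return [[field.replace('%', '') for field in row] for row in rows]


def rate_column_only(rows):
    # Calls str.replace once per row and mutates the existing row lists.
    for row in rows:
        row[RATE_COL] = row[RATE_COL].replace('%', '')
    return rows


if __name__ == '__main__':
    for fn in (nested_comprehension, rate_column_only):
        # number=1 with repeat=5: setup rebuilds the rows before each timed call,
        # so building the data is not included in the measurement.
        times = timeit.repeat('fn(rows)', setup='rows = make_rows()',
                              repeat=5, number=1,
                              globals={'fn': fn, 'make_rows': make_rows})
        print(f'{fn.__name__}: best of 5 = {min(times):.4f} s')
```

With one rate per row, the nested comprehension makes `N_ROWS * N_COLS` replace calls plus one new list per row, while the column-targeted loop makes only `N_ROWS` calls, which is consistent with the gap reported above.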

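You could check whether modifying the inner lists in place is faster than creating new lists with a list comprehension.

For example:

for record in recordList:
    for index in range(len(record)):
        record[index] = record[index].replace('%', '')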

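Because strings are immutable, the strings themselves can't really be modified; what this does is replace each element of the existing inner list with a new string, so no new row lists are built.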
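A small self-contained sketch of that loop (written with `enumerate`, on two made-up rows; `strip_percent_in_place` is a hypothetical name) that shows what "in place" means here: the row lists are the same objects afterwards, while the old strings are left untouched and simply swapped out for new ones.

```python
def strip_percent_in_place(records):
    """Replace each field with a '%'-free copy, reusing the existing row lists."""
    for record in records:
        for index, field in enumerate(record):
            record[index] = field.replace('%', '')


records = [['2017-01-01', '20.00%'], ['2017-01-02', '7.50%']]
first_row = records[0]
original_field = first_row[1]
strip_percent_in_place(records)
print(records)                  # [['2017-01-01', '20.00'], ['2017-01-02', '7.50']]
print(records[0] is first_row)  # True: same list object, mutated in place
print(original_field)           # 20.00%: the old string itself is unchanged
```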

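So is the data "rectangular" (does every row have the same number of cells)?

It seems you could remove the "%" sign as you read each row, instead of putting the row into the record list and then having to traverse that list again to remove the "%" signs.

It looks like you're replacing every "%" with ""? Would it be faster to use another tool, such as sed?

@WillemVanOnsem Yes, the data is rectangular, thanks for helping clarify.

@JimMischel I'll try that now and see how it turns out.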
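The `sed` suggestion points at another option: remove every `%` from the raw text in a single `str.replace` call before the CSV parser sees it, which is roughly what `sed 's/%//g'` would do outside Python. This is a sketch that assumes `%` never appears anywhere it should be kept (for example in the report preamble or in quoted text fields), because it strips all of them; `parse_without_percent` and `load_report` are hypothetical names.

```python
import csv
import io
from typing import List


def parse_without_percent(text: str) -> List[List[str]]:
    """Strip every '%' from the raw CSV text in one pass, then parse the data rows.

    Assumes '%' only appears in percentage values, since all of them are removed.
    Returns [] if no 'GregorianDate' header row is found.
    """
    reader = csv.reader(io.StringIO(text.replace('%', ''), newline=''))
    for row in reader:
        # Skip preamble rows up to and including the column-header row.
        if row and row[0] == 'GregorianDate':
            break
    return list(reader)


def load_report(path: str) -> List[List[str]]:
    # newline='' lets the csv module handle line endings itself.
    with open(path, newline='', encoding='utf-8-sig') as f:
        return parse_without_percent(f.read())


sample = 'Report Name: demo\nGregorianDate,Clicks,Ctr\n2017-01-01,10,20.00%\n'
print(parse_without_percent(sample))  # [['2017-01-01', '10', '20.00']]
```

Since `str.replace` scans the whole string in C, this may be cheaper than one Python-level call per field, at the cost of holding the whole file in memory twice; it should be measured on the real report.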