Python: performing lots of calculations on a large amount of data with openpyxl

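I'm interested in using openpyxl to gather key metrics about a large dataset I have. The two things I care about are cardinality and field importance (i.e. how many "null" or "junk" values a field has). I'm running into performance problems and would like to know whether there is any way to optimize my code. My largest Excel file has about 20,000 rows. I know about openpyxl's optimized reader (read-only mode), but I need to look at every cell and get its value.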
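To pin down the two metrics, here is a minimal sketch for a single column held in memory. It reuses the junk test from the question's code (an empty cell, or text containing "null"); counting cardinality only over the non-junk values is an assumption, and field_stats and is_junk are hypothetical helper names.

```python
def is_junk(value):
    # Same test as the question's code: empty cell, or text containing "null".
    return value is None or "null" in str(value)


def field_stats(values):
    """Return (cardinality, percent_null) for one column's values."""
    values = list(values)
    junk = sum(1 for v in values if is_junk(v))
    # Assumption: cardinality = number of distinct values that are not junk.
    cardinality = len({v for v in values if not is_junk(v)})
    percent_null = 100.0 * junk / len(values) if values else 0.0
    return cardinality, percent_null


print(field_stats(["a", None, "null", "a", "b"]))  # (2, 40.0)
```

Note that the "null" check is case-sensitive, exactly like the original `"null" in str(field)`.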

My script reads the data from a large .xlsx file and then writes a Google Sheets document containing information about each field:

from openpyxl import load_workbook

import settings  # the asker's own module: file path and Google Sheets helper


def run(table, limit_percent_null):

    excel_workbook = load_workbook(filename=settings.mypath + table + '.xlsx', read_only=True)
    excel_sheet = excel_workbook.worksheets[0]

    d = dict()
    # first loop through our fields
    # (max_column/max_row replace get_highest_column()/get_highest_row(), which newer
    # openpyxl versions no longer provide; "+ 1" makes range() include the last one)
    for i in range(1, excel_sheet.max_column + 1):
        key = excel_sheet.cell(row=1, column=i).value
        if key is None:
            break

        # key is the field and value is list of booleans
        # True = null or empty, False = has an actual value
        d[key] = []

        # now loop through the actual values of that field
        for j in range(2, excel_sheet.max_row + 1):
            field = excel_sheet.cell(row=j, column=i).value

            # does the field have "null" in it or is it empty?
            d[key].append(field is None or "null" in str(field))

    # write to google doc
    google_sheet = settings.open_gspread_connetion(table)
    for key, value in d.items():
        pass  # omitted

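What performance problems are you seeing? 20,000 rows isn't that many. By the way, there's little point in trying to beat ws.iter_rows() for going through the data. ws.columns should be available, but I think your code should be rewritten to work only with rows, to avoid the nested loops.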
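A minimal sketch of the row-only approach, assuming openpyxl 2.6 or later (for iter_rows(values_only=True)) and the question's junk test; profile_rows and profile_xlsx are hypothetical names. Instead of calling excel_sheet.cell() for every coordinate (in read-only mode each such lookup has to scan the sheet's XML, which is likely where the time goes), it reads the sheet once, top to bottom, keeping a junk counter and a set of distinct values per column.

```python
def is_junk(value):
    # Same rule as the question: empty cell, or text containing "null".
    return value is None or "null" in str(value)


def profile_rows(rows):
    """One pass over rows; the first row holds the field names."""
    rows = iter(rows)
    header = []
    for key in next(rows, ()):
        if key is None:  # stop at the first empty header, like the question's `break`
            break
        header.append(key)

    n_rows = 0
    junk_counts = [0] * len(header)
    distinct = [set() for _ in header]
    for row in rows:
        n_rows += 1
        for i in range(len(header)):
            value = row[i] if i < len(row) else None  # tolerate short rows
            if is_junk(value):
                junk_counts[i] += 1
            else:
                distinct[i].add(value)

    return {
        key: {
            "percent_null": 100.0 * junk_counts[i] / n_rows if n_rows else 0.0,
            "cardinality": len(distinct[i]),
        }
        for i, key in enumerate(header)
    }


def profile_xlsx(path):
    # Imported here so profile_rows can be used (and tested) without openpyxl.
    from openpyxl import load_workbook

    wb = load_workbook(filename=path, read_only=True)
    try:
        ws = wb.worksheets[0]
        # values_only=True yields plain tuples of cell values instead of Cell objects.
        return profile_rows(ws.iter_rows(values_only=True))
    finally:
        wb.close()  # read-only workbooks keep the file open until closed
```

Because profile_rows only needs an iterable of tuples, it can be checked on plain lists. Memory grows with the number of distinct values per column rather than with rows times columns, as the question's lists of booleans do.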