Python: opening a new CSV file after a given number of iterations

python pandas csv

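This code computes the ratio between pairs of price columns in a CSV file and writes each ratio to another column. After a few hundred computations, however, the code slows down. Once a given number of ratios has been computed, how can I open a new CSV file to store the following ones?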

import csv
import pandas as pd

sector_name = ['asset_management', 'basic_materials', 'conglomerates', 'consumer_goods', 'financials', 'healthcare', 'industrial_goods', 'services', 'technology', 'utilities']

def data_sector_ratios():
    # list_all_sectors is defined elsewhere in the script
    for sector, name in zip(list_all_sectors, sector_name):
        for ticker in sector:
            df = pd.read_csv('.../price_data_file.csv')
            # drop the "Unnamed: 0" style columns left over from a saved index
            df.drop(df.columns[df.columns.str.contains('unnamed', case=False)], axis=1, inplace=True)
            fieldnames = ["PAIR", "RATIO"]

            with open('.../sector_ratios.csv', 'w') as file:
                writer = csv.DictWriter(file, fieldnames=fieldnames, lineterminator='\n')
                writer.writeheader()
                cols = list(df.columns[1:])
                for i, c in enumerate(cols[:-1]):
                    for c2 in cols[i+1:]:
                        df['{}/{}'.format(c, c2)] = df[c] / df[c2]
                        dff = df['{}/{}'.format(c, c2)].dropna()
                        start = dff.iloc[0]
                        end = dff.iloc[-1]
                        change = str((end - start) / start)
                        pair = df.columns[-1]
                        # keys must match fieldnames, otherwise DictWriter raises ValueError
                        row = {"PAIR": pair, "RATIO": change}
                        writer.writerow(row)
                        print("{}/{} RATIO CALCULATED".format(c, c2))

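Some sectors have about 700 columns, which means (700^2 - 700)/2 ≈ 245,000 ratios. After roughly twenty thousand ratios I would like to create a new file, e.g. basic_materials_ratios_2 or similar. The price data CSV file looks like this:

Edit:

This is the output CSV file. I just want to keep appending a row every time the for loop computes a ratio.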

PAIR        RATIO
A/AA       xxxxxx
A/AABA     xxxxxx
A/AAL      xxxxxx
.....      ......

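Why use pandas' read_csv() function but not its counterpart, df.to_csv()?

It would be much simpler, and probably much faster, to load the dataframe, transform the data as needed, and call to_csv() once at the end.

If you want to create several files, just slice the dataframe into the rows you need for each one.

I suggest staying in pandas until the data is ready to be exported.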
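As a rough illustration of the slicing idea, the sketch below writes a results table to numbered CSV files with a fixed maximum number of rows each. The helper name `write_in_chunks`, the sample data, and the file naming scheme are assumptions made for the example, not part of the original answer.

```python
import os
import tempfile

import pandas as pd


def write_in_chunks(df, out_dir, base_name, rows_per_file):
    """Write df to base_name_1.csv, base_name_2.csv, ... (hypothetical helper)."""
    paths = []
    for file_no, start in enumerate(range(0, len(df), rows_per_file), start=1):
        path = os.path.join(out_dir, f"{base_name}_{file_no}.csv")
        # iloc slicing past the end is safe; the last file is simply shorter.
        df.iloc[start:start + rows_per_file].to_csv(path, index=False)
        paths.append(path)
    return paths


# Made-up results table: one row per pair.
ratios = pd.DataFrame({
    "PAIR": [f"A/B{i}" for i in range(45)],
    "RATIO": [i / 10 for i in range(45)],
})
out_dir = tempfile.mkdtemp()
paths = write_in_chunks(ratios, out_dir, "basic_materials_ratios", rows_per_file=20)
print(paths)
```

With 45 rows and 20 rows per file this produces three files, the last one holding the remaining 5 rows.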

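There are many ways to optimize pandas code so that it runs very fast.
First, here is a slightly tidier version of what you are trying to do.

    import numpy as np

    # assuming your first column is your index, move it there
    cols = df.columns[1:]

    # no need to enumerate here
    for col_1 in cols:
        for col_2 in cols:
            # skip unnecessary computations
            if col_1 == col_2:
                continue
            # replace both +inf and -inf (division by zero) with NaN
            df[f'{col_1}/{col_2}'] = (df[col_1] / df[col_2]).replace([np.inf, -np.inf], np.nan)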
Assuming your data is loaded into one giant df, you will want to find out where the time is actually being lost during execution.
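One simple way to locate the slow step is to time each stage separately. The sketch below uses `time.perf_counter` on made-up random data; the stage names and the data are assumptions for illustration only, and a real profiler would give finer detail.

```python
import time

import numpy as np
import pandas as pd

# Made-up price data standing in for the real file.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 6)) + 0.5, columns=list("ABCDEF"))

timings = {}

# Stage 1: compute every unordered pair ratio into a dict of Series.
t0 = time.perf_counter()
cols = list(df.columns)
ratios = {}
for i, c1 in enumerate(cols[:-1]):
    for c2 in cols[i + 1:]:
        ratios[f"{c1}/{c2}"] = df[c1] / df[c2]
timings["compute"] = time.perf_counter() - t0

# Stage 2: assemble the result in one go instead of adding columns one by one.
t0 = time.perf_counter()
result = pd.DataFrame(ratios)
timings["assemble"] = time.perf_counter() - t0

# Stage 3: serialize to CSV (to_csv returns a string when no path is given).
t0 = time.perf_counter()
csv_text = result.to_csv(index=False)
timings["to_csv"] = time.perf_counter() - t0

for step, seconds in timings.items():
    print(f"{step:>8}: {seconds:.4f} s")
```

Whichever stage dominates the output is the one worth optimizing first.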

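My guess is that your data contains a lot of null values or zeros, in which case dividing whole frames becomes very slow. You can avoid this by masking the zeros in the divisor before dividing:
    df.B.div(df.A.where(df.A != 0, np.nan))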
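To make the effect of that expression concrete, here is a small sketch on a made-up frame with columns `A` and `B` (the data is an assumption for illustration). It compares plain division with the masked version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [2.0, 0.0, 4.0, np.nan],
    "B": [1.0, 5.0, 2.0, 3.0],
})

# Plain division: a zero divisor yields inf.
plain_ratio = df["B"] / df["A"]

# where() keeps values where the condition holds and puts NaN elsewhere,
# so zeros in the divisor become NaN before the division happens.
safe_ratio = df["B"].div(df["A"].where(df["A"] != 0, np.nan))

print(pd.DataFrame({"plain": plain_ratio, "safe": safe_ratio}))
```

The masked version never produces `inf`, so a later `dropna()` removes both missing prices and zero divisors in one step.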

Maybe your dataframe is so large that it swamps your machine's memory. In that case I suggest working in chunks.
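A minimal sketch of chunked processing, assuming the prices sit in a CSV with one row per date: `pd.read_csv` accepts a `chunksize` argument and then yields one DataFrame per block of rows. The in-memory CSV text and the column names `A` and `B` are made up so that the example is self-contained.

```python
import io

import pandas as pd

# Made-up CSV standing in for the price data file.
csv_text = "date,A,B\n" + "\n".join(
    f"2020-01-{d:02d},{d},{d * 2}" for d in range(1, 11)
)

pieces = []
# With chunksize set, read_csv returns an iterator of DataFrames.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    pieces.append(chunk["A"] / chunk["B"])

# Here the pieces are small enough to concatenate; with truly large data
# each piece would be written out instead of being kept in memory.
ratio = pd.concat(pieces)
print(ratio)
```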

Maybe your data has mixed types and is slow because values get converted on every computation. Go and clean it up first.
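One possible way to do that cleanup is to convert the price columns to numeric dtypes once, up front, with `pd.to_numeric`. The frame below is made up for illustration; `errors="coerce"` turns unparseable entries into NaN rather than raising.

```python
import pandas as pd

# Made-up frame with object-dtype price columns (strings mixed with numbers).
df = pd.DataFrame({
    "date": ["2020-01-01", "2020-01-02", "2020-01-03"],
    "A": ["1.5", "2.0", "n/a"],
    "B": [3, "4", 5],
})

# Assumes, as in the question, that the first column is not price data.
for col in df.columns[1:]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df["B"] / df["A"])
```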

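All in all, the workaround you are asking about for this slow computation is remarkably inventive, but frankly it would be a huge waste of time. Spend a while on optimizing instead.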
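When I write to the CSV, I am not sure how to write new rows with different values. I know how to add a column, but how do I put a different value in each row?

In addition to the answer below, I strongly recommend reading more of the pandas documentation. It depends on what you want to do. If you are just adding data: if you need to change data in an existing dataframe, you can do something like df.apply().

Thanks, I will look at the documentation. I made an edit. I just want to add one row of ratio each time the for loop computes one.

Hey, thanks for taking the time! I tried your code and it works great, but when I add df.to_csv at the end it slows things down a lot. What am I missing?

If your df is large, writing the CSV takes a while. If the data is meant for external use, there is not much you can do about it. If it is for internal use, consider exporting to pickle.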
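For the row-by-row approach described in the question, one way to start a new file after a fixed number of rows is a small wrapper around `csv.DictWriter`. This is only a sketch: the class name `RotatingCsvWriter`, the file naming scheme, and the sample rows are all assumptions, and splitting the output does not by itself address the slowdown discussed in the answers.

```python
import csv
import os
import tempfile


class RotatingCsvWriter:
    """Write rows to base_1.csv, base_2.csv, ... with at most max_rows data rows each."""

    def __init__(self, out_dir, base_name, fieldnames, max_rows):
        self.out_dir = out_dir
        self.base_name = base_name
        self.fieldnames = fieldnames
        self.max_rows = max_rows
        self.file_no = 0
        self.rows_in_file = 0
        self.file = None
        self.writer = None
        self.paths = []

    def _open_next(self):
        # Close the current file (if any) and start the next numbered one.
        self.close()
        self.file_no += 1
        path = os.path.join(self.out_dir, f"{self.base_name}_{self.file_no}.csv")
        self.file = open(path, "w", newline="")
        self.writer = csv.DictWriter(self.file, fieldnames=self.fieldnames)
        self.writer.writeheader()
        self.rows_in_file = 0
        self.paths.append(path)

    def writerow(self, row):
        if self.file is None or self.rows_in_file >= self.max_rows:
            self._open_next()
        self.writer.writerow(row)
        self.rows_in_file += 1

    def close(self):
        if self.file is not None:
            self.file.close()
            self.file = None


# Usage with a tiny limit so the rotation is easy to see.
out_dir = tempfile.mkdtemp()
writer = RotatingCsvWriter(out_dir, "basic_materials_ratios", ["PAIR", "RATIO"], max_rows=3)
for i in range(7):
    writer.writerow({"PAIR": f"A/B{i}", "RATIO": i / 10})
writer.close()
print(writer.paths)
```

In the question's loop, `writer.writerow(row)` would stay as it is, with `max_rows` set to something like 20000.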