Python: Is there a more efficient tool than iterrows() in this case?

python performance pandas dataframe

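Okay, here is the situation. I am working with a large number of pandas DataFrames and arrays. I often need to pair values in one frame with values in another and, ideally, end up with all the information combined in a single frame.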

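Suppose I am looking at image files. Each file has a specific set of information attached to it. Sometimes image files of certain types share the same kind of information. A simple example: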

FILEPATH,     TYPE,   COLOR,     VALUE_I
/img2.jpg,    A,      'green',   0.6294
/img45.jpg,   B,      'green',   0.1846
/img87.jpg,   A,      'blue',    34.78

A solution would be much appreciated.

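Focusing on the second half of the question, since that is the part you provided code for: your program checks every row of df1 against every row of df2, which yields up to 1800*1800 = 3,240,000 combinations. If each row has only one possible match, adding a break after the append helps somewhat, but it is still not ideal:

newColumn.append([rw['VALUE_I'], rw['VALUE_II'], rw['VALUE_III']])
break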

If your data structure allows it, I would try the following:

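# Map each file name in otherDataframe to its row number.
ref = {}
for i, path in enumerate(otherDataframe['filepath']):
    *_, file = path.split('\\')  # keep only the last path component
    ref[file] = i

# Initialize the three columns that will be filled in.
originalDataframe['VALUE_I'] = None
originalDataframe['VALUE_II'] = None
originalDataframe['VALUE_III'] = None

# Note: using enumerate() positions with .loc assumes both frames
# have the default 0..n-1 index.
for i, file in enumerate(originalDataframe['filename']):
    try:
        j = ref[file]
    except KeyError:
        continue  # no matching file in otherDataframe
    originalDataframe.loc[i, 'VALUE_I'] = otherDataframe.loc[j, 'VALUE_I']
    originalDataframe.loc[i, 'VALUE_II'] = otherDataframe.loc[j, 'VALUE_II']
    originalDataframe.loc[i, 'VALUE_III'] = otherDataframe.loc[j, 'VALUE_III']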

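Here we iterate over the paths in otherDataframe, which I assume follow the pattern C:\asdf\asdf\file, split each path on \ to pull out the file name, and build a dictionary that maps each file name to its row number. Next, we initialize the three columns in originalDataframe that we are going to write to.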

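Finally, we iterate over the files in originalDataframe and look each one up in the dictionary built from otherDataframe, using try/except to handle files that have no match. When a match exists, we take the row number from the dictionary and use it to copy the values from the other frame into the original. This takes one pass over each frame instead of comparing every row with every other row.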
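The same dictionary idea can be written more compactly by letting pandas do the second loop. The following is only a sketch on made-up sample data: the frames `original` and `other`, the paths, and the use of a single VALUE_I column are assumptions for illustration, not the asker's real data.

```python
import pandas as pd

# Hypothetical sample data; column names follow the answer above.
original = pd.DataFrame({"filename": ["img2.jpg", "img45.jpg", "img99.jpg"]})
other = pd.DataFrame({
    "filepath": ["C:\\data\\img45.jpg", "C:\\data\\img2.jpg"],
    "VALUE_I": [0.1846, 0.6294],
})

# One pass over `other`: map each file name to its VALUE_I.
lookup = {
    path.split("\\")[-1]: value
    for path, value in zip(other["filepath"], other["VALUE_I"])
}

# One pass over `original`: Series.map looks each file name up in the dict.
# File names without a match (img99.jpg here) become NaN.
original["VALUE_I"] = original["filename"].map(lookup)
print(original)
```

Each dictionary lookup takes roughly constant time, so the total work grows with the number of rows in the two frames rather than with their product.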

Side note: you describe the paths as "C:/asd/fdg/img2.jpg", in which case you should use:

*_, file = path.split('/')
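If it is unclear which separator the paths use, one option is to let the standard library decide. This is a sketch, not part of the original answer; the helper name `file_name` is made up for illustration. `pathlib.PureWindowsPath` accepts both \ and / as separators and works on any operating system.

```python
from pathlib import PureWindowsPath


def file_name(path: str) -> str:
    """Return the last path component, accepting either \\ or / as separator."""
    return PureWindowsPath(path).name


print(file_name("C:\\asdf\\asdf\\img2.jpg"))
print(file_name("C:/asd/fdg/img2.jpg"))
```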

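If you can split the file name out of otherDataframe["filepath"], you can compare it for equality with the file name in originalDataframe instead of using an "in" check. After that, you can simplify the whole computation with a join: for each file name in originalDataframe, the row with the same file name is looked up in otherDataframe, and all of its other columns are added.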

import os

otherDataframe["filename"] = otherDataframe["filepath"].map(os.path.basename)
joinedDataframe = originalDataframe.join(otherDataframe.set_index("filename"), on="filename")
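To make the behavior of the join concrete, here is a small sketch on made-up data (the rows and the /data/ paths are assumptions for illustration). It also shows `DataFrame.merge` with `how="left"`, which should give the same matched values without setting an index first.

```python
import os

import pandas as pd

# Hypothetical sample data, loosely based on the example in the question.
originalDataframe = pd.DataFrame({
    "filename": ["img2.jpg", "img45.jpg", "img99.jpg"],
    "TYPE": ["A", "B", "A"],
})
otherDataframe = pd.DataFrame({
    "filepath": ["/data/img45.jpg", "/data/img2.jpg"],
    "VALUE_I": [0.1846, 0.6294],
})

otherDataframe["filename"] = otherDataframe["filepath"].map(os.path.basename)

# Left join: every row of originalDataframe is kept; rows without a match get NaN.
joinedDataframe = originalDataframe.join(
    otherDataframe.set_index("filename"), on="filename"
)

# Alternative with merge, matching on the shared "filename" column.
mergedDataframe = originalDataframe.merge(otherDataframe, on="filename", how="left")

print(joinedDataframe)
```

Both variants let pandas do the matching internally, so no explicit Python loop over rows is needed.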

If originalDataframe and otherDataframe have columns with the same name, you should set lsuffix or rsuffix.
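As a sketch of what that looks like, assume both frames carry a TYPE column (the data and the suffix "_other" are made up for illustration). Without a suffix, join refuses to combine frames whose column names overlap; with rsuffix, the column coming from the right-hand frame is renamed.

```python
import pandas as pd

# Hypothetical frames that share the column name "TYPE".
originalDataframe = pd.DataFrame({
    "filename": ["img2.jpg", "img45.jpg"],
    "TYPE": ["A", "B"],
})
otherDataframe = pd.DataFrame({
    "filename": ["img45.jpg", "img2.jpg"],
    "TYPE": ["B2", "A2"],
})

# rsuffix is appended to the overlapping column from otherDataframe only.
joinedDataframe = originalDataframe.join(
    otherDataframe.set_index("filename"), on="filename", rsuffix="_other"
)
print(joinedDataframe.columns.tolist())
```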
