matplotlib LineCollection from a Python DataFrame: current iterrows solution performs poorly

python pandas performance dataframe matplotlib

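I have a large DataFrame that contains coordinates together with a value. I want to plot this in matplotlib, using a different color for each value.

Right now I have a working solution that plots it as a LineCollection. I am using iterrows because it is easy for me to understand, but it is very slow.

I merge with another DataFrame that contains the color for each value. Then I loop over the rows: if the color is the same as in the previous segment, I append the point to the current segment. If not, I start a new segment.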

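import numpy as np
from matplotlib import collections as mc

# attach the color of each value; values missing from df_color get NaN
dff = df.merge(
    df_color,
    how='left',
    left_on='value',
    right_on='value'
)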

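segments = []          # one list of (x, y) points per run of equal color
segments_i = -1        # index of the segment currently being extended
colorlist = []         # one color per segment
previous_color = None

for _, row in dff.iterrows():
    point = (row['x'], row['y'])
    color = row['color']

    # values without a match in df_color come out of the left merge as NaN;
    # draw those fully transparent
    if np.any(np.isnan(color)):
        color = (0, 0, 0, 0)

    if color == previous_color:
        segments[segments_i].append(point)
    else:
        # add endpoint to current segment
        # (>= 0 rather than > 0, so the first segment also gets its endpoint)
        if segments_i >= 0:
            segments[segments_i].append(point)

        # start new segment
        segments.append([point])
        colorlist.append(color)

        previous_color = color
        segments_i += 1

colorlist = np.asarray(colorlist)

lc = mc.LineCollection(segments, colors=colorlist)

# ax is an existing matplotlib Axes
ax.add_collection(lc)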

How can I change this to get better performance?

I changed this:

for _, row in dff.iterrows():
    point = (row['x'], row['y'])
    color = row['color']

to this:

dff['point'] = list(zip(dff['x'], dff['y']))

for point, color in zip(dff['point'], dff['color']):
    ...

This small change made it run about 15 times faster! That is good enough for me for now, but if you have a better solution, feel free to post it.
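For anyone who wants to drop the Python-level loop entirely, here is a minimal sketch of one possible vectorized variant. It is not from the thread. It assumes that colors are RGBA tuples, that each value maps to exactly one color (so runs are detected on the `value` column instead of the color), and that `value` itself has no missing entries. The function name `build_segments` and the demo data are made up for illustration.

```python
import numpy as np
import pandas as pd


def build_segments(dff):
    """Split consecutive rows with the same value into line segments.

    Hypothetical helper: returns (segments, colors), where each segment is
    an (n, 2) array of x/y points and colors holds one RGBA tuple per segment.
    """
    xy = dff[["x", "y"]].to_numpy(dtype=float)
    if len(xy) == 0:
        return [], []

    values = dff["value"].to_numpy()
    # positions where the value differs from the previous row
    change = np.flatnonzero(values[1:] != values[:-1]) + 1
    starts = np.concatenate(([0], change))
    # every run also takes the first point of the next run as its endpoint,
    # so that neighbouring segments are visually connected
    stops = np.concatenate((change + 1, [len(xy)]))
    segments = [xy[a:b] for a, b in zip(starts, stops)]

    # one color per run; anything that is not a tuple (NaN from the left
    # merge) becomes fully transparent
    raw_colors = dff["color"].iloc[starts].tolist()
    colors = [c if isinstance(c, tuple) else (0.0, 0.0, 0.0, 0.0)
              for c in raw_colors]
    return segments, colors


# small made-up example: value 'c' has no color assigned
demo = pd.DataFrame({
    "x": [0.0, 1.0, 2.0, 3.0, 4.0],
    "y": [0.0, 1.0, 0.0, 1.0, 0.0],
    "value": ["a", "a", "b", "b", "c"],
}).merge(
    pd.DataFrame({
        "value": ["a", "b"],
        "color": [(1.0, 0.0, 0.0, 1.0), (0.0, 0.0, 1.0, 1.0)],
    }),
    how="left",
    on="value",
)

demo_segments, demo_colors = build_segments(demo)
```

The result can be passed on as `mc.LineCollection(demo_segments, colors=demo_colors)`. One difference from the loop version: two neighbouring values that happen to share a color end up as two segments instead of one, which should look the same when drawn.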

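.iterrows() is notoriously slow, and explicit iteration is not an ideal solution anyway. We need your whole program, and enough data to benchmark it.

Yes, I read that iterrows() is slow, but I thought I would need to use apply() or something similar. I replaced iterrows with "for x, y in zip(df.x, df.y)" and that is much faster. It still iterates, but it is fast enough for me now.

Can you provide enough code and data to reproduce the program?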
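Since the comments ask for something that can be benchmarked, here is a rough, self-contained timing sketch of the two loop styles discussed above. The data is synthetic (random x/y columns standing in for the real DataFrame, which is not part of the thread), and the names `bench`, `points_iterrows`, `points_zip` and `timed` are invented for this example. Actual timings will depend on the machine and the pandas version.

```python
import time

import numpy as np
import pandas as pd

# synthetic stand-in data; the DataFrame from the question is not available
rng = np.random.default_rng(0)
n_rows = 20_000
bench = pd.DataFrame({"x": rng.random(n_rows), "y": rng.random(n_rows)})


def points_iterrows(frame):
    # iterrows builds a new Series object for every row
    return [(row["x"], row["y"]) for _, row in frame.iterrows()]


def points_zip(frame):
    # iterates over the two columns directly, no per-row Series
    return list(zip(frame["x"], frame["y"]))


def timed(func, frame):
    # returns the function result and the elapsed wall-clock time in seconds
    start = time.perf_counter()
    result = func(frame)
    return result, time.perf_counter() - start


pts_rows, t_rows = timed(points_iterrows, bench)
pts_zip, t_zip = timed(points_zip, bench)

print(f"iterrows: {t_rows:.3f} s, zip: {t_zip:.3f} s")
```

Both functions produce the same list of points, so the comparison only measures the iteration overhead.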