What is the fastest way to populate a pandas DataFrame from two for loops?

python pandas

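I already have a DataFrame, and for each index I need to run a calculation against every preceding index (so for 187 indices there are 17766 calculations). This needs to be efficient so that it scales to millions of calculations.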

import numpy as np
import pandas as pd

# this is the original dataframe
df = pd.DataFrame(np.random.rand(187, 2))
# this is the dataframe to write to
df2 = pd.DataFrame()
# blank list to write to
ind_diff = []

Method 1: list

for n in range(len(df)):
    for i in range(n + 1, len(df)):
        ind_diff.append(df.index[i] - df.index[n])

Method 2: DataFrame append

for n in range(len(df)):
    for i in range(n + 1, len(df)):
        df2 = df2.append(df.Index[i] - df.Index[n])

Method #1 only returns the final calculation as output, i.e. a list of length 1. Why is that?

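Method #2 works, but it is far too slow. I know this is not the recommended way to build a DataFrame (according to the docs, and pd.concat is more efficient), but I am looking for the fastest approach. Thanks in advance.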
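For reference, the usual alternative to growing a DataFrame row by row is to collect the values in a plain list and construct the DataFrame once at the end. The following is only a minimal sketch of that pattern using the loops from the question; the column name index_diff is a made-up label, not something from the original post.

```python
import numpy as np
import pandas as pd

# Same shape of input as in the question.
df = pd.DataFrame(np.random.rand(187, 2))

# Collect every pairwise index difference in a plain Python list first.
ind_diff = []
for n in range(len(df)):
    for i in range(n + 1, len(df)):
        ind_diff.append(df.index[i] - df.index[n])

# Build the DataFrame once, instead of appending inside the loop.
# "index_diff" is a hypothetical column name chosen for illustration.
df2 = pd.DataFrame({"index_diff": ind_diff})
```

Appending to a list is cheap, whereas each DataFrame append copies the whole frame, which is why the loop in Method 2 slows down as df2 grows. Note also that DataFrame.append was removed in pandas 2.0, so this pattern (or pd.concat) is the only option on recent versions.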

Let's try broadcast array arithmetic:

v = df.values
v = v - v[:, None]
i, j = np.triu_indices(df.shape[0])

df2 = pd.DataFrame(v[i, j])

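This is very fast, but it quickly gets out of hand with too many records (on the order of millions), because memory usage explodes and half of the computations are redundant (due to symmetry).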
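One possible way to skip the redundant half is to index the two operands with the upper-triangle indices directly, rather than building the full n-by-n difference array first. This is only a sketch under the assumption that pairs with i < j are what is wanted (hence k=1, which also drops the diagonal); the names pair_diff and index_diff are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(187, 2))
v = df.values

# k=1 skips the diagonal, so only pairs with i < j are generated.
i, j = np.triu_indices(len(df), k=1)

# Row j minus row i for every pair; no full n x n x 2 array is computed.
pair_diff = pd.DataFrame(v[j] - v[i])

# The same index pairs give the index differences from the question.
index_diff = df.index.values[j] - df.index.values[i]
```

This avoids the symmetric half of the arithmetic, but the result still has n(n-1)/2 rows, so memory continues to grow quadratically with the number of records. For very large inputs the pairs would probably need to be processed in chunks.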

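Please post an example of the input and the expected output. Are you computing differences, or have you simplified the problem?

@wwii: the calculations are: index differences as demonstrated, datetime objects yielding timedeltas, and simple arithmetic.

It would help if you could fix the code so that it does not raise an AttributeError: hasattr(df, 'Index') --> False

I had never heard of this before, thanks for the quick response. I will test it. By records you mean the original df, right? That should not exceed 40k records. Also, isn't it just df3 = pd.DataFrame(v[i, j])?

@wwii Thanks. I obviously need more sleep; I did not spot that.

If your question has been answered, please consider marking the most helpful answer as accepted (you can toggle it to green by clicking the grey check mark on the left). Thanks.

@tripkane v[i] gives you the whole row.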