Python: Is there a way to speed up this function?

python pandas

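I know that vectorized functions are the preferred way to write code for speed, but I can't figure out a way to implement this function without loops. The way I wrote it makes it very slow. (Passing two DataFrames with 100 columns and 2000 rows as arguments, this function takes 100 seconds to complete. I was hoping for something closer to 1 second.)

The inputs to the function are just two DataFrames whose values are 1, -1, or 0. If anyone has ideas on how to vectorize it or speed it up, I would appreciate it. I tried replacing ".ix[inum, sym]" with "[sym][inum]", but that was even slower.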

           GOOG longp GOOG shortp GOOG func result
2011-07-28          0          -1               -1
2011-07-29          0          -1               -1
2011-08-01          0          -1               -1
2011-08-02          0          -1               -1
2011-08-03          0          -1               -1
2011-08-04          0          -1               -1
2011-08-05          0          -1               -1
2011-08-08          0           0             -0.5
2011-08-09          0           0             -0.5
2011-08-10          0           0             -0.5
2011-08-11          0           0             -0.5
2011-08-12          1           0                1
2011-08-15          1           0                1
2011-08-16          1           0                1
2011-08-17          1           0                1
2011-08-18          1           0                1
2011-08-19          1           0                1
2011-08-22          1           0                1
2011-08-23          1           0                1
2011-08-24          1           0                1
2011-08-25          1           0                1
2011-08-26          1           0                1
2011-08-29          1           0                1
2011-08-30          1           0                1
2011-08-31          1           0                1
2011-09-01          1           0                1
2011-09-02          1           0                1
2011-09-06          1           0                1
2011-09-07          1           0                1
2011-09-08          1           0                1
2011-09-09          1           0                1
2011-09-12          1           0                1
2011-09-13          1           0                1
2011-09-14          1           0                1
2011-09-15          1           0                1
2011-09-16          1           0                1
2011-09-19          1           0                1
2011-09-20          1           0                1
2011-09-21          1           0                1
2011-09-22          1           0                1
2011-09-23          1           0                1
2011-09-26          1           0                1
2011-09-27          1           0                1
2011-09-28          1           0                1
2011-09-29          0           0              0.5
2011-09-30          0          -1               -1
2011-10-03          0          -1               -1
2011-10-04          0          -1               -1
2011-10-05          0          -1               -1
2011-10-06          0          -1               -1
2011-10-07          0          -1               -1
2011-10-10          0          -1               -1
2011-10-11          0          -1               -1
2011-10-12          0          -1               -1
2011-10-13          0          -1               -1
2011-10-14          0          -1               -1
2011-10-17          0          -1               -1
2011-10-18          0          -1               -1
2011-10-19          0          -1               -1
2011-10-20          0          -1               -1


           IBM longp IBM shortp IBM func result
2012-05-01         1         -1               1
2012-05-02         1         -1               1
2012-05-03         1         -1               1
2012-05-04         1         -1               1
2012-05-07         1         -1               1
2012-05-08         1          0               1
2012-05-09         1          0               1
2012-05-10         1          0               1
2012-05-11         1          0               1
2012-05-14         1          0               1
2012-05-15         1          0               1
2012-05-16         0         -1              -1
2012-05-17         0         -1              -1
2012-05-18         0         -1              -1
2012-05-21         0         -1              -1
2012-05-22         0         -1              -1
2012-05-23         0         -1              -1
2012-05-24         0         -1              -1
2012-05-25         0         -1              -1
2012-05-29         0         -1              -1
2012-05-30         0         -1              -1
2012-05-31         0         -1              -1
2012-06-01         0         -1              -1
2012-06-04         0         -1              -1
2012-06-05         0         -1              -1
2012-06-06         0         -1              -1
2012-06-07         0         -1              -1
2012-06-08         1         -1               1
2012-06-11         1         -1               1
2012-06-12         1         -1               1
2012-06-13         1         -1               1
2012-06-14         1         -1               1
2012-06-15         1         -1               1
2012-06-18         1         -1               1
2012-06-19         1         -1               1
2012-06-20         1         -1               1
2012-06-21         1          0               1
2012-06-22         1          0               1
2012-06-25         1          0               1
2012-06-26         1          0               1
2012-06-27         1          0               1
2012-06-28         1          0               1
2012-06-29         1          0               1

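Edit:

I just re-ran some old code that uses a similar loop over a DataFrame to set values. It used to take maybe 5 seconds, and now I see it taking maybe 100 times as long. I wonder whether the problem is caused by a change in a recent version of pandas. That is the only variable I can think of that has changed. See the code below. It takes 73 seconds to run on my computer using pandas 0.11. That seems very slow for such a basic function, even though it does operate element by element. If anyone gets a chance, I'm curious how long the code below takes on your computer with your version of pandas.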

import time
import numpy as np
import pandas as pd
def timef(func, *args):
    start= time.clock()
    for i in range(2):
        func(*args)
    end= time.clock()
    time_complete = (end-start)/float(2)
    print time_complete

def tfunc(num_row, num_col):
    df = pd.DataFrame(index = np.arange(1,num_row), columns = np.arange(1,num_col))
    for col in df.columns:
        for inum in range(1, len(df.index)):
            df.ix[inum, col] = 0 #np.nan
    return df

timef(tfunc, 1000, 1000)  # This takes 73 seconds on a Core i5 M460 2.53 GHz Windows 7 laptop.
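The timing helper above targets Python 2 (it uses the `print` statement, and `time.clock` was removed in Python 3.8). A minimal Python 3 sketch of the same idea follows, using `time.perf_counter`. The `repeats` parameter and the returned value are additions for illustration, not part of the original post.

```python
import time


def timef(func, *args, repeats=2):
    """Return the mean wall-clock seconds per call of func(*args)."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    elapsed = time.perf_counter() - start
    return elapsed / repeats


if __name__ == "__main__":
    # Stand-in workload; swap in tfunc and its arguments to time the real thing.
    mean_seconds = timef(sum, range(1_000_000))
    print(f"{mean_seconds:.4f} s per call")
```

For more careful measurements, the standard-library `timeit` module repeats the call many times and reduces noise from one-off effects.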

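I believe the .ix implementation did change in 0.11. Not sure whether that is relevant.

A quick speedup I got on 0.10.1 was to change tfunc to the version below, which caches the column/Series being updated: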

def tfunc(num_row, num_col):
   df = pd.DataFrame(index = np.arange(1,num_row), columns = np.arange(1,num_col))
   for col in df.columns:
       sdf = df[col]
       for inum in range(1, len(df.index)):
           sdf.ix[inum] = 0 #np.nan
   return df

That took it from ~80 to ~9 seconds on my machine.

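Can you explain the goal of the function, rather than making readers work it out by reading the code?

This is a financial data problem. Longp is a DataFrame of 1s and 0s: 1 means buy or hold the security, 0 means sell or stay in cash. Shortp consists of -1s and 0s: -1 means sell short or stay short, 0 means go to cash or stay in cash. The function combines the long and short position series into a single signal, where 1 means buy or hold, 0.5 means exit the long position or stay in cash, -1 means short or stay short, and -0.5 means exit the short position or stay in cash. I added some sample data and the desired results. Please let me know if further clarification is needed.

The example is too complex for the site; try to simplify it (you can then extend the ideas we give in the answers). Don't use for loops; this looks like it can be vectorized...

Simplify the function, or the example data?

Is this function doing

pd.DataFrame(0, index=np.arange(1, num_row), columns=np.arange(1, num_col), dtype='float')

apart from the last row being NaN?

I believe this is a test function that mimics the original loop.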
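To make the question in the comment above concrete, here is a minimal sketch that compares the element-wise loop with a single vectorized construction. It assumes a recent pandas, where `.ix` no longer exists (it was removed in pandas 1.0), so the loop uses `.at` instead. The names `tfunc_loop` and `tfunc_vectorized` are illustrative, and both frames are created with a float dtype so they can be compared directly; the original frame had no explicit dtype.

```python
import numpy as np
import pandas as pd


def tfunc_loop(num_row, num_col):
    """Element-wise version: one label-based scalar write per cell."""
    df = pd.DataFrame(index=np.arange(1, num_row),
                      columns=np.arange(1, num_col), dtype=float)
    for col in df.columns:
        # As in the original loop, the labels run from 1 to len(index) - 1,
        # so the row with the last label is never written and stays NaN.
        for label in range(1, len(df.index)):
            df.at[label, col] = 0.0
    return df


def tfunc_vectorized(num_row, num_col):
    """Build the same frame in one step, then blank out the last row."""
    df = pd.DataFrame(0.0, index=np.arange(1, num_row),
                      columns=np.arange(1, num_col))
    if len(df):
        df.iloc[-1] = np.nan
    return df


if __name__ == "__main__":
    print(tfunc_loop(6, 4).equals(tfunc_vectorized(6, 4)))
```

The vectorized form does a constant number of pandas calls regardless of frame size, while the loop pays the indexing overhead once per cell.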

def gen_fuzz_logic_signal3(longp, shortp):
    # Input dataframes should have 0 or 1 value
    flogic_signal = pd.DataFrame(index = longp.index, columns = longp.columns)
    for sym in longp.columns:
        coll = longp[sym].values
        cols = shortp[sym].values
        prev_enter = 0
        newcol = [None] * len(coll)
        for inum in range(1, len(coll)):
            cur_val = np.nan
            if coll[inum] == 0  and prev_enter == +1:
                cur_val = 0.5
            if cols[inum] == 0 and prev_enter == -1:
                cur_val = -0.5
            if coll[inum] == 1 and cols[inum] == -1:
                if coll[inum -1] != 1:
                    cur_val = 1
                    prev_enter = 1
                elif cols[inum-1] != -1:
                    cur_val = -1
                    prev_enter = -1
                else:
                    cur_val = prev_enter
            else:
                if coll[inum] == 1:
                    cur_val = 1
                    prev_enter = 1
                if cols[inum] == -1:
                    cur_val = -1
                    prev_enter = -1
            newcol[inum] = cur_val
        flogic_signal[sym] = newcol
    return flogic_signal
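As a usage sketch, the per-column logic of the function above can be pulled into a helper that works on plain NumPy arrays and applied to a tiny input. The helper name `fuzz_column` and the toy values below are made up for illustration. The helper fills a float array with NaN instead of a list of `None`, which is an assumption about the desired output type rather than something stated in the post.

```python
import numpy as np
import pandas as pd


def fuzz_column(coll, cols):
    """Combine one long column (0/1) and one short column (0/-1) into a signal."""
    out = np.full(len(coll), np.nan)  # row 0 is never assigned and stays NaN
    prev_enter = 0                    # last position entered: +1, -1, or 0 (none)
    for i in range(1, len(coll)):
        cur_val = np.nan
        if coll[i] == 0 and prev_enter == 1:
            cur_val = 0.5             # exited a long position
        if cols[i] == 0 and prev_enter == -1:
            cur_val = -0.5            # exited a short position
        if coll[i] == 1 and cols[i] == -1:
            # Both signals active: whichever one just switched on wins.
            if coll[i - 1] != 1:
                cur_val, prev_enter = 1, 1
            elif cols[i - 1] != -1:
                cur_val, prev_enter = -1, -1
            else:
                cur_val = prev_enter
        else:
            if coll[i] == 1:
                cur_val, prev_enter = 1, 1
            if cols[i] == -1:
                cur_val, prev_enter = -1, -1
        out[i] = cur_val
    return out


# Toy input (hypothetical values, not taken from the tables in the question).
longp = pd.DataFrame({"GOOG": [0, 0, 0, 1, 0, 0]})
shortp = pd.DataFrame({"GOOG": [0, -1, 0, 0, 0, -1]})

signal = pd.DataFrame(
    {sym: fuzz_column(longp[sym].to_numpy(), shortp[sym].to_numpy())
     for sym in longp.columns},
    index=longp.index,
)
print(signal)
```

Because each value depends on `prev_enter` from the previous row, the inner loop is a small state machine. Running it over NumPy arrays, rather than indexing the DataFrame cell by cell, is what avoids the per-element pandas overhead.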
