Python: optimizing pandas code / a loop that creates two matrices


python performance optimization pandas


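I need to optimize this loop, which takes 2.5 seconds. The problem is that I call it more than 3000 times in my script. The purpose of the code is to create two matrices that are then used in a linear system.

Does anyone have an idea, in Python or Cython?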

import time
import pandas as pd

# df is only here for illustration; date_indicatrice changes on every function call
df = pd.DataFrame(0, columns=range(6),
                  index=pd.date_range(start=pd.Timestamp(2010, 1, 1),
                                      end=pd.Timestamp(2020, 1, 1), freq="h"))
mat = pd.DataFrame(0, index=df.index, columns=range(6))
mat_bp = pd.DataFrame(0, index=df.index, columns=range(6 * 2))

date_indicatrice = [(pd.Timestamp(2010, 1, 1), pd.Timestamp(2010, 4, 1)),
                    (pd.Timestamp(2012, 5, 1), pd.Timestamp(2019, 4, 1)),
                    (pd.Timestamp(2013, 4, 1), pd.Timestamp(2019, 4, 1)),
                    (pd.Timestamp(2014, 3, 1), pd.Timestamp(2019, 4, 1)),
                    (pd.Timestamp(2015, 1, 1), pd.Timestamp(2015, 4, 1)),
                    (pd.Timestamp(2013, 6, 1), pd.Timestamp(2018, 4, 1))]

timer = time.time()

for j, (d1, d2) in enumerate(date_indicatrice):
    # rows inside [d1, d2], and the subset of those rows with hour >= 8
    result = df[(mat.index >= d1) & (mat.index <= d2)]
    result2 = df[(mat.index >= d1) & (mat.index <= d2) & (mat.index.hour >= 8)]
    mat.loc[result.index, j] = 1.
    mat_bp.loc[result2.index, j * 2] = 1.
    # 1 where the row is inside the range but before 08:00
    mat_bp[j * 2 + 1] = (1 - mat_bp[j * 2]) * mat[j]

print(time.time() - timer)


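Here you go. I tested the following and got the same resulting matrices in mat and mat_bp as with your original code, but in 0.07 seconds instead of 1.4 seconds for the original code on my machine.

The real slowdown came from using result.index and result2.index: looking rows up by datetime label is much slower than looking them up by integer position. Where possible, I used a binary search to find the right positions.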

import pandas as pd
import numpy as np
import time
import bisect

# df is only here for illustration; date_indicatrice changes on every function call
df = pd.DataFrame(0, columns=range(6),
                  index=pd.date_range(start=pd.Timestamp(2010, 1, 1),
                                      end=pd.Timestamp(2020, 1, 1), freq="h"))
mat = pd.DataFrame(0, index=df.index, columns=range(6))
mat_bp = pd.DataFrame(0, index=df.index, columns=range(6 * 2))

date_indicatrice = [(pd.Timestamp(2010, 1, 1), pd.Timestamp(2010, 4, 1)),
                    (pd.Timestamp(2012, 5, 1), pd.Timestamp(2019, 4, 1)),
                    (pd.Timestamp(2013, 4, 1), pd.Timestamp(2019, 4, 1)),
                    (pd.Timestamp(2014, 3, 1), pd.Timestamp(2019, 4, 1)),
                    (pd.Timestamp(2015, 1, 1), pd.Timestamp(2015, 4, 1)),
                    (pd.Timestamp(2013, 6, 1), pd.Timestamp(2018, 4, 1))]

timer = time.time()

for j, (d1, d2) in enumerate(date_indicatrice):
    # binary search on the sorted index: positions [ind_start, ind_end) cover d1 <= t <= d2
    ind_start = bisect.bisect_left(mat.index, d1)
    ind_end = bisect.bisect_right(mat.index, d2)
    inds = np.arange(ind_start, ind_end)
    # positions inside the range whose hour is >= 8
    valid_inds = inds[mat.index[ind_start:ind_end].hour >= 8]
    # .iloc, because these are integer positions rather than datetime labels
    mat.iloc[ind_start:ind_end, j] = 1.
    mat_bp.iloc[valid_inds, j * 2] = 1.
    mat_bp[j * 2 + 1] = (1 - mat_bp[j * 2]) * mat[j]

print(time.time() - timer)
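
The binary search done with bisect above can also be written with the index's own searchsorted method. The sketch below is only an illustration under assumptions: the helper name row_range and the sample dates are made up for this example, and the index must be sorted. It checks that the positions found by the binary search are the same rows the boolean mask would select.

```python
import numpy as np
import pandas as pd


def row_range(index, d1, d2):
    """Return (start, stop) so that index[start:stop] covers d1 <= t <= d2.

    Hypothetical helper; assumes `index` is a sorted DatetimeIndex.
    """
    start = index.searchsorted(d1, side="left")
    stop = index.searchsorted(d2, side="right")
    return int(start), int(stop)


# Illustrative hourly index and date range (not the asker's data).
index = pd.date_range("2010-01-01", "2010-02-01", freq="h")
d1, d2 = pd.Timestamp("2010-01-10"), pd.Timestamp("2010-01-20")

start, stop = row_range(index, d1, d2)

# The mask-based selection from the question, for comparison.
mask = (index >= d1) & (index <= d2)
same_rows = np.array_equal(np.flatnonzero(mask), np.arange(start, stop))
print(start, stop, same_rows)
```

Because the two ends come from a binary search, the cost per date range no longer depends on comparing every timestamp in the index.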

Thank you very much. Is there a way to avoid using valid_inds?

If you want to use the hour as the selection criterion, I don't think so.
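
One possible way around building valid_inds, while still using the hour as the criterion, is to compute the hour mask once and slice it for each date range. This is only a sketch, not the answerer's method: the function name build_matrices is hypothetical, the index is assumed to be sorted, and the work is done on plain NumPy arrays that are wrapped in DataFrames at the end.

```python
import numpy as np
import pandas as pd


def build_matrices(index, date_ranges):
    """Build mat and mat_bp for a sorted DatetimeIndex (hypothetical helper)."""
    n = len(date_ranges)
    # Hour criterion evaluated once for the whole index.
    hour_ok = np.asarray(index.hour >= 8)
    mat = np.zeros((len(index), n))
    mat_bp = np.zeros((len(index), 2 * n))
    for j, (d1, d2) in enumerate(date_ranges):
        start = index.searchsorted(d1, side="left")
        stop = index.searchsorted(d2, side="right")
        mat[start:stop, j] = 1.0
        # In range and hour >= 8.
        mat_bp[start:stop, 2 * j] = hour_ok[start:stop]
        # In range and hour < 8, i.e. (1 - mat_bp[2j]) * mat[j].
        mat_bp[start:stop, 2 * j + 1] = ~hour_ok[start:stop]
    return (pd.DataFrame(mat, index=index, columns=range(n)),
            pd.DataFrame(mat_bp, index=index, columns=range(2 * n)))


# Illustrative usage with a small index and a single date range.
index = pd.date_range("2010-01-01", "2010-01-05", freq="h")
ranges = [(pd.Timestamp("2010-01-02"), pd.Timestamp("2010-01-03"))]
mat, mat_bp = build_matrices(index, ranges)
```

If date_indicatrice changes on every call but the index does not, the hour mask could also be computed once outside the function and passed in.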