Optimizing Python code with Numpy and Pandas

python python-3.x pandas numpy

My code is as follows:

import numpy as np
import pandas as pd
colum1 = [0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05]
colum2 = [1,2,3,4,5,6,7,8,9,10,11,12]
colum3 = [0.85,0.80,0.80,0.80,0.85,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
colum4 = [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05, 455.40, 319.03, 190.86 , 97.07, 26.96 , 0.00]
df = pd.DataFrame({
    'colum1' : colum1,
    'colum2' : colum2,
    'colum3' : colum3,
    'colum4' : colum4,
})

df['result'] = 0
for i in range(len(colum2)):
    df['result'] = np.where(
        df['colum2'] <= 5,
        np.where(
            df['colum2'] == 1,
            df['colum4'],
            np.where(
                ( df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3'])) )>0,
                ( df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3'])) ),
                0
            )
        ),
        np.where(
            ( df['colum4'] - (df['result'].shift(1) * df['colum1']) )>0,
            ( df['colum4'] - (df['result'].shift(1) * df['colum1']) ),
            0
        )
    )


I need to perform the same operation without resorting to a for loop. That would be very helpful, because I am working with thousands of records and the loop is very slow.

My expected result is as follows:

    colum1  colum2  colum3   colum4       result
0     0.05       1    0.85  1743.85  1743.850000
1     0.05       2    0.80  1485.58  1415.826000
2     0.05       3    0.80  1250.07  1193.436960
3     0.05       4    0.80  1021.83   974.092522
4     0.05       5    0.85   818.96   777.561068
5     0.05       6    0.00   628.05   589.171947
6     0.05       7    0.00   455.40   425.941403
7     0.05       8    0.00   319.03   297.732930
8     0.05       9    0.00   190.86   175.973354
9     0.05      10    0.00    97.07    88.271332
10    0.05      11    0.00    26.96    22.546433
11    0.05      12    0.00     0.00     0.000000

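The first step is to remove the loop over the index and to replace the test for numbers greater than 0 with np.maximum. This is because

np.where(a > 0, a, 0)

is, for our purposes, equivalent to

np.maximum(0, a)

At the same time, define the longer expressions separately to make the code readable: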

s1 = df['colum4'] - (df['result'].shift(1) * (df['colum1'] * df['colum3']))
s2 = df['colum4'] - (df['result'].shift(1) * df['colum1'])

df['result'] = np.where(df['colum2'] <= 5,
                        np.where(df['colum2'] == 1, df['colum4'],
                                 np.maximum(0, s1)),
                        np.maximum(0, s2))

This version is easier to manage.
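The equivalence used in this answer can be checked on a small array. The sketch below is only an illustration (the arrays `a` and `b` are made up for the example); it also shows the one case where the two forms differ, namely NaN input, which matters here because `.shift(1)` puts a NaN in the first row.

```python
import numpy as np

# Finite values: both forms clamp negatives to 0 and keep positives.
a = np.array([-2.0, 0.0, 3.5])
via_where = np.where(a > 0, a, 0)
via_maximum = np.maximum(0, a)
same_on_finite = np.array_equal(via_where, via_maximum)

# NaN values: the two forms disagree.
b = np.array([np.nan, 1.0])
where_nan = np.where(b > 0, b, 0)   # NaN > 0 is False, so NaN becomes 0
maximum_nan = np.maximum(0, b)      # np.maximum propagates NaN
```

In the question's data the first row has colum2 == 1, so its value is taken from colum4 and the NaN produced by the shift is never selected.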

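Why use a loop in the first place? If you delete it and move the loop body out to the outer indentation level, the code seems to work.

The biggest problem is the .shift(1) :(

How confident are you in your expected result? I can only reproduce the first few rows; after that they vary.

@user3483203 Sorry, I have just corrected it.

Thanks, this is a more manageable version, but it does not solve what I am after: eliminating the for loop while still getting the desired result.

The previous solution has no explicit for loop. Isn't that what you wanted?

I have just edited the question to add the desired result, but when I remove the for loop the result is different.

SO is not (just) a code-writing service. I am happy to clarify logic, concepts, and performance-type queries (if any). But "getting these numbers" is not in itself something of interest to me or to other users.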

m1 = df['colum2'] <= 5
m2 = df['colum2'] == 1

conds = [m1 & m2, m1 & ~m2]
choices = [df['colum4'], np.maximum(0, s1)]

df['result'] = np.select(conds, choices, np.maximum(0, s2))
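As the comments note, the expected table only comes out when each row uses the previous row's already-computed result, so a single pass with `.shift(1)` over a column of zeros cannot reproduce it. The following is a minimal sketch, not the approach of either answer, assuming the expected table in the question defines the intended behaviour: it does one sequential pass over plain NumPy arrays, which avoids re-evaluating the whole DataFrame once per row. The helper name `sequential_result` is made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'colum1': [0.05] * 12,
    'colum2': list(range(1, 13)),
    'colum3': [0.85, 0.80, 0.80, 0.80, 0.85] + [0.0] * 7,
    'colum4': [1743.85, 1485.58, 1250.07, 1021.83, 818.96, 628.05,
               455.40, 319.03, 190.86, 97.07, 26.96, 0.00],
})

def sequential_result(c1, c2, c3, c4):
    """Hypothetical helper: one pass, each row using the previous result."""
    # Factor applied to the previous result: colum1 * colum3 while
    # colum2 <= 5, plain colum1 afterwards (same split as the question).
    rate = np.where(c2 <= 5, c1 * c3, c1)
    result = np.zeros(len(c4))
    prev = np.nan  # like shift(1), the first row has no previous value
    for i in range(len(c4)):
        if c2[i] == 1:
            result[i] = c4[i]
        else:
            value = c4[i] - prev * rate[i]
            # Mirrors np.where(value > 0, value, 0): NaN and negatives give 0.
            result[i] = value if value > 0 else 0.0
        prev = result[i]
    return result

df['result'] = sequential_result(
    df['colum1'].to_numpy(), df['colum2'].to_numpy(),
    df['colum3'].to_numpy(), df['colum4'].to_numpy(),
)
```

This still contains a Python loop, but it runs once over n rows instead of n full vectorized passes. Because the clamp at 0 makes the recurrence non-linear, I am not aware of a closed-form vectorized equivalent.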