Python 'for' loop performance is too slow

Tags: python, loops

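My DataFrame has more than 500,000 rows and many similar 'for' loops, which means my code takes over an hour to finish its calculations. Is there a more efficient way to write the following 'for' loop so that it runs faster: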

col_26 = []
col_27 = []
col_28 = []


for ind in df.index:
    if df['A_factor'][ind] > df['B_factor'][ind]:
        col_26.append('Yes')
        col_27.append('No')
        col_28.append(df['A_value'][ind])
    elif df['A_factor'][ind] < df['B_factor'][ind]:
        col_26.append('No')
        col_27.append('Yes')
        col_28.append(df['B_value'][ind])
    else:
        col_26.append('')
        col_27.append('')
        col_28.append(float('nan'))
You may want to look at the pandas iterrows() function or use apply; you can also look at this post:

Appending to a list in Python is relatively slow. Initializing the list before iterating can speed things up a little. For example,

import timeit

def f():
    # Grow the list one element at a time with append.
    x = []
    for ii in range(500000):
        x.append(str(ii))

def f2():
    # Preallocate the list, then fill it by index.
    x = [""] * 500000
    for ii in range(500000):
        x[ii] = str(ii)


timeit.timeit("f()", "from __main__ import f", number=10)
# Output: 1.6317970999989484
timeit.timeit("f2()", "from __main__ import f2", number=10)
# Output: 1.3037318000024243
Since you are already using pandas/numpy, there are ways to vectorize your operations so that they do not need a loop at all. For example:

import numpy as np

a_factor = df["A_factor"].to_numpy()
b_factor = df["B_factor"].to_numpy()
a_value = df["A_value"].to_numpy()
b_value = df["B_value"].to_numpy()

# Start from the "equal" defaults: empty strings and NaN
col_26 = np.full(a_factor.shape, '', dtype='U3')  # U3 => string of size 3
col_27 = np.full(a_factor.shape, '', dtype='U3')
col_28 = np.full(a_factor.shape, np.nan)

a_greater = a_factor > b_factor
b_greater = a_factor < b_factor

col_26[a_greater] = 'Yes'
col_26[b_greater] = 'No'

# col_27 is the mirror image of col_26
col_27[a_greater] = 'No'
col_27[b_greater] = 'Yes'

# Take the value that belongs to whichever factor is larger
col_28[a_greater] = a_value[a_greater]
col_28[b_greater] = b_value[b_greater]
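The same three-way split (A larger, B larger, neither) can also be written with np.select, which keeps the conditions and their outcomes side by side. This is only a sketch under assumptions: the column names come from the question, while the function name compare_factors and the small demo frame are made up for illustration.

```python
import numpy as np
import pandas as pd


def compare_factors(df):
    """Return (col_26, col_27, col_28) as NumPy arrays without a Python-level loop."""
    a_greater = (df["A_factor"] > df["B_factor"]).to_numpy()
    b_greater = (df["A_factor"] < df["B_factor"]).to_numpy()
    conditions = [a_greater, b_greater]

    # np.select picks, per row, the choice for the first condition that is True
    # and falls back to `default` when neither holds (ties or NaN factors).
    col_26 = np.select(conditions, ["Yes", "No"], default="")
    col_27 = np.select(conditions, ["No", "Yes"], default="")
    col_28 = np.select(
        conditions,
        [df["A_value"].to_numpy(dtype=float), df["B_value"].to_numpy(dtype=float)],
        default=np.nan,
    )
    return col_26, col_27, col_28


# Hypothetical demo data: A wins, B wins, tie
demo = pd.DataFrame({
    "A_factor": [3, 1, 2],
    "A_value": [10.0, 20.0, 30.0],
    "B_factor": [1, 2, 2],
    "B_value": [11.0, 22.0, 33.0],
})
col_26, col_27, col_28 = compare_factors(demo)
```

The returned arrays can be assigned straight back to the frame, for example df['col_26'] = col_26.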
Try column operations:

import numpy as np
import pandas as pd

data = {'A_factor': [1, 2, 3, 4, 5],
        'A_value': [10, 20, 30, 40, 50],
        'B_factor': [2, 3, 1, 2, 6],
        'B_value': [11, 22, 33, 44, 55]}
df = pd.DataFrame(data)
# Defaults cover the case where the two factors are equal
df['col_26'] = ''
df['col_27'] = ''
df['col_28'] = np.nan

a_mask = df['A_factor'] > df['B_factor']
b_mask = df['A_factor'] < df['B_factor']

df.loc[a_mask, 'col_26'] = 'Yes'
df.loc[b_mask, 'col_26'] = 'No'
df.loc[a_mask, 'col_28'] = df.loc[a_mask, 'A_value']

df.loc[b_mask, 'col_27'] = 'Yes'
df.loc[a_mask, 'col_27'] = 'No'
df.loc[b_mask, 'col_28'] = df.loc[b_mask, 'B_value']

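append makes Python request more memory from the heap as the list grows. Using append inside a for loop means memory keeps being acquired and released in order to get more. So it is better to tell Python up front how many items you need: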

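n = len(df)  # preallocate to the DataFrame's length instead of a hard-coded 500000
col_26 = ['Yes'] * n
col_27 = ['No'] * n
col_28 = [float('nan')] * n

# enumerate gives a 0-based position, so this also works when df.index is not 0..n-1
for pos, ind in enumerate(df.index):
    if df['A_factor'][ind] > df['B_factor'][ind]:
        # col_26/col_27 already hold 'Yes'/'No' for this case
        col_28[pos] = df['A_value'][ind]
    elif df['A_factor'][ind] < df['B_factor'][ind]:
        col_26[pos] = 'No'
        col_27[pos] = 'Yes'
        col_28[pos] = df['B_value'][ind]
    else:
        # equal factors: blank flags, col_28 stays NaN
        col_26[pos] = ''
        col_27[pos] = ''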
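If the loop has to stay, another option is to pull each column out of the DataFrame once and iterate over plain Python lists, so the loop body never indexes into the DataFrame. A minimal sketch, assuming the four columns from the question; the function name loop_over_columns and the sample data are made up for illustration:

```python
import math

import pandas as pd


def loop_over_columns(df):
    """Plain-Python loop that reads each column once instead of indexing per row."""
    col_26, col_27, col_28 = [], [], []
    # tolist() converts each column to a list of Python scalars in one pass.
    rows = zip(
        df["A_factor"].tolist(),
        df["B_factor"].tolist(),
        df["A_value"].tolist(),
        df["B_value"].tolist(),
    )
    for a_factor, b_factor, a_value, b_value in rows:
        if a_factor > b_factor:
            col_26.append("Yes")
            col_27.append("No")
            col_28.append(a_value)
        elif a_factor < b_factor:
            col_26.append("No")
            col_27.append("Yes")
            col_28.append(b_value)
        else:
            col_26.append("")
            col_27.append("")
            col_28.append(float("nan"))
    return col_26, col_27, col_28


# Hypothetical sample with a non-default index: A wins, B wins, tie
sample = pd.DataFrame(
    {
        "A_factor": [3, 1, 2],
        "A_value": [10, 20, 30],
        "B_factor": [1, 2, 2],
        "B_value": [11, 22, 33],
    },
    index=["x", "y", "z"],
)
flags_a, flags_b, values = loop_over_columns(sample)
```

Because the rows are consumed in order, the resulting lists line up with the DataFrame's rows whatever its index looks like.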
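A for loop over 500,000 items runs in under a second, so the for loop itself is not what is causing the problem. It would probably be faster if done in pandas or NumPy, using column operations. Can you provide more information? More code? If you are creating many lists of length 500,000, that could use a lot of memory, and that would cause the slowdown; it would not be a CPU problem.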
Thanks for taking the time to provide that example. I will also look into vectorization. I am very new to this, but I am learning here. Thanks again.