Python: converting strings to int is too slow


I have a program that reads 3 strings per line, for 50,000 lines at a time, and then does other things. The part that reads the file and converts the strings to integers takes 80% of the total running time.

My code snippet is below:

import time
file = open ('E:/temp/edges_big.txt').readlines()
start_time = time.time()
for line in file[1:]:
    label1, label2, edge = line.strip().split()
    # label1 = int(label1); label2 = int(label2); edge = float(edge)
    # Rest of the loop deleted
print ('processing file took ', time.time() - start_time, "seconds")
The above takes about 0.84 seconds. Now, when I uncomment the line

label1 = int(label1); label2 = int(label2); edge = float(edge)

the running time rises to about 3.42 seconds.

The input file has the format:

str1 str2 str3

on each line.


Are the int() and float() functions really that slow? How can I optimize this?
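
For reference, the cost of the conversions themselves can be measured in isolation with a quick timeit micro-benchmark. This is only a sketch for measuring; the sample strings are invented, not taken from the real file:

import timeit

# Time two int() calls and one float() call on strings of the kind
# found in the input file (sample values are made up).
setup = "a, b, c = '12345', '67890', '0.7315983'"
stmt = "int(a); int(b); float(c)"
per_call = timeit.timeit(stmt, setup=setup, number=100000) / 100000
print('three conversions: about %.2f microseconds' % (per_call * 1e6))
print('for 50000 lines: about %.3f seconds' % (per_call * 50000))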

I can't reproduce this at all.

I generated a file with 50,000 lines, each containing three random numbers (two integers and one float) separated by spaces.


Then I ran your script on that file. On my three-year-old machine, the original script finished in just 0.05 seconds, and the version with the line uncommented finished in 0.15 seconds. Converting strings to int/float does take longer, of course, but certainly not whole seconds. Unless your target machine is a toaster running embedded Windows CE.
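
The answer does not show how that test file was generated; a minimal pure-Python sketch (the value ranges and the header line are assumptions, not taken from the answer) would be:

import random

# 50,000 lines, each with two random integers and one random float,
# separated by spaces; the header line is skipped by file[1:] in the
# original snippet.
with open('edges_big.txt', 'w') as out:
    out.write('label1 label2 edge\n')
    for _ in range(50000):
        out.write('%d %d %.6f\n' % (random.randint(0, 1024),
                                    random.randint(0, 1024),
                                    random.random()))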

If the file is in the OS cache, then parsing it takes milliseconds on my machine:

name                                 time ratio comment
read_read                        145 usec  1.00 big.txt
read_readtxt                    2.07 msec 14.29 big.txt
read_readlines                  4.94 msec 34.11 big.txt
read_james_otigo                29.3 msec 201.88 big.txt
read_james_otigo_with_int_float 82.9 msec 571.70 big.txt
read_map_local                  93.1 msec 642.23 big.txt
read_map                        95.6 msec 659.57 big.txt
read_numpy_loadtxt               321 msec 2213.66 big.txt
where the read_*() functions are:

def read_read(filename):
    with open(filename, 'rb') as file:
        data = file.read()

def read_readtxt(filename):
    with open(filename, 'rU') as file:
        text = file.read()

def read_readlines(filename):
    with open(filename, 'rU') as file:
        lines = file.readlines()

def read_james_otigo(filename):
    file = open (filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()

def read_james_otigo_with_int_float(filename):
    file = open (filename).readlines()
    for line in file[1:]:
        label1, label2, edge = line.strip().split()
        label1 = int(label1); label2 = int(label2); edge = float(edge)

def read_map(filename):
    with open(filename) as file:
        L = [(int(l1), int(l2), float(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]

def read_map_local(filename, _i=int, _f=float):
    # same as read_map, but int/float are bound to local names via the
    # default arguments to avoid repeated global name lookups
    with open(filename) as file:
        L = [(_i(l1), _i(l2), _f(edge))
             for line in file
             for l1, l2, edge in [line.split()] if line.strip()]

import numpy as np

def read_numpy_loadtxt(filename):
    a = np.loadtxt(filename, dtype=[('label1', 'i'),
                                    ('label2', 'i'),
                                    ('edge', 'f')])
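
For comparison, one more variant could be added (not part of the benchmark above; an added sketch that assumes every line really contains exactly three whitespace-separated fields and there is no header). It splits the whole file once instead of line by line:

def read_split_all(filename):
    # read and split the whole file in one go; assumes exactly three
    # whitespace-separated fields per line
    with open(filename) as file:
        tokens = file.read().split()
    L = [(int(tokens[i]), int(tokens[i + 1]), float(tokens[i + 2]))
         for i in range(0, len(tokens), 3)]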
big.txt was generated with:

#!/usr/bin/env python
import numpy as np

n = 50000
a = np.random.random_integers(low=0, high=1<<10, size=2*n).reshape(-1, 2)
np.savetxt('big.txt', np.c_[a, np.random.rand(n)], fmt='%i %i %s')
To reproduce the results, run:

# write big.txt
python generate-file.py
# run the benchmark
python read-array.py
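
The read-array.py driver itself is not shown above; a minimal sketch, assuming the read_*() functions are defined in the same file, could time them like this (the report format loosely mimics the tables above):

import timeit

FUNCS = [read_read, read_readtxt, read_readlines,
         read_james_otigo, read_james_otigo_with_int_float,
         read_map, read_map_local, read_numpy_loadtxt]

def report(filename='big.txt', repeat=3, number=10):
    # time each reader on the same file and print name, time and ratio
    results = []
    for func in FUNCS:
        best = min(timeit.repeat(lambda: func(filename),
                                 repeat=repeat, number=number)) / number
        results.append((best, func.__name__))
    results.sort()
    fastest = results[0][0]
    for best, name in results:
        print('%-33s %10.3g sec %8.2f %s' % (name, best, best / fastest, filename))

if __name__ == '__main__':
    report()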

I get almost the same timings as you. I think the problem was with the code I was using to do the timing:

read_james_otigo                  40 msec big.txt
read_james_otigo_with_int_float  116 msec big.txt
read_map                         134 msec big.txt
read_map_local                   131 msec big.txt
read_numpy_loadtxt               400 msec big.txt
read_read                        488 usec big.txt
read_readlines                  9.24 msec big.txt
read_readtxt                    4.36 msec big.txt

name                                 time ratio comment
read_read                        488 usec  1.00 big.txt
read_readtxt                    4.36 msec  8.95 big.txt
read_readlines                  9.24 msec 18.95 big.txt
read_james_otigo                  40 msec 82.13 big.txt
read_james_otigo_with_int_float  116 msec 238.64 big.txt
read_map_local                   131 msec 268.05 big.txt
read_map                         134 msec 274.87 big.txt
read_numpy_loadtxt               400 msec 819.42 big.txt


read_james_otigo                39.4 msec big.txt
read_readtxt                    4.37 msec big.txt
read_readlines                  9.21 msec big.txt
read_map_local                   131 msec big.txt
read_james_otigo_with_int_float  116 msec big.txt
read_map                         134 msec big.txt
read_read                        487 usec big.txt
read_numpy_loadtxt               398 msec big.txt

name                                 time ratio comment
read_read                        487 usec  1.00 big.txt
read_readtxt                    4.37 msec  8.96 big.txt
read_readlines                  9.21 msec 18.90 big.txt
read_james_otigo                39.4 msec 80.81 big.txt
read_james_otigo_with_int_float  116 msec 238.51 big.txt
read_map_local                   131 msec 268.84 big.txt
read_map                         134 msec 275.11 big.txt
read_numpy_loadtxt               398 msec 816.71 big.txt

I can't see how the difference between those two lines could cause such a big difference in running time; can you clarify? It's odd: on my machine the two int() calls and one float() call take roughly 1.7 µs in total, which multiplied by 50,000 is about 85 ms. That would make yours 30 times slower than mine, which doesn't sound right.

To echo what Tim said, can you state clearly which two versions you are comparing? Right now the code has the conversions in it, but commented out, along with the append(). Then the timings supposedly change when the conversions are added back in. Either I'm completely misunderstanding this, or there are some obvious typos. If I were you, I'd look at how much time each of the three conversions takes, and I'd also put together a small, self-contained, runnable test case that demonstrates the slowness, which we could then experiment with.

Which Python 3.x version is this? Here's what I see from a quick trial: 2.7, 3.2 and 3.3 all run in 0.033 s without the conversions. With the conversions I get: 2.7 - 0.125 s; 3.1 - 0.162 s; 3.2 - 0.155 s; 3.3 - 0.10 s. For 3.1 and 3.2 that's roughly a 5x slowdown, and 0.84 × 5 ≈ 4 s.

Tim, this is actually a new machine, a 64-bit Core i7. I'll try it with a different file and update.

I'm surprised the numpy loadtxt version is so much slower than the others -- any idea what's going on there? @EdwardLoper: no idea. What results do you get on your machine?

I get the same results as you -- loadtxt is much slower. That surprised me, because I thought the whole point of loadtxt was that it's written in C and therefore fast. After further investigation I may have been misinformed -- it may well be written in Python, in which case I'd certainly expect it to be slower. Here is its definition: -- so now it makes sense that it's slow. I guess it's more about convenience than about loading data fast.

Edward, I get the following error about encoding:
SyntaxError: Non-ASCII character '\xb5' in file E:\reporttime.py on line 37, but no encoding declared
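
That error means reporttime.py contains a non-ASCII byte, '\xb5' (µ), presumably in a units string. Under Python 2, a source file with non-ASCII bytes needs an explicit encoding declaration (PEP 263). A minimal fix, assuming the file really is saved in a Latin-1/cp1252-style encoding (which the single '\xb5' byte suggests), is to add a coding line at the top of reporttime.py:

# -*- coding: latin-1 -*-

Alternatively, re-save the file as UTF-8 and declare utf-8 instead, or replace the µ with a plain ASCII 'u'.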