Reading input from a file into a list in Python


python file-io


Is there any way to get input from a file other than using a for loop? I am using

import fileinput

# Read every line from the files named on the command line (or stdin).
data = fileinput.input()
c = [int(i) for i in data]
c.sort()

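but with a very large amount of data the processing takes too long. The input has the format

58457907
37850775
19743393
70718573
....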

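Using map on a file opened with a with statement seems to be more efficient when tested on a 200-line file. No explicit readlines call is needed, because iterating over a file object already yields its lines. The session below is Python 2, where map returns a list that can be sorted in place: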

In [3]: %%timeit
with open("in.txt",'rb') as f:
    lines = map(int,f)
    lines.sort()
   ...: 
10000 loops, best of 3: 183 µs per loop


In [5]: %%timeit
data = fileinput.input("in.txt")
c = [int(i) for i in data]
c.sort()
   ...: 
1000 loops, best of 3: 443 µs per loop
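The session above relies on Python 2 behaviour. As a minimal sketch of the same idea for Python 3, where map returns a lazy iterator with no sort method, the file object can be handed to sorted(map(int, f)). The function name read_sorted_ints and the temporary sample file are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile


def read_sorted_ints(path):
    """Read one integer per line from path and return them sorted."""
    with open(path) as f:
        # A file object iterates over its lines; int() tolerates the
        # trailing newline and surrounding whitespace.
        return sorted(map(int, f))


if __name__ == "__main__":
    # Hypothetical sample data in the question's format.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("58457907\n37850775\n19743393\n70718573\n")
    try:
        print(read_sorted_ints(tmp.name))
    finally:
        os.remove(tmp.name)
```

Calling sorted on the iterator builds the list and sorts it in one step, so no intermediate list has to be created by hand.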

If I create a "large" file:

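from random import randint

# Write 100002 random integers, one per line.
with open('/tmp/nums.txt', 'w') as fout:
    # Use // so the lower bound stays an int on Python 3 as well.
    a, b = 100002 // 10000, 100002 * 10000
    for i in range(100002):
        fout.write('{}\n'.format(randint(a, b)))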

I can read it, convert it to integers, and sort the data like so:

with open('/tmp/nums.txt') as fin:    
    nums=[int(e) for e in fin]
    nums.sort()

On my computer, the total time for this is 50 milliseconds. Is 50 ms long?
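One rough way to check a figure like this yourself is to time a single run with time.perf_counter (Python 3.3+). This is only a sketch: the helper names load_and_sort and time_once are made up for illustration, and the lines are generated in memory rather than read from disk, so file I/O is not included in the measurement.

```python
import time


def load_and_sort(lines):
    """Convert an iterable of text lines to a sorted list of ints."""
    return sorted(int(line) for line in lines)


def time_once(func, *args):
    """Return (result, elapsed_seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Synthetic stand-in for the file contents: 100002 lines of digits.
    lines = ["{}\n".format(n) for n in range(100002, 0, -1)]
    nums, elapsed = time_once(load_and_sort, lines)
    print("{} numbers in {:.1f} ms".format(len(nums), elapsed * 1000))
```

A single run is noisy, which is why repeating the measurement with timeit gives a more trustworthy number.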

A more formal timing:

def f1():
    with open('/tmp/nums.txt') as fin:    
        nums=[int(e) for e in fin]
        nums.sort()
    return nums

def f2():
    with open('/tmp/nums.txt') as fin:  
        return sorted(map(int, fin))

def f3():
    with open('/tmp/nums.txt') as fin:  
        nums=list(map(int, fin))
        nums.sort()    
    return nums    

if __name__ =='__main__':
    import timeit     
    import sys
    if sys.version_info.major==2:
        from itertools import imap as map

    result=[]    
    for f in (f1, f2, f3):
        fn=f.__name__
        fs="f()"
        ft=timeit.timeit(fs, setup="from __main__ import f", number=3)
        r=eval(fs)
        result.append((ft, fn, str(r[0:5])+'...'+str(r[-6:-1]) ))         

    result.sort(key=lambda t: t[0])    

    for i, t in enumerate(result):
        ft, fn, r = t
        if i==0:
            fr='{}: {:.4f} secs is fastest\n\tf(x)={}\n========'.format(fn, ft, r)   
        else:
            t1=result[0][0]
            dp=(ft-t1)/t1
            fr='{}: {:.4f} secs - {} is {:.2%} faster\n\tf(x)={}'.format(fn, ft, result[0][1], dp, r)

        print(fr)

As you can see, there is not much difference between them (except on PyPy, where f3 has a clear advantage):

Python 2.7.8:

f3: 0.2630 secs is fastest
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
========
f2: 0.2641 secs - f3 is 0.41% faster
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
f1: 0.2779 secs - f3 is 5.67% faster
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]

Python 3.4.1:

f2: 0.1873 secs is fastest
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
========
f3: 0.1881 secs - f2 is 0.41% faster
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
f1: 0.2071 secs - f2 is 10.59% faster
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]

PyPy:

PY3:

f3: 0.2483 secs is fastest
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
========
f2: 0.2588 secs - f3 is 4.23% faster
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]
f1: 0.2878 secs - f3 is 15.88% faster
    f(x)=[3025, 18834, 19637, 29124, 42088]...[999964829, 999970030, 999984585, 1000005692, 1000010131]

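You are effectively processing the file 3 times. Do you really need to sort it at the end? Try opening the file and reading it yourself. The first line reads the whole file, then you process each line, then you sort it; no wonder it takes a long time.

Almost any construct has an implicit loop. Why avoid the explicit loop?

It is an 888 KB text file with about 100002 lines. The sort is needed for further processing…

fileinput adds some overhead. If you are sensitive to time, you might consider opening the file yourself.

@Robᵩ, on my machine lines = sorted(itertools.imap(int, f)) times almost the same, but it is probably a better idea, although lines = sorted(int(x) for x in f) is very close. I hate using map.

Just tried sorted, same result. I will try itertools: 180 µs using imap.

Using fileinput instead of open added roughly 65% to the time in some tests I did.

Yes, it is much faster than what I tested before. Thanks.

But here I am using an input file, which will be supplied explicitly as an argument.

@AbhishekSharma: Is your program given a filename, or the 100002 numbers via stdin? fileinput supports either or both.

I first wrote the code using stdin, but for testing purposes I used a testcase file. It took a lot of time to execute...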
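The comments suggest opening the file yourself instead of going through fileinput, while still accepting either a filename argument or numbers on stdin. A minimal sketch of that, assuming one integer per line, could look like the following. The names read_ints and main are hypothetical, and skipping blank lines is an added assumption to tolerate a trailing empty line.

```python
import io
import sys


def read_ints(stream):
    """Return a sorted list of the integers found one per line in stream."""
    # Skip blank lines so a trailing empty line does not raise ValueError.
    return sorted(int(line) for line in stream if line.strip())


def main(argv, stdin=sys.stdin):
    """Read from the file named in argv[1] if present, else from stdin."""
    if len(argv) > 1:
        with open(argv[1]) as f:
            return read_ints(f)
    return read_ints(stdin)


if __name__ == "__main__":
    # Demonstration with an in-memory stream standing in for stdin.
    demo = io.StringIO("58457907\n37850775\n19743393\n70718573\n")
    print(main(["prog"], stdin=demo))
```

In a real script the call would be main(sys.argv), so that running it with a path reads that file and running it with piped input reads stdin.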
