Python bz2: reading lines is slower in bytes mode

I have a bz2-compressed log file with a lot of lines. Each line has to go through a small analysis, which is not important here.

I started by reading the lines in text mode, like this:

import bz2

path = 'content.log.bz2' 

def method_1(path):
    with bz2.open(path, 'rt') as file:
        lines = [line for line in file]
    return lines
(The analysis would happen inside the list comprehension.) %prun gives:

1394087 function calls (1394086 primitive calls) in 9.864 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    17350    3.295    0.000    3.295    0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
        1    3.042    3.042    9.780    9.780 <ipython-input-1-77d0033e4930>:7(<listcomp>)
  1143178    2.232    0.000    2.232    0.000 bz2.py:137(closed)
    16617    0.266    0.000    3.894    0.000 _compression.py:66(readinto)
    16617    0.138    0.000    3.474    0.000 _compression.py:72(read)
    16617    0.138    0.000    4.386    0.000 bz2.py:184(read1)
    66467    0.128    0.000    0.128    0.000 {built-in method builtins.len}
    16617    0.108    0.000    4.001    0.000 {method 'read1' of '_io.BufferedReader' objects}
    16617    0.086    0.000    0.153    0.000 codecs.py:319(decode)
        1    0.083    0.083    9.864    9.864 <string>:1(<module>)
    16617    0.076    0.000    0.247    0.000 _compression.py:16(_check_can_read)
    16619    0.070    0.000    0.171    0.000 bz2.py:151(readable)
    16621    0.068    0.000    0.101    0.000 _compression.py:12(_check_not_closed)
    16617    0.067    0.000    0.067    0.000 {built-in method _codecs.utf_8_decode}
    16617    0.058    0.000    0.058    0.000 {method 'cast' of 'memoryview' objects}
      886    0.009    0.000    0.009    0.000 {method 'read' of '_io.BufferedReader' objects}
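Next I tried reading the lines in bytes mode instead of text mode. This turned out to be much worse in performance than my first solution. %prun now gives: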

 8020433 function calls in 39.857 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  1126551   10.761    0.000   36.901    0.000 bz2.py:206(readline)
  1126551    4.739    0.000   11.553    0.000 bz2.py:151(readable)
  1126551    4.655    0.000   16.208    0.000 _compression.py:16(_check_can_read)
  1126551    4.517    0.000    6.814    0.000 _compression.py:12(_check_not_closed)
    17350    4.333    0.000    4.333    0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
  1126551    3.023    0.000    7.947    0.000 {method 'readline' of '_io.BufferedReader' objects}
        1    2.880    2.880   39.780   39.780 <ipython-input-1-77d0033e4930>:12(<listcomp>)
  1126554    2.297    0.000    2.297    0.000 bz2.py:137(closed)
  1126552    1.985    0.000    1.985    0.000 {built-in method builtins.isinstance}
    16617    0.273    0.000    4.924    0.000 _compression.py:66(readinto)
    16617    0.140    0.000    4.515    0.000 _compression.py:72(read)
    66467    0.128    0.000    0.128    0.000 {built-in method builtins.len}
        1    0.073    0.073   39.857   39.857 <string>:1(<module>)
    16617    0.040    0.000    0.040    0.000 {method 'cast' of 'memoryview' objects}
      886    0.009    0.000    0.009    0.000 {method 'read' of '_io.BufferedReader' objects}
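To reproduce this comparison outside IPython, a minimal sketch using the standard cProfile and pstats modules might look like the following. The helper names (make_sample, read_lines, profile_read) and the synthetic log content are assumptions made for illustration and are not part of the original post. Absolute timings will depend on the Python version and on the file.

```python
import bz2
import cProfile
import io
import os
import pstats
import tempfile


def make_sample(path, n_lines=20000):
    """Write a small synthetic bz2 log file (hypothetical test data)."""
    with bz2.open(path, 'wt', encoding='utf-8') as file:
        for i in range(n_lines):
            file.write(f'line {i}: some log message\n')


def read_lines(path, mode):
    """Read all lines of a bz2 file in the given mode ('rt' or 'rb')."""
    with bz2.open(path, mode) as file:
        return [line for line in file]


def profile_read(path, mode, top=5):
    """Profile read_lines and return (lines, report sorted by internal time)."""
    profiler = cProfile.Profile()
    lines = profiler.runcall(read_lines, path, mode)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats('tottime').print_stats(top)
    return lines, out.getvalue()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, 'sample.log.bz2')
        make_sample(sample)
        for mode in ('rt', 'rb'):
            lines, report = profile_read(sample, mode)
            print(mode, len(lines))
            print(report)
```

Sorting by 'tottime' matches the "Ordered by: internal time" header in the output above, so the top entries can be compared directly between the two modes.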

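Does anyone know what is going wrong here? I assumed bytes mode would be closer to the machine and therefore faster, but that is not the case.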

I suspect that reading line by line in binary mode is not the best option.
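One way to act on that suspicion is to stay in binary mode but avoid calling the compressed file's own readline once per line, which is where the bytes-mode profile spends most of its time. The sketch below shows two options that use only standard-library calls: wrapping the file in io.BufferedReader, and reading everything at once and splitting on newlines. The function names are hypothetical, and neither variant has been measured on the log file from the question, so any speedup should be confirmed by profiling.

```python
import bz2
import io


def read_lines_buffered(path):
    """Iterate lines through io.BufferedReader instead of the bz2 file itself."""
    with bz2.open(path, 'rb') as raw:
        # BufferedReader pulls decompressed data in chunks via readinto()
        # and splits lines from its own buffer.
        with io.BufferedReader(raw) as file:
            return [line for line in file]


def read_lines_bulk(path):
    """Decompress the whole file at once, then split it into lines."""
    with bz2.open(path, 'rb') as file:
        data = file.read()
    # keepends=True keeps the line terminators, as file iteration does.
    # Note: bytes.splitlines also splits on b'\r', unlike file iteration,
    # which only splits on b'\n'.
    return data.splitlines(keepends=True)
```

The bulk variant holds the entire decompressed file in memory, so for very large logs the buffered variant is likely the safer choice.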