Python bz2: reading lines is slower in bytes mode
I have a bz2-compressed log file with a large number of lines. Each line has to go through a small analysis, which does not matter here. I started by reading the lines in text mode, like this:
import bz2

path = 'content.log.bz2'

def method_1(path):
    with bz2.open(path, 'rt') as file:
        lines = [line for line in file]
    return lines
(The analysis would happen inside the list comprehension.) prun gives:
1394087 function calls (1394086 primitive calls) in 9.864 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
17350 3.295 0.000 3.295 0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
1 3.042 3.042 9.780 9.780 <ipython-input-1-77d0033e4930>:7(<listcomp>)
1143178 2.232 0.000 2.232 0.000 bz2.py:137(closed)
16617 0.266 0.000 3.894 0.000 _compression.py:66(readinto)
16617 0.138 0.000 3.474 0.000 _compression.py:72(read)
16617 0.138 0.000 4.386 0.000 bz2.py:184(read1)
66467 0.128 0.000 0.128 0.000 {built-in method builtins.len}
16617 0.108 0.000 4.001 0.000 {method 'read1' of '_io.BufferedReader' objects}
16617 0.086 0.000 0.153 0.000 codecs.py:319(decode)
1 0.083 0.083 9.864 9.864 <string>:1(<module>)
16617 0.076 0.000 0.247 0.000 _compression.py:16(_check_can_read)
16619 0.070 0.000 0.171 0.000 bz2.py:151(readable)
16621 0.068 0.000 0.101 0.000 _compression.py:12(_check_not_closed)
16617 0.067 0.000 0.067 0.000 {built-in method _codecs.utf_8_decode}
16617 0.058 0.000 0.058 0.000 {method 'cast' of 'memoryview' objects}
886 0.009 0.000 0.009 0.000 {method 'read' of '_io.BufferedReader' objects}
Reading the file in bytes mode turned out to perform much worse than my first solution. prun now gives:
8020433 function calls in 39.857 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall filename:lineno(function)
1126551 10.761 0.000 36.901 0.000 bz2.py:206(readline)
1126551 4.739 0.000 11.553 0.000 bz2.py:151(readable)
1126551 4.655 0.000 16.208 0.000 _compression.py:16(_check_can_read)
1126551 4.517 0.000 6.814 0.000 _compression.py:12(_check_not_closed)
17350 4.333 0.000 4.333 0.000 {method 'decompress' of '_bz2.BZ2Decompressor' objects}
1126551 3.023 0.000 7.947 0.000 {method 'readline' of '_io.BufferedReader' objects}
1 2.880 2.880 39.780 39.780 <ipython-input-1-77d0033e4930>:12(<listcomp>)
1126554 2.297 0.000 2.297 0.000 bz2.py:137(closed)
1126552 1.985 0.000 1.985 0.000 {built-in method builtins.isinstance}
16617 0.273 0.000 4.924 0.000 _compression.py:66(readinto)
16617 0.140 0.000 4.515 0.000 _compression.py:72(read)
66467 0.128 0.000 0.128 0.000 {built-in method builtins.len}
1 0.073 0.073 39.857 39.857 <string>:1(<module>)
16617 0.040 0.000 0.040 0.000 {method 'cast' of 'memoryview' objects}
886 0.009 0.000 0.009 0.000 {method 'read' of '_io.BufferedReader' objects}
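Does anyone know what is going wrong here? I had assumed that bytes mode would be closer to the machine and therefore faster, but that is not the case. I suspect that reading line by line in binary mode is not the best approach.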
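One way to test this suspicion is sketched below. The bytes-mode function is not shown in the text above, so the sketch assumes it differed from method_1 only in the mode string ('rb'). It also assumes that the extra cost comes from the Python-level readline wrapper that appears once per line in the second profile (bz2.py readline, readable, _check_can_read). The names make_sample, method_text, method_bytes and method_buffered are hypothetical. Wrapping the BZ2File in io.BufferedReader is a possible workaround rather than a confirmed fix: it lets the buffered reader split lines over large decompressed blocks, and whether it helps depends on the Python version, so it should be timed on the real file.

```python
import bz2
import io
import os
import tempfile
import time


def make_sample(path, n_lines=1000):
    """Write a small bz2 file so the sketch runs without the real log."""
    with bz2.open(path, 'wb') as file:
        for i in range(n_lines):
            file.write(b'line %d of the sample log\n' % i)


def method_text(path):
    # Text mode, as in method_1: the text wrapper reads large chunks
    # (read1 in the first profile) and splits them into lines itself.
    with bz2.open(path, 'rt') as file:
        return [line for line in file]


def method_bytes(path):
    # Assumed shape of the slow variant: iterating a BZ2File in binary
    # mode calls BZ2File.readline once per line.
    with bz2.open(path, 'rb') as file:
        return [line for line in file]


def method_buffered(path):
    # Hypothetical workaround: io.BufferedReader pulls big blocks from the
    # BZ2File via readinto() and does the line splitting on its own buffer.
    with io.BufferedReader(bz2.open(path, 'rb')) as file:
        return [line for line in file]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, 'sample.log.bz2')
        make_sample(sample, n_lines=200000)
        for func in (method_text, method_bytes, method_buffered):
            start = time.perf_counter()
            lines = func(sample)
            elapsed = time.perf_counter() - start
            print(f'{func.__name__}: {len(lines)} lines in {elapsed:.3f} s')
```

All three functions return the same lines (as str in text mode, as bytes otherwise), so the timings can be compared directly. If the buffered variant is fast, that points at per-call overhead in the wrapper rather than at decompression itself, which costs roughly the same in both profiles above.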