The cleanest way to parse this in Python

I have a few log lines in the format "TimeA:0.216/1,TimeB:495.761/1,TimeC:2.048/2,TimeD:0.296/1" (the syntax is timerName:time/instances). This is how I parse it:

from collections import namedtuple

ServiceTimer = namedtuple("ServiceTimer", ["timerName", "time", "instances"])
timers = []
# Each entry looks like "timerName:time/instances"
for entry in line.split(","):
    name, rest = entry.split(":")
    time, instances = rest.split("/")
    timers.append(ServiceTimer(name, float(time), int(instances)))

If there is a better way, it also needs to be faster, because there are millions of log lines. Any pointers would be great.

Possibly in fewer lines:

for entry in line.split(','):
    split_line = entry.split(":")[1].split('/')
    timers.append(ServiceTimer(entry.split(':')[0], float(split_line[0]), int(split_line[1])))

As @zaftcoAgeiha suggested, use a regular expression:

from re import finditer
line = "TimeA:0.216/1,TimeB:495.761/1,TimeC:2.048/2,TimeD:0.296/1"
[ m.groups( ) for m in finditer( r'([^,:]*):([^/]*)/([^,]*)', line ) ]

You will get:

[('TimeA', '0.216', '1'),
 ('TimeB', '495.761', '1'),
 ('TimeC', '2.048', '2'),
 ('TimeD', '0.296', '1')]

For type conversion, you can use the group method:

[ ( m.group(1), float( m.group(2) ) , int( m.group(3) ))
    for m in finditer( r'([^,:]*):([^/]*)/([^,]*)', line ) ]
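To tie this back to the namedtuple from the question, here is a minimal sketch that uses the same pattern with named groups. The group names, `TIMER_RE`, and `parse_line` are chosen here for illustration and are not part of the original answer.

```python
import re
from collections import namedtuple

ServiceTimer = namedtuple("ServiceTimer", ["timerName", "time", "instances"])

# Same pattern as above; the named groups are an illustrative addition.
TIMER_RE = re.compile(r'(?P<name>[^,:]*):(?P<time>[^/]*)/(?P<instances>[^,]*)')

def parse_line(line):
    """Return a list of ServiceTimer tuples for one log line."""
    return [
        ServiceTimer(m.group("name"), float(m.group("time")), int(m.group("instances")))
        for m in TIMER_RE.finditer(line)
    ]

line = "TimeA:0.216/1,TimeB:495.761/1,TimeC:2.048/2,TimeD:0.296/1"
timers = parse_line(line)
print(timers[2])  # ServiceTimer(timerName='TimeC', time=2.048, instances=2)
```

Named groups cost nothing at match time and make the conversion line easier to read than positional indexes.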

Edit: to parse the whole file, compile the pattern first and use a list comprehension instead of append:

from re import compile

regex = compile( r'([^,:]*):([^/]*)/([^,]*)' )
with open( 'fname.txt', 'r' ) as fin:
    # The loop over lines must come first in the comprehension
    results = [ ( m.group(1), float( m.group(2) ) , int( m.group(3) ))
        for line in fin for m in regex.finditer( line ) ]
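With millions of lines, one large list may not always be what you want. As a variation on the same idea (not something the answer itself proposes), a generator can yield the tuples one at a time. In this sketch `iter_timers` is a hypothetical helper name, and `io.StringIO` stands in for an open log file.

```python
import io
import re

TIMER_RE = re.compile(r'([^,:]*):([^/]*)/([^,]*)')

def iter_timers(lines):
    """Yield (name, time, instances) tuples from an iterable of log lines."""
    for line in lines:
        for m in TIMER_RE.finditer(line):
            # int() tolerates the trailing newline captured by the last group
            yield m.group(1), float(m.group(2)), int(m.group(3))

# io.StringIO stands in for a real file opened with open(...)
fake_file = io.StringIO(
    "TimeA:0.216/1,TimeB:495.761/1\n"
    "TimeC:2.048/2,TimeD:0.296/1\n"
)
total_instances = sum(instances for _, _, instances in iter_timers(fake_file))
print(total_instances)  # 5
```

Because nothing is stored, memory use stays flat regardless of file size, at the cost of only being able to iterate once.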

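I tested three versions:

The original code without the namedtuple
The regexp example with type conversion
Another regexp version with some speed tricks

The results surprised me a little. They show that string.split is very fast, faster than the example regexp processing. To make the regexp faster, you have to use a memory-mapped file and forget about line-by-line processing.

Here is the source code, in temp.py: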

def process1():
    results = []
    with open('temp.txt') as fptr:
        for line in fptr:
            for entry in line.split(','):
                name, rest = entry.split(":")
                time, instances = rest.split("/")
                results.append((name, float(time), int(instances)))
    return len(results)

def process2():
    from re import finditer
    results = []
    with open('temp.txt') as fptr:
        for line in fptr:
            for match in finditer(r'([^,:]*):([^/]*)/([^,]*)', line):
                results.append(
                    (match.group(1), float(match.group(2)), int(match.group(3))))
    return len(results)

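def process3():
    from re import finditer
    import mmap
    results = []
    # 'r+' is needed because mmap's default access mode maps the file read-write
    with open('temp.txt', 'r+') as fptr:
        fmap = mmap.mmap(fptr.fileno(), 0)
        # mmap exposes bytes, so the pattern is a bytes literal;
        # on Python 3 the timer names therefore come back as bytes
        for match in finditer(br'([^,:]*):([^/]*)/([^,\r\n]*)', fmap):
            results.append(
                (match.group(1), float(match.group(2)), int(match.group(3))))
    return len(results)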

I tested these functions on a text file, "temp.txt", containing one million copies of the example line. Here are the results:

In [8]: %time temp.process1()
CPU times: user 10.24 s, sys: 0.00 s, total: 10.24 s
Wall time: 10.24 s
Out[8]: 4000000

In [9]: %time temp.process2()
CPU times: user 12.63 s, sys: 0.00 s, total: 12.63 s
Wall time: 12.63 s
Out[9]: 4000000

In [10]: %time temp.process3()
CPU times: user 9.43 s, sys: 0.00 s, total: 9.43 s
Wall time: 9.43 s
Out[10]: 4000000

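So the regexp version that skips line-by-line processing and memory-maps the file is roughly 8% faster than your example code (9.43 s versus 10.24 s). The example regexp code is 23% slower than your example (12.63 s versus 10.24 s).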

Moral of the story: always benchmark.
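For a quick comparison without a test file, a small `timeit` harness along these lines may be enough. It is only a sketch: it parses the sample line in memory, so it does not measure file I/O or the mmap variant, and the function names `parse_split` and `parse_regex` are made up for this example. Timings will differ between machines and Python versions.

```python
import re
import timeit

LINE = "TimeA:0.216/1,TimeB:495.761/1,TimeC:2.048/2,TimeD:0.296/1"
TIMER_RE = re.compile(r'([^,:]*):([^/]*)/([^,]*)')

def parse_split(line):
    """Split-based parser, as in the question (plain tuples, no namedtuple)."""
    results = []
    for entry in line.split(','):
        name, rest = entry.split(':')
        time, instances = rest.split('/')
        results.append((name, float(time), int(instances)))
    return results

def parse_regex(line):
    """Compiled-regex parser using a list comprehension."""
    return [(m.group(1), float(m.group(2)), int(m.group(3)))
            for m in TIMER_RE.finditer(line)]

# Make sure both parsers agree before comparing their speed
assert parse_split(LINE) == parse_regex(LINE)

for func in (parse_split, parse_regex):
    seconds = timeit.timeit(lambda: func(LINE), number=20000)
    print("%s: %.3f s for 20,000 lines" % (func.__name__, seconds))
```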

Do you need to use namedtuple?

Use regex groups?

If you were using Perl, you could process this much faster.

@allKid, not necessarily. I wonder whether using a dictionary would be faster.

Another moral of the story: you can't improve what you haven't measured.

If speed is a concern, I think a list comprehension is faster than append
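Here is a sketch of the list-comprehension idea from the last comment, applied to the split-based parser. The `for ... in [rest.split("/")]` clause is just a way to unpack inside a comprehension. Whether this actually beats the append loop is an assumption that would need to be measured.

```python
line = "TimeA:0.216/1,TimeB:495.761/1,TimeC:2.048/2,TimeD:0.296/1"

# Lazily split each "name:time/instances" entry into [name, "time/instances"]
pairs = (entry.split(":") for entry in line.split(","))

# The single-element list lets the comprehension unpack time and instances
timers = [
    (name, float(time), int(instances))
    for name, rest in pairs
    for time, instances in [rest.split("/")]
]

print(timers[1])  # ('TimeB', 495.761, 1)
```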

